Commit graph

1204 commits

Author SHA1 Message Date
Venkatesh Pallipadi
58d9ce7d75 [PATCH] Revert nmi_known_cpu() check during boot option parsing
Commit f2802e7f57 and its x86 version
(b7471c6da9) adds nmi_known_cpu() check
while parsing boot options in x86_64 and i386.

With that, "nmi_watchdog=2" stops working for me on Intel Core 2 CPU
based system.

The problem is, setup_nmi_watchdog is called while parsing the boot
option and identify_cpu is not done yet.  So, the return value of
nmi_known_cpu() is not valid at this point.

So revert that check.  This should not have any adverse effect as the
nmi_known_cpu() check is done again later in enable_lapic_nmi_watchdog().

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-23 07:52:05 -08:00
Muli Ben-Yehuda
b92cc55923 [PATCH] x86-64: tighten up printks
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-01-11 01:52:44 +01:00
Jack Steiner
ed5316d445 [PATCH] x86-64: - Ignore long SMI interrupts in clock calibration
Ensure that no SMI interrupts occur between the read of the HPET & TSC
in the clock calibration loop.

I noticed that a 2.66GHz system incorrectly detected the processor
clock speed about 1/7 of the time:

	time.c: Detected 2660.005 MHz processor.	(most of the time)
	time.c: Detected 2988.203 MHz processor.	(sometime)

The problem is caused by an SMI interrupt occuring in hpet_calibrate_tsc()
between the read of the HPET & TSC. Prior to switching the BIOS into
ACPI mode, it appears that every 27msec an SMI interrupt occurs. The
SMI interrupt takes 4.8 msec to process.

Note: On my test system, TICK_MIN had to be >380. I picked 5000
to minimize risk of having a value that is too small for other
platforms.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>

 arch/x86_64/kernel/time.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
2007-01-11 01:52:44 +01:00
Linus Torvalds
fea5f1e196 Revert "[PATCH] x86-64: Try multiple timer variants in check_timer"
This reverts commit b026872601, which has
been linked to several problem reports with IO-APIC and the timer.
Machines either don't boot because the timer doesn't happen, or we get
double timer interrupts because we end up double-routing the timer irq
through multiple interfaces.

See for example

	http://lkml.org/lkml/2006/12/16/101
	http://lkml.org/lkml/2007/1/3/9
	http://bugzilla.kernel.org/show_bug.cgi?id=7789

about some of the discussion.

Patches to fix this cleanup exist (and have been confirmed to work fine
at least for some of the affected cases) and we'll revisit it for
2.6.21, but this late in the -rc series we're better off just reverting
the incomplete commit that caused the problems.

Suggested-by: Adrian Bunk <bunk@stusta.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Yinghai Lu <yinghai.lu@amd.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-08 15:04:46 -08:00
Linus Torvalds
de9e957f12 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] longhaul: Kill off warnings introduced by recent changes.
  [CPUFREQ] Uninitialized use of cmd.val in arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c:acpi_cpufreq_target()
  [CPUFREQ] Longhaul - Always guess FSB
  [CPUFREQ] Longhaul - Fix up powersaver assumptions.
  [CPUFREQ] longhaul: Fix up unreachable code.
  [CPUFREQ] speedstep-centrino: missing space and bracket
  [CPUFREQ] Bug fix for acpi-cpufreq and cpufreq_stats oops on frequency change notification
  [CPUFREQ] select consistently
2007-01-03 17:34:12 -08:00
OGAWA Hirofumi
7523c4dd99 [PATCH] x86_64: Fix dump_trace()
If caller passed the tsk, we should use it to validate a stack ptr.
Otherwise, sysrq-t and other debugging stuff doesn't work.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-03 08:49:59 -08:00
Randy Dunlap
917325d30a [CPUFREQ] select consistently
Make x86_64 ACPI_CPU_FREQ select CPU_FREQ_TABLE like other methods do.
(although we should still eliminate as much use of 'select' as possible)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2006-12-22 22:45:41 -05:00
Ingo Molnar
0888f06ac9 [PATCH] sched: fix bad missed wakeups in the i386, x86_64, ia64, ACPI and APM idle code
Fernando Lopez-Lezcano reported frequent scheduling latencies and audio
xruns starting at the 2.6.18-rt kernel, and those problems persisted all
until current -rt kernels. The latencies were serious and unjustified by
system load, often in the milliseconds range.

After a patient and heroic multi-month effort of Fernando, where he
tested dozens of kernels, tried various configs, boot options,
test-patches of mine and provided latency traces of those incidents, the
following 'smoking gun' trace was captured by him:

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
  IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup (try_to_wake_up)
  IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup <<...>-5856> (37 0)
  IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup (c01262ba 0 0)
  IRQ_19-1479  1D..1    0us : resched_task (try_to_wake_up)
  IRQ_19-1479  1D..1    0us : __spin_unlock_irqrestore (try_to_wake_up)
  ...
  <idle>-0     1...1   11us!: default_idle (cpu_idle)
  ...
  <idle>-0     0Dn.1  602us : smp_apic_timer_interrupt (c0103baf 1 0)
  ...
   <...>-5856  0D..2  618us : __switch_to (__schedule)
   <...>-5856  0D..2  618us : __schedule <<idle>-0> (20 162)
   <...>-5856  0D..2  619us : __spin_unlock_irq (__schedule)
   <...>-5856  0...1  619us : trace_stop_sched_switched (__schedule)
   <...>-5856  0D..1  619us : trace_stop_sched_switched <<...>-5856> (37 0)

what is visible in this trace is that CPU#1 ran try_to_wake_up() for
PID:5856, it placed PID:5856 on CPU#0's runqueue and ran resched_task()
for CPU#0. But it decided to not send an IPI that no CPU - due to
TS_POLLING. But CPU#0 never woke up after its NEED_RESCHED bit was set,
and only rescheduled to PID:5856 upon the next lapic timer IRQ. The
result was a 600+ usecs latency and a missed wakeup!

the bug turned out to be an idle-wakeup bug introduced into the mainline
kernel this summer via an optimization in the x86_64 tree:

    commit 495ab9c045
    Author: Andi Kleen <ak@suse.de>
    Date:   Mon Jun 26 13:59:11 2006 +0200

    [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status

    During some profiling I noticed that default_idle causes a lot of
    memory traffic. I think that is caused by the atomic operations
    to clear/set the polling flag in thread_info. There is actually
    no reason to make this atomic - only the idle thread does it
    to itself, other CPUs only read it. So I moved it into ti->status.

the problem is this type of change:

        if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
-               clear_thread_flag(TIF_POLLING_NRFLAG);
+               current_thread_info()->status &= ~TS_POLLING;
                smp_mb__after_clear_bit();
                while (!need_resched()) {
                        local_irq_disable();

this changes clear_thread_flag() to an explicit clearing of TS_POLLING.
clear_thread_flag() is defined as:

        clear_bit(flag, &ti->flags);

and clear_bit() is a LOCK-ed atomic instruction on all x86 platforms:

  static inline void clear_bit(int nr, volatile unsigned long * addr)
  {
          __asm__ __volatile__( LOCK_PREFIX
                  "btrl %1,%0"

hence smp_mb__after_clear_bit() is defined as a simple compile barrier:

  #define smp_mb__after_clear_bit()       barrier()

but the explicit TS_POLLING clearing introduced by the patch:

+               current_thread_info()->status &= ~TS_POLLING;

is not an atomic op! So the clearing of the TS_POLLING bit is freely
reorderable with the reading of the NEED_RESCHED bit - and both now
reside in different memory addresses.

CPU idle wakeup very much depends on ordered memory ops, the clearing of
the TS_POLLING flag must always be done before we test need_resched()
and hit the idle instruction(s). [Symmetrically, the wakeup code needs
to set NEED_RESCHED before it tests the TS_POLLING flag, so memory
ordering is paramount.]

Fernando's dual-core Athlon64 system has a sufficiently advanced memory
ordering model so that it triggered this scenario very often.

( And it also turned out that the reason why these latencies never
  triggered on my testsystems is that i routinely use idle=poll, which
  was the only idle variant not affected by this bug. )

The fix is to change the smp_mb__after_clear_bit() to an smp_mb(), to
act as an absolute barrier between the TS_POLLING write and the
NEED_RESCHED read. This affects almost all idling methods (default,
ACPI, APM), on all 3 x86 architectures: i386, x86_64, ia64.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:51 -08:00
Ingo Molnar
136f1e7a8c [PATCH] x86_64: fix boot time hang in detect_calgary()
if CONFIG_CALGARY_IOMMU is built into the kernel via
CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT, or is enabled via the
iommu=calgary boot option, then the detect_calgary() function runs to
detect the presence of a Calgary IOMMU.

detect_calgary() first searches the BIOS EBDA area for a "rio_table_hdr"
BIOS table. It has this parsing algorithm for the EBDA:

	while (offset) {
		...
		/* The next offset is stored in the 1st word. 0 means no more */
 		offset = *((unsigned short *)(ptr + offset));
	}

got that? Lets repeat it slowly: we've got a BIOS-supplied data
structure, plus Linux kernel code that will only break out of an
infinite parsing loop once the BIOS gives a zero offset. Ok?

Translation: what an excellent opportunity for BIOS writers to lock up
the Linux boot process in an utterly hard to debug place! Indeed the
BIOS jumped on that opportunity on my box, which has the following EBDA
chaining layout:

  384, 65282, 65535, 65535, 65535, 65535, 65535, 65535 ...

see the pattern? So my, definitely non-Calgary system happily locks up
in detect_calgary()!

the patch below fixes the boot hang by trusting the BIOS-supplied data
structure a bit less: the parser always has to make forward progress,
and if it doesnt, we break out of the loop and i get the expected kernel
message:

  Calgary: Unable to locate Rio Grande Table in EBDA - bailing!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-21 00:08:28 -08:00
Linus Torvalds
d1526e2cda Remove stack unwinder for now
It has caused more problems than it ever really solved, and is
apparently not getting cleaned up and fixed.  We can put it back when
it's stable and isn't likely to make warning or bug events worse.

In the meantime, enable frame pointers for more readable stack traces.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-15 08:47:51 -08:00
Dave Jones
c4366889dd Merge ../linus
Conflicts:

	drivers/cpufreq/cpufreq.c
2006-12-12 17:41:41 -05:00
Alexey Dobriyan
1f29bcd739 [PATCH] sysctl: remove unused "context" param
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
Andi Kleen
1bac3b383a [PATCH] x86: Work around gcc 4.2 over aggressive optimizer
The new PDA code uses a dummy _proxy_pda variable to describe
memory references to the PDA. It is never referenced
in inline assembly, but exists as input/output arguments.
gcc 4.2 in some cases can CSE references to this which causes
unresolved symbols.  Define it to zero to avoid this.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-09 21:33:36 +01:00
Ravikiran G Thirumalai
92715e282b [PATCH] x86: Fix boot hang due to nmi watchdog init code
2.6.19  stopped booting (or booted based on build/config) on our x86_64
systems due to a bug introduced in 2.6.19.  check_nmi_watchdog schedules an
IPI on all cpus to  busy wait on a flag, but fails to set the busywait
flag if NMI functionality is disabled.  This causes the secondary cpus
to spin in an endless loop, causing the kernel bootup to hang.
Depending upon the build, the  busywait flag got overwritten (stack variable)
and caused  the kernel to bootup on certain builds.  Following patch fixes
the bug by setting the busywait flag before returning from check_nmi_watchdog.
I guess using a stack variable is not good here as the calling function could
potentially return while the busy wait loop is still spinning on the flag.

AK: I redid the patch significantly to be cleaner

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-09 21:33:35 +01:00
Jeremy Fitzhardinge
c31a0bf3e1 [PATCH] Generic BUG for x86-64
This makes x86-64 use the generic BUG machinery.

The main advantage in using the generic BUG machinery for x86-64 is that
the inlined overhead of BUG is just the ud2a instruction; the file+line
information are no longer inlined into the instruction stream.  This
reduces cache pollution.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Hugh Dickens <hugh@veritas.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:39 -08:00
Linus Torvalds
4522d58275 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits)
  [PATCH] x86-64: Export smp_call_function_single
  [PATCH] i386: Clean up smp_tune_scheduling()
  [PATCH] unwinder: move .eh_frame to RODATA
  [PATCH] unwinder: fully support linker generated .eh_frame_hdr section
  [PATCH] x86-64: don't use set_irq_regs()
  [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
  [PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM
  [PATCH] i386: replace kmalloc+memset with kzalloc
  [PATCH] x86-64: remove remaining pc98 code
  [PATCH] x86-64: remove unused variable
  [PATCH] x86-64: Fix constraints in atomic_add_return()
  [PATCH] x86-64: fix asm constraints in i386 atomic_add_return
  [PATCH] x86-64: Correct documentation for bzImage protocol v2.05
  [PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code
  [PATCH] x86-64: Fix numaq build error
  [PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header
  [PATCH] unwinder: Add debugging output to the Dwarf2 unwinder
  [PATCH] x86-64: Clarify error message in GART code
  [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
  [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
  ...

Fixed conflict in include/linux/uaccess.h manually

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:59:11 -08:00
Magnus Damm
85916f8166 [PATCH] Kexec / Kdump: Unify elf note code
The elf note saving code is currently duplicated over several
architectures.  This cleanup patch simply adds code to a common file and
then replaces the arch-specific code with calls to the newly added code.

The only drawback with this approach is that s390 doesn't fully support
kexec-on-panic which for that arch leads to introduction of unused code.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:46 -08:00
Ingo Molnar
0231606785 [PATCH] hotplug CPU: clean up hotcpu_notifier() use
There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.

the compiler can skip truly unused functions just fine:

    text    data     bss     dec     hex filename
 1624412  728710 3674856 6027978  5bfaca vmlinux.before
 1624412  728710 3674856 6027978  5bfaca vmlinux.after

[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:39 -08:00
Andrew Morton
a38a44c1a9 [PATCH] smp_call_function_single() check that local interrupts are enabled
smp_call_function_single() can deadlock if the caller disabled local
interrupts (the target CPU could be spinning on call_lock).  Check for that.

Why on earth do these functions use spin_lock_bh()??

Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:39 -08:00
Masami Hiramatsu
b4c6c34a53 [PATCH] kprobes: enable booster on the preemptible kernel
When we are unregistering a kprobe-booster, we can't release its
instruction buffer immediately on the preemptive kernel, because some
processes might be preempted on the buffer.  The freeze_processes() and
thaw_processes() functions can clean most of processes up from the buffer.
There are still some non-frozen threads who have the PF_NOFREEZE flag.  If
those threads are sleeping (not preempted) at the known place outside the
buffer, we can ensure safety of freeing.

However, the processing of this check routine takes a long time.  So, this
patch introduces the garbage collection mechanism of insn_slot.  It also
introduces the "dirty" flag to free_insn_slot because of efficiency.

The "clean" instruction slots (dirty flag is cleared) are released
immediately.  But the "dirty" slots which are used by boosted kprobes, are
marked as garbages.  collect_garbage_slots() will be invoked to release
"dirty" slots if there are more than INSNS_PER_PAGE garbage slots or if
there are no unused slots.

Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "bibo,mao" <bibo.mao@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Yumiko Sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi Oshima <soshima@redhat.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:38 -08:00
Henry Nestler
19e5d9c0d2 [PATCH] initrd: remove unused false condition for initrd_start
After LOADER_TYPE && INITRD_START are true, the short if-condition
for INITRD_START can never be false.

Remove unused code from the else condition.

Signed-off-by: Henry Nestler <henry.ne@arcor.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:38 -08:00
Arnd Bergmann
f5738ceed4 [PATCH] remove kernel syscalls
The last thing we agreed on was to remove the macros entirely for 2.6.19,
on all architectures. Unfortunately, I think nobody actually _did_ that,
so they are still there.

[akpm@osdl.org: x86_64 fix]
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Schafer <gschafer@zip.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:37 -08:00
Andi Kleen
64a26a7312 [PATCH] x86-64: Export smp_call_function_single
smp_call_function() is exported, makes sense to export this one too.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:19 +01:00
Jan Beulich
b65780e123 [PATCH] unwinder: move .eh_frame to RODATA
The .eh_frame section contents is never written to, so it can as well
benefit from CONFIG_DEBUG_RODATA.

Diff-ed against firstfloor tree.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:19 +01:00
Yinghai Lu
ad892f5e0d [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
setup_IO_APIC_irqs could fail to get vector for some device when you have too
many devices, because at that time only boot cpu is online.  So check vector
for irq in setup_ioapic_dest and call setup_IO_APIC_irq to make sure IO-APIC
irq-routing table is initialized.

Also seperate setup_IO_APIC_irq from setup_IO_APIC_irqs.

Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:19 +01:00
Andi Kleen
3807fd46e9 [PATCH] x86-64: Clarify error message in GART code
- Remove "Disabling IOMMU" message because it confuses people
- Clarify that the GART IOMMU is refered to in other message

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:13 +01:00
Venkatesh Pallipadi
d331e739f5 [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
Idle callbacks has some races when enter_idle() sets isidle and subsequent
interrupts that can happen on that CPU, before CPU goes to idle. Due to this,
an IDLE_END can get called before IDLE_START. To avoid these races, disable
interrupts before enter_idle and make sure that all idle routines do not
enable interrupts before entering idle.

Note that poll_idle() still has a this race as it has to enable interrupts
before going to idle. But, all other idle routines have the race fixed.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:13 +01:00
Andi Kleen
a0429d0d7a [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
This was added as a workaround for the fallback unwinder not supporting
unaligned stack pointers properly. But now it was fixed to do that,
so it's not needed anymore

Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:13 +01:00
Jan Beulich
359ad0d401 [PATCH] unwinder: more sanity checks in Dwarf2 unwinder
Tighten the requirements on both input to and output from the Dwarf2
unwinder.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:13 +01:00
Andi Kleen
446f713ba1 [PATCH] unwinder: always use unlocked module list access in unwinder fallback
We're already well protected against module unloads because module
unload uses stop_machine(). The only exception is NMIs, but other
users already risk lockless accesses here.

This avoids some hackery in lockdep and also a potential deadlock

This matches what i386 does.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Arjan van de Ven
f3d73707a1 [PATCH] x86-64: Mark rdtsc as sync only for netburst, not for core2
On the Core2 cpus, the rdtsc instruction is not serializing (as defined
in the architecture reference since rdtsc exists) and due to the deep
speculation of these cores, it's possible that you can observe time go
backwards between cores due to this speculation. Since the kernel
already deals with this with the SYNC_RDTSC flag, the solution is
simple, only assume that the instruction is serializing on family 15...

The price one pays for this is a slightly slower gettimeofday (by a
dozen or two cycles), but that increase is quite small to pay for a
really-going-forward tsc counter.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Muli Ben-Yehuda
e496a0da7f [PATCH] Calgary: remove unused variables
Spotted by d binderman <dcb314@hotmail.com>.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Andi Kleen
6167796569 [PATCH] x86-64: Synchronize RDTSC on single core AMD
There is no guarantee that two RDTSCs in a row are monotonic,
so don't assume it on single core AMD systems.
This will make gettimeofday slower again
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Yinghai Lu
73ad8355d7 [PATCH] x86-64: remove unused acpi_found_madt in mparse.
remove unused acpi_found_madt in mparse.c

Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Rafael J. Wysocki
d4c45718b3 [PATCH] x86-64: Fix kobject_init() WARN_ON on resume from disk
Make mce_remove_device() clean up the kobject in per_cpu(device_mce, cpu)
after it has been unregistered.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Yinghai Lu
8fb6e5f5db [PATCH] x86_64: interrupt array size should be aligned to NR_VECTORS
interrupt array is referred for idt vectors instead of NR_IRQS, so change size
to NR_VECTORS - FIRST_EXTERNAL_VECTOR. Also change to static.

Signed-off-by: Yinghai Lu <yinghai@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Yinghai Lu
3df0af0eb0 [PATCH] x86_64: clear_bss before set_intr_gate with early_idt_handler
idt_table is in the .bss section, so clear_bss need to called at first

Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Chuck Ebbert
0741f4d207 [PATCH] x86: add sysctl for kstack_depth_to_print
Add sysctl for kstack_depth_to_print. This lets users change
the amount of raw stack data printed in dump_stack() without
having to reboot.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:11 +01:00
David Rientjes
86bd58bf4c [PATCH] x86-64: Remove unused GET_APIC_VERSION call from clear_local_APIC
Remove unused GET_APIC_VERSION call from clear_local_APIC() and
__setup_APIC_LVTT().

Reported by D Binderman <dcb314@hotmail.com>.

Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: David Rientjes <rientjes@cs.washington.edu>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:11 +01:00
Karsten Wiese
f990fff427 [PATCH] x86: Regard MSRs in lapic_suspend()/lapic_resume()
Read/Write APIC_LVTPC and APIC_LVTTHMR only,
if get_maxlvt() returns certain values.
This is done like everywhere else in i386/kernel/apic.c,
so I guess its correct.
Suspends/Resumes to disk fine and eleminates an smp_error_interrupt()
here on a K8.

AK: ported to x86-64 too

Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:11 +01:00
Stephane Eranian
ee58fad51a [PATCH] x86-64: x86-64 add Intel BTS cpufeature bit and detection (take 2)
Here is a small patch for x86-64 which adds a cpufeature flag and
detection code for Intel's Branch Trace Store (BTS) feature. This
feature can be found on Intel P4 and Core 2 processors among others.
It can also be used by perfmon.

changelog:
	- add CPU_FEATURE_BTS
	- add Branch Trace Store detection

signed-off-by: stephane eranian <eranian@hpl.hp.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:11 +01:00
Siddha, Suresh B
b0d0a4ba45 [PATCH] x86: fix the irqbalance quirk for E7320/E7520/E7525
Move the irqbalance quirks for E7320/E7520/E7525(Errata 23 in
http://download.intel.com/design/chipsets/specupdt/30304203.pdf) to early
quirks.

And add a PCI quirk for these platforms to check(which happens very late
during the boot) if the APIC routing is indeed set to default flat mode.

This fixes the breakage(in x86_64) of this quirk due to cpu hotplug which
selects physical mode instead of the logical flat(as needed for this errata
workaround).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:10 +01:00
Siddha, Suresh B
9899f826fc [PATCH] x86-64: add genapic_force
Add genapic_force. Used by the next Intel quirks patch.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:10 +01:00
Andi Kleen
9a45732422 [PATCH] x86-64: Rate limit no irq handler messages
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:09 +01:00
Ernie Petrides
103efcd9aa [PATCH] x86-64: fix perms/range of vsyscall vma in /proc/*/maps
The final line of /proc/<pid>/maps on x86_64 for native 64-bit
tasks shows an incorrect ending address and incorrect permissions.  There
is only a single page mapped in this vsyscall region, and it is accessible
for both read and execute.

The patch below fixes this.  (Since 32-bit-compat tasks have a real vma
with correct perms/range, no change is necessary for that scenario.)

Before the patch, a "cat /proc/self/maps | tail -1" shows this:

        ffffffffff600000-ffffffffffe00000 ---p 00000000 [...]

After the patch, this is the output:

        ffffffffff600000-ffffffffff601000 r-xp 00000000 [...]

Signed-off-by: Ernie Petrides <petrides@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:09 +01:00
Andi Kleen
516d283643 [PATCH] x86-64: Fix race in IO-APIC routing entry setup.
Interrupt could happen between setting the IO-APIC entry
and setting its interrupt data.

Pointed out by Linus.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:07 +01:00
Muli Ben-Yehuda
bff6547bb6 [PATCH] Calgary: allow compiling Calgary in but not using it by default
This patch makes it possible to compile Calgary in but not use it by
default. In this mode, use 'iommu=calgary' to activate it.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:07 +01:00
Muli Ben-Yehuda
eae9375554 [PATCH] Calgary: check BBAR ioremap success when ioremapping
This patch cleans up the previous "Use BIOS supplied BBAR information"
patch. Mostly stylistic clenaups, but also check for ioremap failure
when we ioremap the BBAR rather than when trying to use it.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Laurent Vivier <Laurent.Vivier@bull.net>
2006-12-07 02:14:06 +01:00
Laurent Vivier
b34e90b8f0 [PATCH] Calgary: use BIOS supplied BBARs and topology information
Find the BBAR register address of each Calgary using the "Extended
BIOS Data Area" rather than calculating it ourselves. Also get the bus
topology (what PHB each bus is on) from Calgary rather than
calculating it ourselves.

This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=7407.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:06 +01:00
Muli Ben-Yehuda
58db854827 [PATCH] calgary: phb_shift can be int
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:06 +01:00
Andi Kleen
b026872601 [PATCH] x86-64: Try multiple timer variants in check_timer
Instead of adding all kinds of more quirks try various timer
routing variants in check_timer.

In particular this tries to handle quirks from:
- Nvidia NF2-4 reference BIOS: wrong timer override
- Asus: Wrong timer override but no HPET table
- ATI: require timer disabled in 8259
- Some boards: require timer enabled in 8259

We just try many of the the known variants in the hopefully right order
in check_timer.

Trying pin 0/2 on Nvidia suggested by Tim Hockin.

TBD Experimental. Needs a lot of testing

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:06 +01:00
Andi Kleen
ab2bf0c1c6 [PATCH] x86-64: Use probe_kernel_address in arch/x86_64/*
Instead of open coded __get_user

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:06 +01:00
Yinghai Lu
5df0287ecc [PATCH] x86-64: Extend clear_irq_vector
Clear the irq releated entries in irq_vector, irq_domain and vector_irq
instead of clearing irq_vector only. So when new irq is created, it
could reuse that vector. (actually is the second loop scanning from
FIRST_DEVICE_VECTOR+8). This could avoid the vectors are used up
with enough module inserting and removing

Cc: Eric W. Biedierman <ebiederm@xmission.com>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-By: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:05 +01:00
Andi Kleen
9c5f8be462 [PATCH] x86: Mention PCI instead of RAM in NMI parity error message
On modern systems RAM errors don't cause NMIs, but it's usually
caused by PCI SERR. Mention PCI instead of RAM in the printk.

Reported by r_hayashi@ctc-g.co.jp (Ryutaro Hayashi)

Cc:  r_hayashi@ctc-g.co.jp

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:03 +01:00
Alan Cox
7cd8b6861e [PATCH] x86: remove last two pci_find offenders in the core code
Resending as I believe the discussion about them established they were
correct.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:03 +01:00
Andi Kleen
72690a2118 [PATCH] x86: Don't use nested idle loops
Currently the idle loop has two nested loops -- one high level
in cpu_idle and in some low level idle functions another one.

Looping in the low level idle functions breaks the idle notifiers
because interrupts waking up sleep states need to execute
exit_idle() which is only in cpu_idle().

So don't do that, only loop in cpu_idle(). This only removes
code.

In some cases e.g. poll_idle the idle loop is a little longer
now because cpu_idle checks more things. I hope that isn't a problem
ACPI idle doesn't change behaviour because it never looped anyways.

Cc: len.brown@intel.com
Cc: eranian@hpl.hp.com
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:03 +01:00
Jan Beulich
bcddc0155f [PATCH] x86-64: miscellaneous entry.S adjustments
This patch:
- makes ret_from_sys_call no longer global (all external users were
  previously switched to use int_ret_from_sys_call)
- adjusts placement of a CFI_{REMEMBER,RESTORE}_STATE pair to better
  fit logic flow
- eliminates an unnecessary pair of CFI_{REMEMBER,RESTORE}_STATE
- glues together function- and unwinder-wise the previously separate
  system_call and int_ret_from_sys_call function fragments

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:02 +01:00
Andrew Morton
da68933e0a [PATCH] x86-64: dump_trace() atomicity fix
Fix

BUG: using smp_processor_id() in preemptible [00000001] code:

in backtracer on preemptible debug kernels.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:02 +01:00
Aaron Durbin
399287229c [PATCH] x86-64: Insert Local and IO APIC(s) into resource map
Insert the Local APIC and IO APIC(s) into the resource tree.  It allows the
APIC resources to be visible within /proc/iomem.  The patch also takes into
account IO APIC(s) mapped in the PCI space by deferring the insertion until
after PCI has allocated its necessary resources.

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:01 +01:00
Andrew Morton
bb81a09e55 [PATCH] x86: all cpu backtrace
When a spinlock lockup occurs, arrange for the NMI code to emit an all-cpu
backtrace, so we get to see which CPU is holding the lock, and where.

Cc: Andi Kleen <ak@muc.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:01 +01:00
Alexey Dobriyan
e2764a1e30 [PATCH] x86-64: use BUILD_BUG_ON in FPU code
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:01 +01:00
Stephane Eranian
36b2a8d5af [PATCH] x86-64: add X86_FEATURE_PEBS and detection
Here is a patch (used by perfmon2) to detect the presence of the
Precise Event Based Sampling (PEBS) feature for Intel 64-bit processors.
The patch also adds the cpu_has_pebs macro.

changelog:
	- adds X86_FEATURE_PEBS
	- adds cpu_has_pebs to test for X86_FEATURE_PEBS

Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:01 +01:00
Andi Kleen
dd315df176 [PATCH] x86: Compress stack unwinder output
The unwinder has some extra newlines, which eat up loads of screen
space when it spews. (See https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=137900
for a nasty example).

warning_symbol-> and warning-> already printk a newline, so don't add one
in the strings passed to them.

[AK: redone for new code]

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:00 +01:00
Andi Kleen
b615ebdac9 [PATCH] x86: shorten lines in unwinder to be <= 80 characters
Andrew complained about > 80 character lines in the new unwinder.
Fix that.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:00 +01:00
David Howells
4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Linus Torvalds
707badb80b Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
  [PATCH] x86-64: Use stricter in process stack check for unwinder
  [PATCH] i386: Fix compilation with UP genericarch
  [PATCH] x86-64: Fix warning in io_apic.c
  [PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwind
  [PATCH] x86_64: Align data segment to PAGE_SIZE boundary
2006-11-28 17:28:41 -08:00
Andi Kleen
c547c77ee4 [PATCH] x86-64: Use stricter in process stack check for unwinder
Previously it would check for alignment only, which could break
if the stack pointer was unaligned. Now explicitely check if the
stack pointer is in the stack page of the current process.

Ported from i386.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-28 20:12:59 +01:00
Andi Kleen
f7a23328a7 [PATCH] x86-64: Fix warning in io_apic.c 2006-11-28 20:12:59 +01:00
Ingo Molnar
24d7bb3396 [PATCH] x86_64: fix 'earlyprintk=...,keep' regression
Commit 2c8c0e6b8d ("[PATCH] Convert x86-64
to early param") broke the earlyprintk=...,keep feature.

This restores that functionality.  Tested on x86_64.  Must-have for
v2.6.19, no risk.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-28 10:58:21 -08:00
David Howells
65f27f3844 WorkStruct: Pass the work_struct pointer instead of context data
Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct.  This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function.  This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated..  This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems.  But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default.  Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:55:48 +00:00
David Howells
52bad64d95 WorkStruct: Separate delayable and non-delayable events.
Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.

The work_struct struct is huge, and this limits it's usefulness.  On a 64-bit
architecture it's nearly 100 bytes in size.  This reduces that by half for the
non-delayable type of event.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:54:01 +00:00
Vivek Goyal
3af9815328 [PATCH] x86_64: Align data segment to PAGE_SIZE boundary
o Explicitly align data segment to PAGE_SIZE boundary otherwise depending on
  config options and tool chain it might be placed on a non PAGE_SIZE aligned
  boundary and vmlinux loaders like kexec fail when they encounter a
  PT_LOAD type segment which is not aligned to PAGE_SIZE boundary.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-21 10:31:21 +01:00
Vivek Goyal
9a14f2964b [PATCH] x86_64: Align data segment to PAGE_SIZE boundary
o Explicitly align data segment to PAGE_SIZE boundary otherwise depending on
  config options and tool chain it might be placed on a non PAGE_SIZE aligned
  boundary and vmlinux loaders like kexec fail when they encounter a
  PT_LOAD type segment which is not aligned to PAGE_SIZE boundary.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-21 10:26:54 +01:00
Ingo Molnar
dc1829a4c3 [PATCH] i386/x86_64: ACPI cpu_idle_wait() fix
The scheduler on Andreas Friedrich's hyperthreading system stopped
working properly: the scheduler would never move tasks to another CPU!
The lask known working kernel was 2.6.8.

After a couple of attempts to corner the bug, the following smoking gun
was found:

  BIOS reported wrong ACPI idfor the processor
  CPU#1: set_cpus_allowed(), swapper:1, 3 -> 2
   [<c0103bbe>] show_trace_log_lvl+0x34/0x4a
   [<c0103ceb>] show_trace+0x2c/0x2e
   [<c01045f8>] dump_stack+0x2b/0x2d
   [<c0116a77>] set_cpus_allowed+0x52/0xec
   [<c0101d86>] cpu_idle_wait+0x2e/0x100
   [<c0259c57>] acpi_processor_power_exit+0x45/0x58
   [<c0259752>] acpi_processor_remove+0x46/0xea
   [<c025c6fb>] acpi_start_single_object+0x47/0x54
   [<c025cee5>] acpi_bus_register_driver+0xa4/0xd3
   [<c04ab2d7>] acpi_processor_init+0x57/0x77
   [<c01004d7>] init+0x146/0x2fd
   [<c0103a87>] kernel_thread_helper+0x7/0x10

a quick look at cpu_idle_wait() shows how broken that code is
on i386: it changes the init task's affinity map but never
restores it ...

and because all userspace tasks get forked by init, they all
inherited that single-CPU affinity mask. x86_64 cloned this
bug too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andreas Friedrich <andreas.friedrich@fujitsu-siemens.com>
Cc: Wolfgang Erig <Wolfgang.Erig@fujitsu-siemens.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-17 08:20:09 -08:00
Ingo Molnar
0796bdb7e9 [PATCH] x86_64: stack unwinder crash fix
the new dwarf2 unwinder crashes while trying to dump the stack:

  Leftover inexact backtrace:

  Unable to handle kernel paging request at ffffffff82800000 RIP:
   [<ffffffff8026cf26>] dump_trace+0x35b/0x3d2
  PGD 203027 PUD 205027 PMD 0
  Oops: 0000 [2] PREEMPT SMP
  CPU 0
  Modules linked in:
  Pid: 30, comm: khelper Not tainted 2.6.19-rc6-rt1 #11
  RIP: 0010:[<ffffffff8026cf26>]  [<ffffffff8026cf26>] dump_trace+0x35b/0x3d2
  RSP: 0000:ffff81003fb9d848  EFLAGS: 00010006
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff805b3520 RDI: 0000000000000000
  RBP: ffffffff827ffff9 R08: ffffffff80aad000 R09: 0000000000000005
  R10: ffffffff80aae000 R11: ffffffff8037961b R12: ffff81003fb9d858
  R13: 0000000000000000 R14: ffffffff80598460 R15: ffffffff80ab1fc0
  FS:  0000000000000000(0000) GS:ffffffff806c4200(0000) knlGS:0000000000000000
  CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
  CR2: ffffffff82800000 CR3: 0000000000201000 CR4: 00000000000006e0

this crash happened because it did not sanitize the dwarf2 data it
got, and got an unaligned stack pointer - which happily walked past
the process stack (and eventually reached the end of kernel memory
and pagefaulted there) due to this naive iteration condition:

        HANDLE_STACK (((long) stack & (THREAD_SIZE-1)) != 0);

note that i386 is alot more conservative when it comes to trusting
stack pointers:

  static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
  {
         return  p > (void *)tinfo &&
                 p < (void *)tinfo + THREAD_SIZE - 3;
  }

but the x86_64 code did not take this bit of i386 code.

The fix is to align the stack pointer.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-17 08:20:09 -08:00
Andi Kleen
6b3d1a95ba [PATCH] x86-64: Fix vsyscall.c compilation on UP
Broken by earlier patch by me.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-16 13:57:03 -08:00
Eric W. Biederman
45c9953325 [PATCH] Use delayed disable mode of ioapic edge triggered interrupts
Komuro reports that ISA interrupts do not work after a disable_irq(),
causing some PCMCIA drivers to not work, with messages like

	eth0: Asix AX88190: io 0x300, irq 3, hw_addr xx:xx:xx:xx:xx:xx
	eth0: found link beat
	eth0: autonegotiation complete: 100baseT-FD selected
	eth0: interrupt(s) dropped!
	eth0: interrupt(s) dropped!
	eth0: interrupt(s) dropped!
	...

Linus Torvalds <torvalds@osdl.org> said:

  "Now, edge-triggered interrupts are a _lot_ harder to mask, because the
   Intel APIC is an unbelievable piece of sh*t, and has the edge-detect logic
   _before_ the mask logic, so if a edge happens _while_ the device is
   masked, you'll never ever see the edge ever again (unmasking will not
   cause a new edge, so you simply lost the interrupt).

   So when you "mask" an edge-triggered IRQ, you can't really mask it at all,
   because if you did that, you'd lose it forever if the IRQ comes in while
   you masked it. Instead, we're supposed to leave it active, and set a flag,
   and IF the IRQ comes in, we just remember it, and mask it at that point
   instead, and then on unmasking, we have to replay it by sending a
   self-IPI."

This trivial patch solves the problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Komuro <komurojun-mbn@nifty.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-15 09:04:32 -08:00
Andi Kleen
9446868b53 [PATCH] x86-64: Fix race in exit_idle
When another interrupt happens in exit_idle the exit idle notifier
could be called an incorrect number of times.

Add a test_and_clear_bit_pda and use it handle the bit
atomically against interrupts to avoid this.

Pointed out by Stephane Eranian

Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Andi Kleen
8c131af1db [PATCH] x86-64: Fix vgetcpu when CONFIG_HOTPLUG_CPU is disabled
The vgetcpu per CPU initialization previously relied on CPU hotplug
events for all CPUs to initialize the per CPU state. That only
worked only on kernels with CONFIG_HOTPLUG_CPU enabled.  On the
others some CPUs didn't get their state initialized properly
and vgetcpu wouldn't work.

Change the initialization sequence to instead run in a normal
initcall (which runs after the normal CPU bootup) and initialize
all running CPUs there. Later hotplug CPUs are still handled
with an hotplug notifier.

This actually simplifies the code somewhat.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Andi Kleen
fa18f477d0 [PATCH] x86: Add acpi_user_timer_override option for Asus boards
Timer overrides are normally disabled on Nvidia board because
they are commonly wrong, except on new ones with HPET support.
Unfortunately there are quite some Asus boards around that
don't have HPET, but need a timer override.

We don't know yet how to handle this transparently,
but at least add a command line option to force the timer override
and let them boot.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Magnus Damm
15803a4328 [PATCH] x86-64: setup saved_max_pfn correctly (kdump)
x86_64: setup saved_max_pfn correctly

2.6.19-rc4 has broken CONFIG_CRASH_DUMP support on x86_64. It is impossible
to read out the kernel contents from /proc/vmcore because saved_max_pfn is set
to zero instead of the max_pfn value before the user map is setup.

This happens because saved_max_pfn is initialized at parse_early_param() time,
and at this time no active regions have been registered. save_max_pfn is setup
from e820_end_of_ram(), more exact find_max_pfn_with_active_regions() which
returns 0 because no regions exist.

This patch fixes this by registering before and removing after the call
to e820_end_of_ram().

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Aaron Durbin
14f448e361 [PATCH] x86-64: Fix partial page check to ensure unusable memory is not being marked usable.
Fix partial page check in e820_register_active_regions to ensure
partial pages are
not being marked as active in the memory pool.

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:45 +01:00
Eric W. Biederman
ec68307cc5 [PATCH] htirq: refactor so we only have one function that writes to the chip
This refactoring actually optimizes the code a little by caching the value
that we think the device is programmed with instead of reading it back from
the hardware.  Which simplifies the code a little and should speed things up a
bit.

This patch introduces the concept of a ht_irq_msg and modifies the
architecture read/write routines to update this code.

There is a minor consistency fix here as well as x86_64 forgot to initialize
the htirq as masked.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Acked-by: Bryan O'Sullivan <bos@pathscale.com>
Cc: <olson@pathscale.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 18:29:24 -08:00
Linus Torvalds
48797ebd9e x86-64: write IO APIC irq routing entries in correct order
This is the x86-64 version of f9dadfa71b
that did the same thing on i386.

Since the "mask" bit is in the low word, when we write a new entry, we
need to write the high word first, before we potentially unmask it.

The exception is when we actually want to mask the interrupt, in which
case we want to write the low word first to make sure that the high word
doesn't change while the interrupt routing is still active.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 10:27:54 -08:00
Linus Torvalds
6c0ffb9d2f x86-64: clean up io-apic accesses
This is just commit 130fe05dbc ported to
x86-64, for all the same reasons.  It cleans up the IO-APIC accesses in
order to then fix the ordering issues.

We move the accessor functions (that were only used by io_apic.c) out of
a header file, and use proper memory-mapped accesses rather than making
up our own "volatile" pointers.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 10:23:03 -08:00
Andrew Morton
61ce1efe6e [PATCH] vmlinux.lds: consolidate initcall sections
Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.

This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-27 15:34:51 -07:00
Eric W. Biederman
70a0a5357d [PATCH] x86-64: Only look at per_cpu data for online cpus.
When I generalized __assign_irq_vector I failed to pay attention
to what happens when you access a per cpu data structure for
a cpu that is not online.   It is an undefined case making any
code that does it have undefined behavior as well.

The code still needs to be able to allocate a vector across cpus
that are not online to properly handle combinations like lowest
priority interrupt delivery and cpu_hotplug.  Not that we can do
that today but the infrastructure shouldn't prevent it.

So this patch updates the places where we touch per cpu data
to only touch online cpus, it makes cpu vector allocation
an atomic operation with respect to cpu hotplug, and it updates
the cpu start code to properly initialize vector_irq so we
don't have inconsistencies.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-25 01:00:23 +02:00
Eric W. Biederman
d1752aa884 [PATCH] x86-64: Simplify the vector allocator.
There is no reason to remember a per cpu position of which vector
to try.  Keeping a global position is simpler and more likely to
result in a global vector allocation even if I don't need or require
it.  For level triggered interrupts this means we are less likely to
acknowledge another cpus irq, and cause the level triggered irq to
harmlessly refire.

This simplification makes it easier to only access data structures
of  online cpus, by having fewer special cases to deal with.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-25 01:00:22 +02:00
Muli Ben-Yehuda
cb01fc720c [PATCH] x86-64: increase PHB1 split transaction timeout
This patch increases the timeout for PCI split transactions on PHB1 on
the first Calgary to work around an issue with the aic94xx
adapter. Fixes kernel.org bugzilla #7180
(http://bugzilla.kernel.org/show_bug.cgi?id=7180)

Based on excellent debugging and a patch by Darrick J. Wong
<djwong@us.ibm.com>

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
2006-10-22 00:41:15 +02:00
Andi Kleen
aa026ede51 [PATCH] x86-64: Fix C3 timer test
There was a typo in the C3 latency test to decide of the TSC
should be used or not. It used the C2 latency threshold, not the
C3 one. Fix that.

This should fix the time on various dual core laptops.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-22 00:41:15 +02:00
Andi Kleen
e70ea8c09d [PATCH] x86-64: Revert timer routing behaviour back to 2.6.16 state
By default route the 8254 over the 8259 and only disable
it on ATI boards where this causes double timer interrupts.

This should unbreak some Nvidia boards where the timer doesn't
seem to tick of it isn't enabled in the 8259. At least one
VIA board also seemed to have a little trouble with the disabled
8259.

For 2.6.20 we'll try both dynamically without black listing, but I think
for .19 this is the safer approach because it has been already well tested
in earlier kernels. This also makes the x86-64 behaviour the same
as i386.

Command line options can change all this of course.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:03 +02:00
Vivek Goyal
dbaab49f92 [PATCH] x86-64: Overlapping program headers in physical addr space fix
o A recent change to vmlinux.ld.S file broke kexec as now resulting vmlinux
  program headers are overlapping in physical address space.

o Now all the vsyscall related sections are placed after data and after
  that mostly init data sections are placed. To avoid physical overlap
  among phdrs, there are three possible solutions.
	- Place vsyscall sections also in data phdrs instead of user
	- move vsyscal sections after init data in bss.
	- create another phdrs say data.init and move all the sections
	  after vsyscall into this new phdr.

o This patch implements the third solution.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Magnus Damm <magnus@valinux.co.jp>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-10-21 18:37:03 +02:00
Eric W. Biederman
84f404f695 [PATCH] x86-64: Put more than one cpu in TARGET_CPUS
TARGET_CPUS is the default irq routing poicy.  It specifies which cpus the
kernel should aim an irq at.  In physflat delivery mode we can route an irq to
a single cpu.  But that doesn't mean our default policy should only be a
single cpu is allowed.

By allowing the irq routing code to select from multiple cpus this enables
systems with more irqs then we can service on a single processor to actually
work.

I just audited and tested the code and irqbalance doesn't care, and the
io_apic.c doesn't care if we have extra cpus in the mask.  Everything will use
or assume we are using the lowest numbered cpu in the mask if we can't use
them all.

So this should result in no behavior changes except on systems that need it.

Thanks for YH Lu for spotting this problem in his testing.

Cc: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Andi Kleen
8cf2c51927 [PATCH] x86: Revert new unwind kernel stack termination
Jan convinced me that it was unnecessary because the assembly stubs do
this already on the stack.

Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Eric W. Biederman
6bf2dafad1 [PATCH] x86-64: Use irq_domain in ioapic_retrigger_irq
Thanks to YH Lu for spotting this.  It appears I missed this function when I
refactored allocate_irq_vector and introduced irq_domain, with the result that
all retriggered irqs would go to cpu 0 even if we were not prepared to receive
them there.

While reviewing YH's patch I also noticed that this function was missing
locking, and since I am now reading two values from two diffrent arrays that
looks like a race we might be able to hit in the real world.

Cc: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Andi Kleen
581910e2eb [PATCH] x86-64: Revert interrupt backlink changes
They break more than they fix
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Jan Beulich
cc7d479fe5 [PATCH] x86-64: Fix ENOSYS in system call tracing
This patch:

- out of range system calls failing to return -ENOSYS under
  system call tracing

[AK: split out from another patch by Jan as separate bugfix]

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Vivek Goyal
73bb8919b3 [PATCH] x86-64: fix page align in e820 allocator
Currently some code pieces assume that address returned by find_e820_area()
are page aligned.  But looks like find_e820_area() had no such intention
and hence one might end up stomping over some of the data.  One such case
is bootmem allocator initialization code stomped over bss.

This patch modified find_e820_area() to return page aligned address.  This
might be little wasteful of memory but at the same time probably it is
easier to handle page aligned memory.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-10-21 18:37:01 +02:00
Yinghai Lu
45edfd1db0 [PATCH] x86-64: typo in __assign_irq_vector when updating pos for vector and offset
typo with cpu instead of new_cpu

Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:01 +02:00
Ingo Molnar
a460e745e8 [PATCH] genirq: clean up irq-flow-type naming
Introduce desc->name and eliminate the handle_irq_name() hack.  Add
set_irq_chip_and_handler_name() to set the flow type and name at once.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17 08:18:45 -07:00
Venkatesh Pallipadi
83d0515bbb [CPUFREQ][4/8] acpi-cpufreq: Mark speedstep-centrino ACPI as deprecated
Mark ACPI hooks in speedstep-centrino as deprecated. Change the order in which
speedstep-centrino and acpi-cpufreq (when both are in kernel) will be
added. First driver to be tried is now acpi-cpufreq, followed by
speedstep-centrino.

Add a note in feature-removal-schedule to mark this deprecation.

Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-15 19:57:10 -04:00
Venkatesh Pallipadi
991528d734 ACPI: Processor native C-states using MWAIT
Intel processors starting with the Core Duo support
support processor native C-state using the MWAIT instruction.
Refer: Intel Architecture Software Developer's Manual
http://www.intel.com/design/Pentium4/manuals/253668.htm

Platform firmware exports the support for Native C-state to OS using
ACPI _PDC and _CST methods.
Refer: Intel Processor Vendor-Specific ACPI: Interface Specification
http://www.intel.com/technology/iapc/acpi/downloads/302223.htm

With Processor Native C-state, we use 'MWAIT' instruction on the processor
to enter different C-states (C1, C2, C3).  We won't use the special IO
ports to enter C-state and no SMM mode etc required to enter C-state.
Overall this will mean better C-state support.

One major advantage of using MWAIT for all C-states is, with this and
"treat interrupt as break event" feature of MWAIT, we can now get accurate
timing for the time spent in C1, C2, ..  states.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2006-10-14 00:35:39 -04:00
Ravikiran Thirumalai
734c4c6739 [PATCH] Fix build breakage with CONFIG_X86_VSMP
Kernel build breaks with CONFIG_X86_VSMP.  Probably due to some header
file cleanups in 2.6.19-rc1.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-12 12:25:27 -07:00
Eric W. Biederman
994bd4f9f5 [PATCH] x86_64 irq: Properly update vector_irq
This patch fixes my one line thinko where I was clearing
the vector_irq entries on the wrong cpus.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-12 07:37:30 -07:00
Aneesh Kumar K.V
c37e108d15 [PATCH] use struct irq_chip instead of struct hw_interrupt_type
hw_interrupt_type is deprecated in favour of struct irq_chip.

[mingo@elte.hu: do x86_64 too]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11 11:14:14 -07:00
Eric W. Biederman
d3696cf737 [PATCH] x86_64 irq: Scream but don't die if we receive an unexpected irq
Due to code bugs or misbehaving hardware it is possible that we can
receive an interrupt that we have not mapped into a linux irq.  Calling
BUG when that happens is very rude, and if the problem is mild enough
prevents anything else from getting done.

So instead of calling BUG just scream loudly about the problem and
continue running.  We don't have enough knowledge to know which
interrupt triggered this behavior so we don't acknowledge it.  This will
likely prevent a recurrence of the problem by jamming up the works with
an unacknowledged interrupt.

If the interrupt was something important it is quite possible that
nothing productive will happen past this point.  But it is now at least
possible to keep working if the kernel can survive without the interrupt
we dropped on the floor.

Solutions like irqpoll should generally make dropped irqs non-fatal.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-09 14:51:43 -07:00
Eric W. Biederman
c7111c1318 [PATCH] x86_64 irq: Allocate a vector across all cpus for genapic_flat.
The problem we can't take advantage of lowest priority delivery mode if
the vectors are allocated for only one cpu at a time.  Nor can we work
around hardware that assumes lowest priority delivery mode is always
used with several cpus.

So this patch introduces the concept of a vector_allocation_domain.  A
set of cpus that will receive an irq on the same vector.  Currently the
code for implementing this is placed in the genapic structure so we can
vary this depending on how we are using the io_apics.

This allows us to restore the previous behaviour of genapic_flat without
removing the benefits of having separate vector allocation for large
machines.

This should also fix the problem report where a hyperthreaded cpu was
receving the irq on the wrong hyperthread when in logical delivery mode
because the previous behaviour is restored.

This patch properly records our allocation of the first 16 irqs to the
first 16 available vectors on all cpus.  This should be fine but it may
run into problems with multiple interrupts at the same interrupt level.
Except for some badly maintained comments in the code and the behaviour
of the interrupt allocator I have no real understanding of that problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-08 12:24:02 -07:00
Eric W. Biederman
b940d22d58 [PATCH] i386/x86_64: Remove global IO_APIC_VECTOR
Which vector an irq is assigned to now varies dynamically and is
not needed outside of io_apic.c.  So remove the possibility
of accessing the information outside of io_apic.c and remove
the silly macro that makes looking for users of irq_vector
difficult.

The fact this compiles ensures there aren't any more pieces
of the old CONFIG_PCI_MSI weirdness that I failed to remove.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-08 12:24:02 -07:00
Andrew Morton
d150ad7bd9 [PATCH] x86_64 irq_regs fix
smp_apic_timer_interrupt() needs to stack the pt_regs* for profile_tick.

If any other of those APIC interrupt handlers want to run get_irq_regs() then
their C entrypoint handlers will need the same treatment.

Cc: Andi Kleen <ak@muc.de>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-06 13:36:52 -07:00
Linus Torvalds
44aefd2706 Merge git://git.infradead.org/~dhowells/irq-2.6
* git://git.infradead.org/~dhowells/irq-2.6:
  IRQ: Maintain regs pointer globally rather than passing to IRQ handlers
  IRQ: Typedef the IRQ handler function type
  IRQ: Typedef the IRQ flow handler function type
2006-10-05 16:32:01 -07:00
Randy Dunlap
4b0ff1a94c [PATCH] x86-64: Fix compilation without CONFIG_KALLSYMS
Include linux/kallsyms.h unconditionally for print_symbol().

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-05 15:55:15 -07:00
Andi Kleen
7d0b0e8ddb [PATCH] x86-64: Annotate interrupt frame backlink in interrupt handlers
Add correct CFI annotation to the backlink on top of the interrupt stack.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:22 +02:00
Andi Kleen
0a5ace2ab0 [PATCH] x86-64: Fix FPU corruption
This reverts an earlier patch that was found to cause FPU
state corruption. I think the corruption happens because
unlazy_fpu() can cause FPU exceptions and when it happens
after the current switch some processing would affect
the state in the wrong process.

Thanks to  Douglas Crosher and Tom Hughes for testing.

Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:22 +02:00
Andi Kleen
51ec28e1b2 [PATCH] x86: Terminate the kernel stacks for the unwinder
Always make sure RIP/EIP is 0 in the registers stored on the top
of the stack of a kernel thread. This makes sure the unwinder code
won't try a fallback but knows the stack has ended.

AK: this patch is a bit mysterious. in theory they should be terminated
anyways, but it seems to fix at least one crash. Anyways double termination
probably doesn't hurt.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:22 +02:00
Jon Mason
70d666d6ae [PATCH] x86-64: Calgary IOMMU: print PCI bus numbers in hex
Make the references to the bus number in hex instead of decimal, as
that is the way that lspci prints out the bus numbers.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:21 +02:00
Jon Mason
d8d2bedf60 [PATCH] x86-64: Calgary IOMMU: Update Jon's contact info
Also add copyright for work done after leaving IBM.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:21 +02:00
Jon Mason
76fd231717 [PATCH] x86-64: Calgary IOMMU: Fix off by one when calculating register space location
The purpose of the code being modified is to determine the location
of the calgary chip address space.  This is done by a magical formula
of FE0MB-8MB*OneBasedChassisNumber+1MB*(RioNodeId-ChassisBase) to
find the offset where BIOS puts it.  In this formula,
OneBasedChassisNumber corresponds to the NUMA node, and rionodeid is
always 2 or 3 depending on which chip in the system it is.  The
problem was that we had an off by one error that caused us to account
some busses to the wrong chip and thus give them the wrong address
space.

Fixes RH bugzilla #203971.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-bu: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:21 +02:00
Jon Mason
dedc9937e8 [PATCH] x86-64: Calgary IOMMU: deobfuscate calgary_init
calgary_init's for loop does not correspond to the actual device being
checked, which makes its upperbound check for array overflow useless.
Changing this to a do-while loop is the correct way of doing this.
There should be no possibility of spinning forever in this loop, as
pci_get_device states that it will go through all iterations, then
return NULL (thus breaking the loop).

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:21 +02:00
David Howells
7d12e780e0 IRQ: Maintain regs pointer globally rather than passing to IRQ handlers
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.

The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around.  On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).

Where appropriate, an arch may override the generic storage facility and do
something different with the variable.  On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.

Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions.  Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller.  A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.

I've build this code with allyesconfig for x86_64 and i386.  I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.

This will affect all archs.  Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:

	struct pt_regs *old_regs = set_irq_regs(regs);

And put the old one back at the end:

	set_irq_regs(old_regs);

Don't pass regs through to generic_handle_irq() or __do_IRQ().

In timer_interrupt(), this sort of change will be necessary:

	-	update_process_times(user_mode(regs));
	-	profile_tick(CPU_PROFILING, regs);
	+	update_process_times(user_mode(get_irq_regs()));
	+	profile_tick(CPU_PROFILING);

I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().

Some notes on the interrupt handling in the drivers:

 (*) input_dev() is now gone entirely.  The regs pointer is no longer stored in
     the input_dev struct.

 (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking.  It does
     something different depending on whether it's been supplied with a regs
     pointer or not.

 (*) Various IRQ handler function pointers have been moved to type
     irq_handler_t.

Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
2006-10-05 15:10:12 +01:00
Linus Torvalds
fefd26b3b8 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/configh
* master.kernel.org:/pub/scm/linux/kernel/git/davej/configh:
  Remove all inclusions of <linux/config.h>

Manually resolved trivial path conflicts due to removed files in
the sound/oss/ subdirectory.
2006-10-04 09:59:57 -07:00
Eric W. Biederman
95d77884c7 [PATCH] htirq: tidy up the htirq code
This moves the declarations for the architecture helpers into
include/linux/htirq.h from the generic include/linux/pci.h.  Hopefully this
will make this distinction clearer.

htirq.h is included where it is needed.

The dependency on the msi code is fixed and removed.

The Makefile is tidied up.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:30 -07:00
Eric W. Biederman
3b7d1921f4 [PATCH] msi: refactor and move the msi irq_chip into the arch code
It turns out msi_ops was simply not enough to abstract the architecture
specific details of msi.  So I have moved the resposibility of constructing
the struct irq_chip to the architectures, and have two architecture specific
functions arch_setup_msi_irq, and arch_teardown_msi_irq.

For simple architectures those functions can do all of the work.  For
architectures with platform dependencies they can call into the appropriate
platform code.

With this msi.c is finally free of assuming you have an apic, and this
actually takes less code.

The helpers for the architecture specific code are declared in the linux/msi.h
to keep them separate from the msi functions used by drivers in linux/pci.h

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
8b955b0ddd [PATCH] Initial generic hypertransport interrupt support
This patch implements two functions ht_create_irq and ht_destroy_irq for
use by drivers.  Several other functions are implemented as helpers for
arch specific irq_chip handlers.

The driver for the card I tested this on isn't yet ready to be merged.
However this code is and hypertransport irqs are in use in a few other
places in the kernel.  Not that any of this will get merged before 2.6.19

Because the ipath-ht400 is slightly out of spec this code will need to be
generalized to work there.

I think all of the powerpc uses are for a plain interrupt controller in a
chipset so support for native hypertransport devices is a little less
interesting.

However I think this is a half way decent model on how to separate arch
specific and generic helper code, and I think this is a functional model of
how to get the architecture dependencies out of the msi code.

[akpm@osdl.org: Kconfig fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
cd1182f56a [PATCH] genirq: x86_64 irq: Kill irq compression
With more irqs in the system we don't need this.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
f023d764cc [PATCH] genirq: x86_64 irq: Kill gsi_irq_sharing
After raising the number of irqs the system supports this function is no
longer necessary.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
550f2299ac [PATCH] genirq: x86_64 irq: make vector_irq per cpu
This refactors the irq handling code to make the vectors a per cpu resource so
the same vector number can be simultaneously used on multiple cpus for
different irqs.

This should make systems that were hitting limits on the total number of irqs
much more livable.

[akpm@osdl.org: build fix]
[akpm@osdl.org: __target_IO_APIC_irq is unneeded on UP]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
e500f57436 [PATCH] genirq: x86_64 irq: Make the external irq handlers report their vector, not the irq number
This is a small pessimization but it paves the way for making this information
per cpu.  Which allows the the maximum number of IRQS to become NR_CPUS*224.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
04b9267b15 [PATCH] genirq: x86_64 irq: Remove the msi assumption that irq == vector
This patch removes the change in behavior of the irq allocation code when
CONFIG_PCI_MSI is defined.  Removing all instances of the assumption that irq
== vector.

create_irq is rewritten to first allocate a free irq and then to assign that
irq a vector.

assign_irq_vector is made static and the AUTO_ASSIGN case which allocates an
vector not bound to an irq is removed.

The ioapic vector methods are removed, and everything now works with irqs.

The definition of NR_IRQS no longer depends on CONFIG_PCI_MSI

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
589e367f9b [PATCH] genirq: x86_64 irq: Move msi message composition into io_apic.c
This removes the hardcoded assumption that irq == vector in the msi
composition code, and it allows the msi message composition to setup logical
mode, or lowest priorirty delivery mode as we do for other apic interrupts,
and with the same selection criteria.

Basically this moves the problem of what is in the msi message into the
architecture irq management code where it belongs.  Not in a generic layer
that doesn't have enough information to compose msi messages properly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
c4fa0bbf38 [PATCH] genirq: x86_64 irq: Dynamic irq support
The current implementation of create_irq() is a hack but it is the current
hack that msi.c uses, and unfortunately the ``generic'' apic msi ops depend on
this hack.  Thus we are this hack of assuming irq == vector until the
depencencies in the generic irq code are removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
0be6652f1e [PATCH] genirq: x86_64 irq: Reenable migrating irqs to other cpus
In the latest changes the code for migrating x86_64 irqs was dropped.  This
reads it in a fashion that will work even if we change the vector on level
triggered irqs when we migrate them.

[akpm@osdl.org: build fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:26 -07:00
Ingo Molnar
f29bd1ba68 [PATCH] genirq: convert the x86_64 architecture to irq-chips
This patch converts all the x86_64 PIC controllers layers to the new and
simpler irq-chip interrupt handling layer.

[mingo@elte.hu: The patch also enables the fasteoi handler for x86_64]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:25 -07:00
Dave Jones
038b0a6d8d Remove all inclusions of <linux/config.h>
kbuild explicitly includes this at build time.

Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-04 03:38:54 -04:00
Andrew Morton
0e4a523fa3 [PATCH] revert "insert IOAPIC(s) and Local APIC into resource map"
Commit 54dbc0c9eb is causing various
people's machines to fail to map PCI resources.

Revert it in preparation for addressing the show-APICs-in-/proc/iomem
requirement in a different manner.

Cc: Aaron Durbin <adurbin@google.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 19:46:18 -07:00
Arnd Bergmann
3db03b4afb [PATCH] rename the provided execve functions to kernel_execve
Some architectures provide an execve function that does not set errno, but
instead returns the result code directly.  Rename these to kernel_execve to
get the right semantics there.  Moreover, there is no reasone for any of these
architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
remove these right away.

[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:23 -07:00
Serge E. Hallyn
96b644bdec [PATCH] namespaces: utsname: use init_utsname when appropriate
In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use.  This patch replaces those with a the init_utsname
helper.

Changes: Removed several uses of init_utsname().  Hope I picked all the
	right ones in net/ipv4/ipconfig.c.  These are now changed to
	utsname() (the per-process namespace utsname) in the previous
	patch (2/7)

[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn
e9ff3990f0 [PATCH] namespaces: utsname: switch to using uts namespaces
Replace references to system_utsname to the per-process uts namespace
where appropriate.  This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
	for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn
0437eb594e [PATCH] nsproxy: move init_nsproxy into kernel/nsproxy.c
Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c.  This
avoids all arches having to be updated.  Compiles and boots on s390.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Serge E. Hallyn
ab516013ad [PATCH] namespaces: add nsproxy
This patch adds a nsproxy structure to the task struct.  Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.

The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.

[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
bibo,mao
99219a3fbc [PATCH] kretprobe spinlock deadlock patch
kprobe_flush_task() possibly calls kfree function during holding
kretprobe_lock spinlock, if kfree function is probed by kretprobe that will
incur spinlock deadlock.  This patch moves kfree function out scope of
kretprobe_lock.

Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
bibo,mao
62c27be0dd [PATCH] kprobe whitespace cleanup
Whitespace is used to indent, this patch cleans up these sentences by
kernel coding style.

Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
Atsushi Nemoto
8ef386092d [PATCH] kill wall_jiffies
With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
So we can kill wall_jiffies completely.

This is just a cleanup and logically should not change any real behavior
except for one thing: RTC updating code in (old) ppc and xtensa use a
condition "jiffies - wall_jiffies == 1".  This condition is never met so I
suppose it is just a bug.  I just remove that condition only instead of
kill the whole "if" block.

[heiko.carstens@de.ibm.com: s390 build fix and cleanup]
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:27 -07:00
Andi Kleen
34596dc9e5 [PATCH] Define vsyscall cache as blob to make clearer that user space shouldn't use it
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Vivek Goyal
120b114237 [PATCH] Re-positioning the bss segment
[AK: This apparently broke some systems, but we need it to fix
a compile problem with old binutils and in theory the patch
is correct. So let's trying reenabling it again.]

o Currently bss segment is being placed somewhere in the middle (after .data)
  section and after bss lots of init section and data sections are coming.
  Is it intentional?

o One side affect of placing bss in the middle is that objcopy keeps the
  bss in raw binary image (vmlinux.bin) hence unnecessarily increasing
  the size of raw binary image. (In my case ~600K). It also increases
  the size of generated bzImage, though the increase is very small
  (896 bytes), probably a very high compression ratio for stream
  of zeros.

o This patch moves the bss at the end hence reducing the size of
  bzImage by 896 bytes and size of vmlinux.bin by 600K.

o This change benefits in the context of relocatable kernel patches. If
  kernel bss is not part of compressed data (vmlinux.bin) then it does
  not have to be decompressed and this area can be used by the decompressor
  for its execution hence keeping the memory requirements bounded and
  decompressor code does not stomp over any other data loaded beyond
  kernel image (As might be the case with bootloaders like kexec).

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
9d0ef4fd61 [PATCH] Use ARRAY_SIZE in setup.c
Based on i386 patch from Bjorn.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
29cbc78b90 [PATCH] x86: Clean up x86 NMI sysctls
Use prototypes in headers
Don't define panic_on_unrecovered_nmi for all architectures

Cc: dzickus@redhat.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
013bf2c50e [PATCH] Refactor some duplicated code in mpparse.c
No logic changes

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
d802ab981d [PATCH] Document iommu=panic
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
ded318ec80 [PATCH] Fix broken indentation in iommu_setup
No functional changes; only white space.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
ece6684012 [PATCH] Allow disabling DAC using command line options
Might or might not work around some reported bugs on VIA systems.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Atsushi Nemoto
3171a0305d [PATCH] simplify update_times (avoid jiffies/jiffies_64 aliasing problem)
Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390
timer interrupt handler with this change.

Currently update_times() calculates ticks by "jiffies - wall_jiffies", but
callers of do_timer() should know how many ticks to update.  Passing ticks
get rid of this redundant calculation.  Also there are another redundancy
pointed out by Martin Schwidefsky.

This cleanup make a barrier added by
5aee405c66 needless.  So this patch removes
it.

As a bonus, this cleanup make wall_jiffies can be removed easily, since now
wall_jiffies is always synced with jiffies.  (This patch does not really
remove wall_jiffies.  It would be another cleanup patch)

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:15 -07:00
Rolf Eike Beer
d6bd3a39f7 [PATCH] Move valid_dma_direction() from x86_64 to generic code
As suggested by Muli Ben-Yehuda this function is moved to generic code as
may be useful for all archs.

[akpm@osdl.org: fix]
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:10 -07:00
Mel Gorman
5cb248abf5 [PATCH] Have x86_64 use add_active_range() and free_area_init_nodes
Size zones and holes in an architecture independent manner for x86_64.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Linus Torvalds
b278240839 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
  [PATCH] Don't set calgary iommu as default y
  [PATCH] i386/x86-64: New Intel feature flags
  [PATCH] x86: Add a cumulative thermal throttle event counter.
  [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
  [PATCH] x86: Refactor thermal throttle processing
  [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
  [PATCH] Fix unwinder warning in traps.c
  [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
  [PATCH] x86: Move direct PCI scanning functions out of line
  [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
  [PATCH] Don't leak NT bit into next task
  [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
  [PATCH] Fix some broken white space in ia32_signal.c
  [PATCH] Initialize argument registers for 32bit signal handlers.
  [PATCH] Remove all traces of signal number conversion
  [PATCH] Don't synchronize time reading on single core AMD systems
  [PATCH] Remove outdated comment in x86-64 mmconfig code
  [PATCH] Use string instructions for Core2 copy/clear
  [PATCH] x86: - restore i8259A eoi status on resume
  [PATCH] i386: Split multi-line printk in oops output.
  ...
2006-09-26 13:07:55 -07:00
Rafael J. Wysocki
75534b50cc [PATCH] Change the name of pagedir_nosave
The name of the pagedir_nosave variable does not make sense any more, so it
seems reasonable to change it to something more meaningful.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:49:01 -07:00
Rafael J. Wysocki
e8eff5ac29 [PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64
On x86_64 machines with more than 2 GB of RAM there are large memory gaps
(with no corresponding kernel virtual addresses) and reserved memory
regions between areas of usable physical RAM.  Moreover, if CONFIG_FLATMEM
is set, they appear within the normal zone.  swsusp should not try to save
them, so the corresponding page structs have to be marked as 'nosave'.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:58 -07:00
Andrew Morton
a3bc0dbc81 [PATCH] smp_call_function_single() cleanup
If we're going to implement smp_call_function_single() on three architecture
with the same prototype then it should have a declaration in a
non-arch-specific header file.

Move it into <linux/smp.h>.

Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:56 -07:00
Clemens Ladisch
1447c27d38 [PATCH] hpet rtc emulation: add watchdog timer
To prevent the emulated RTC timer from stopping when interrupts are delayed
for too long, disable interrupts around all of the register initialization,
and check that the interrupt handler did not schedule the next interrupt in
the past.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:54 -07:00
Dave Jones
dcf10307c3 [PATCH] i386/x86-64: New Intel feature flags
Add supplemental SSE3 instructions flag, and Direct Cache Access flag.
As described in "Intel Processor idenfication and the CPUID instruction
AP485 Sept 2006"

AK: also added for x86-64

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:42 +02:00
Dmitriy Zavin
3222b36f46 [PATCH] x86: Add a cumulative thermal throttle event counter.
The counter is exported to /sys that keeps track of the
number of thermal events, such that the user knows how bad the
thermal problem might be (since the logging to syslog and mcelog
is rate limited).

AK: Fixed cpu hotplug locking

Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:42 +02:00
Dmitriy Zavin
15d5f83983 [PATCH] x86: Refactor thermal throttle processing
Refactor the event processing (syslog messaging and rate limiting)
into separate file therm_throt.c. This allows consistent reporting
of CPU thermal throttle events.

After ACK'ing the interrupt, if the event is current, the user
(p4.c/mce_intel.c) calls therm_throt_process to log (and rate limit)
the event. If that function returns 1, the user has the option to log
things further (such as to mce_log in x86_64).

AK: minor cleanup

Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:42 +02:00
Andi Kleen
0637a70a5d [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
Some buggy systems can machine check when config space accesses
happen for some non existent devices.  i386/x86-64 do some early
device scans that might trigger this. Allow pci=noearly to disable
this. Also when type 1 is disabling also don't do any early
accesses which are always type1.

This moves the pci= configuration parsing to be a early parameter.
I don't think this can break anything because it only changes
a single global that is only used by PCI.

Cc: gregkh@suse.de
Cc: Trammell Hudson <hudson@osresearch.net>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
f157cbb1eb [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
This is useful on systems with broken PCI bus. Affects various
scans in x86-64 and i386's early ACPI quirk scan.

Cc: gregkh@suse.de
Cc: len.brown@intel.com
Cc: Trammell Hudson <hudson@osresearch.net>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
658fdbef66 [PATCH] Don't leak NT bit into next task
SYSENTER can cause a NT to be set which might cause crashes on the IRET
in the next task.

Following similar i386 patch from Linus.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Jan Beulich
adf1423698 [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
Current gcc generates calls not jumps to noreturn functions. When that happens the
return address can point to the next function, which confuses the unwinder.

This patch works around it by marking asynchronous exception
frames in contrast normal call frames in the unwind information.  Then teach
the unwinder to decode this.

For normal call frames the unwinder now subtracts one from the address which avoids
this problem.  The standard libgcc unwinder uses the same trick.

It doesn't include adjustment of the printed address (i.e. for the original
example, it'd still be kernel_math_error+0 that gets displayed, but the
unwinder wouldn't get confused anymore.

This only works with binutils 2.6.17+ and some versions of H.J.Lu's 2.6.16
unfortunately because earlier binutils don't support .cfi_signal_frame

[AK: added automatic detection of the new binutils and wrote description]

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
dd54a11004 [PATCH] Remove all traces of signal number conversion
This was old code that was needed for iBCS and x86-64 never supported that.

Pointed out by Albert Cahalan
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
2049336f60 [PATCH] Don't synchronize time reading on single core AMD systems
We do some additional CPU synchronization in gettimeofday et.al. to make
sure the time stamps are always monotonic over multiple CPUs. But on
single core systems that is not needed. So don't do it.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
27fbe5b28a [PATCH] Use string instructions for Core2 copy/clear
It is faster than using a unrolled loop for the use cases the kernel
cares about (cached, sizes typically < 4K)

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Matthew Garrett
35d534a3ff [PATCH] x86: - restore i8259A eoi status on resume
Got it. i8259A_resume calls init_8259A(0) unconditionally, even if
auto_eoi has been set. Keep track of the current status and restore that
on resume. This fixes it for AMD64 and i386.

Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Aaron Durbin
56dd669a13 [PATCH] Insert GART region into resource map
Patch inserts the GART region into the iomem resource map. The GART will then
be visible within /proc/iomem. It will also allow for other users
utilizing the GART to subreserve the region (agp or IOMMU).

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
a15da49deb [PATCH] Fix idle notifiers
Previously exit_idle would be called more often than enter_idle

Now instead of using complicated tests just keep track of it
using the per CPU variable as a flip flop.  I moved the idle state into the
PDA to make the access more efficient.

Original bug report and an initial patch from Stephane Eranian,
but redone by AK.

Cc: Stephane Eranian <eranian@hpl.hp.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
b38337a624 [PATCH] Mark per cpu data initialization __initdata again
Before 2.6.16 this was changed to work around code that accessed
CPUs not in the possible map. But that code should be all fixed now,
so mark it __initdata again.
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
3022d734a5 [PATCH] Fix zeroing on exception in copy_*_user
- Don't zero for __copy_from_user_inatomic following i386.
This will prevent spurious zeros for parallel file system writers when
one does a exception
- The string instruction version didn't zero the output on
exception. Oops.

Also I cleaned up the code a bit while I was at it and added a minor
optimization to the string instruction path.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
adurbin@google.com
54dbc0c9eb [PATCH] insert IOAPIC(s) and Local APIC into resource map
This patch places the IOAPIC(s) and the Local APIC specified by ACPI
tables into the resource map. The APICs will then be visible within
/proc/iomem

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
Andi Kleen
7b0bda74f7 [PATCH] Fix a PDA warning uncovered by the new type checking
Fix
linux/arch/x86_64/kernel/process.c: In function __switch_to:
linux/arch/x86_64/kernel/process.c:626: warning: assignment makes integer from pointer without a cast

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
Andi Kleen
96e540492a [PATCH] Fix a irqcount comment in entry.S
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
Arjan van de Ven
0a42540580 [PATCH] Add the canary field to the PDA area and the task struct
This patch adds the per thread cookie field to the task struct and the PDA.
Also it makes sure that the PDA value gets the new cookie value at context
switch, and that a new task gets a new cookie at task creation time.

Signed-off-by: Arjan van Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Andi Kleen
e8c7391de4 [PATCH] Don't use kernel_text_address in oops context
Because it can take spinlocks.

Suggested by Mathieu Desnoyers

Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Magnus Damm
4bfaaef01a [PATCH] Avoid overwriting the current pgd (V4, x86_64)
kexec: Avoid overwriting the current pgd (V4, x86_64)

This patch upgrades the x86_64-specific kexec code to avoid overwriting the
current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used
to start a secondary kernel that dumps the memory of the previous kernel.

The code introduces a new set of page tables. These tables are used to provide
an executable identity mapping without overwriting the current pgd.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Keith Owens
f574164491 [PATCH] Remove most of the special cases for the debug IST stack
Remove most of the special cases for the debug IST stack.  This is a
follow on clean up patch, it requires the bug fix patch that adds
orig_ist.

Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Andi Kleen
53ee11ae0d [PATCH] Optimize PDA accesses slightly
Based on a idea by Jeremy Fitzhardinge:

Replace the volatiles and memory clobbers in the PDA access with
telling gcc about access to a proxy PDA structure that doesn't
actually exist. But the dummy accesses give a defined ordering for
read/write accesses.

Also add some memory barriers to the early GS initialization to
make sure no PDA access is moved before it.

Advantage is some .text savings (probably most from better
code for accessing "current"):

   text    data     bss     dec     hex filename
4845647 1223688  615864 6685199  66020f vmlinux
4837780 1223688  615864 6677332  65e354 vmlinux-pda

1.2% smaller code

Cc:  Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Ian Campbell
f2a9e1dec2 [PATCH] Put .note.* sections into a PT_NOTE segment
This patch updates x86_64 linker script to pack any .note.* sections
into a PT_NOTE segment in the output file.

To do this, we tell ld that we need a PT_NOTE segment.  This requires
us to start explicitly mapping sections to segments, so we also need
to explicitly create PT_LOAD segments for text and data, and map the
sections to them appropriately.  Fortunately, each section will
default to its previous section's segment, so it doesn't take many
changes to vmlinux.lds.S.

The corresponding change is already made for i386 in -mm and I'd like
this patch to join it. The section to segment mappings do change as do
the segment flags so some time in -mm would be good for that reason as
well, just in case.

In particular .data and .bss move from the text segment to the data
segment and .data.cacheline_aligned .data.read_mostly are put in the
data segment instead of a separate one.

I think that it would be possible to exactly match the existing section
to segment mapping and flags but it would be a more intrusive change and
I'm not sure there is a reason for the existing layout other than it is
what you get by default if you don't explicitly specify something else.
If there is a reason for the existing layout then I will of course make
the more intrusive change. If there is no reason we could probably drop
the executable or writable flags from some segments but I don't know how
much attention is paid to them anyway so it might not be worth the
effort.

The vsyscall related sections need to go in a different segment to the
normal data segment and so I invented a "user" segment to contain them.
I believe this should appear to be another data segment as far as the
kernel is concerned so the flags are setup accordingly.

The notes will be used in the Xen paravirt_ops backend to provide
additional information to the domain builder. I am in the process of
converting the xen-unstable kernels and tools over to this scheme at the
moment to support this in the future.

It has been suggested to me that the notes segment should have flags 0
(i.e. not readable) since it is only used by the loader and is not used
at runtime. For now I went with a readable segment since that is what
the i386 patch uses.

AK: dropped NOTES addition right now because the needed infrastructure
for that is not merged yet

Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Eric W. Biederman
26374c7b7d [PATCH] Reload CS when startup_64 is used.
In long mode the %cs is largely a relic.  However there are a few cases
like iret where it matters that we have a valid value.  Without this
patch it is possible to enter the kernel in startup_64 without setting
%cs to a valid value.  With this patch we don't care what %cs value
we enter the kernel with, so long as the cs shadow register indicates
it is a privileged code segment.

Thanks to Magnus Damm for finding this problem and posting the
first workable patch.  I have moved the jump to set %cs down a
few instructions so we don't need to take an extra jump.  Which
keeps the code simpler.

Signed-of-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Andi Kleen
8380aabb99 [PATCH] Remove non e820 fallbacks in high level code
Drop support for non e820 BIOS calls to get the memory map.

The boot assembler code still has some support, but not the C code now.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
7a0a2dff1c [PATCH] Add a missing check for irq flags tracing in NMI
NMIs are not supposed to track the irq flags, but TRACE_IRQS_IRETQ
did it anyways. Add a check.

Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
aecc63615e [PATCH] Fix coding style and output of the mptable parser
Give the printks a consistent prefix.
Add some missing white space.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
e4251e130d [PATCH] Remove some cruft in apic id checking during processor setup
- Remove a define that was used only once
- Remove the too large APIC ID check because we always support
the full 8bit range of APICs.
- Restructure code a bit to be simpler.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
f2c2cca3ac [PATCH] Remove APIC version/cpu capability mpparse checking/printing
ACPI went to great trouble to get the APIC version and CPU capabilities
of different CPUs before passing them to the mpparser. But all
that data was used was to print it out.  Actually it even faked some data
based on the boot cpu, not on the actual CPU being booted.

Remove all this code because it's not needed.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
151f8cc116 [PATCH] Remove safe_smp_processor_id()
And replace all users with ordinary smp_processor_id.  The function
was originally added to get some basic oops information out even
if the GS register was corrupted. However that didn't
work for some anymore because printk is needed to print the oops
and it uses smp_processor_id() already. Also GS register corruptions
are not particularly common anymore.

This also helps the Xen port which would otherwise need to
do this in a special way because it can't access the local APIC.

Cc: Chris Wright <chrisw@sous-sol.org>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Rafael J. Wysocki
34464a5b89 [PATCH] Detect clock skew during suspend
Detect the situations in which the time after a resume from disk would
be earlier than the time before the suspend and prevent them from
happening on x86_64.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
dbf9272e86 [PATCH] Don't force reserve the 640k-1MB range
From i386 x86-64 inherited code to force reserve the 640k-1MB area.
That was needed on some old systems.

But we generally trust the e820 map to be correct on 64bit systems
and mark all areas that are not memory correctly.

This patch will allow to use the real memory in there.

Or rather the only way to find out if it's still needed is to
try. So far I'm optimistic.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:36 +02:00
Magnus Damm
ed77504b20 [PATCH] mark init_amd() as __cpuinit
The init_amd() function is only called from identify_cpu() which is already
marked as __cpuinit. So let's mark it as __cpuinit.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:36 +02:00
Andrew Morton
abf0f10948 [PATCH] wire up oops_enter()/oops_exit()
Implement pause_on_oops() on x86_64.

AK: I redid the patch to do the oops_enter/exit in the existing
oops_begin()/end(). This makes it much shorter.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:36 +02:00
Arjan van de Ven
e07e23e1fd [PATCH] non lazy "sleazy" fpu implementation
Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
context switch a trap is taken for the first FPU use to restore the FPU
context lazily.  This is of course great for applications that have very
sporadic or no FPU use (since then you avoid doing the expensive
save/restore all the time).  However for very frequent FPU users...  you
take an extra trap every context switch.

The patch below adds a simple heuristic to this code: After 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch.  If the app indeed uses the FPU, the
trap is avoided.  (the chance of the 6th time slice using FPU after the
previous 5 having done so are quite high obviously).

After 256 switches, this is reset and lazy behavior is returned (until
there are 5 consecutive ones again).  The reason for this is to give apps
that do longer bursts of FPU use still the lazy behavior back after some
time.

[akpm@osdl.org: place new task_struct field next to jit_keyring to save space]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:36 +02:00
Eric W. Biederman
ba4d40bb5c [PATCH] Auto size the per cpu area.
Now for a completely different but trivial approach.
I just boot tested it with 255 CPUS and everything worked.

Currently everything (except module data) we place in
the per cpu area we know about at compile time.  So
instead of allocating a fixed size for the per_cpu area
allocate the number of bytes we need plus a fixed constant
for to be used for modules.

It isn't perfect but it is much less of a pain to
work with than what we are doing now.

AK: fixed warning

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:35 +02:00
Andi Kleen
2717941c6a [PATCH] Make boot_param_data pure BSS
Since it's all zero.

Actually I think gcc 4+ will do that automatically, but earlier compilers won't
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:35 +02:00
Dimitri Sivanich
cbf9b4bb76 [PATCH] X86_64 monotonic_clock goes backwards
I've noticed some erratic behavior while testing the X86_64 version
of monotonic_clock().

While spinning in a loop reading monotonic clock values (pinned to a
single cpu) I noticed that the difference between subsequent values
occasionally went negative (time going backwards).

I found that in the following code:
                this_offset = get_cycles_sync();
                /* FIXME: 1000 or 1000000? */
-->             offset = (this_offset - last_offset)*1000 / cpu_khz;
        }
        return base + offset;

the offset sometimes turns out to be 0, even though
this_offset > last_offset.

+Added fix From: Toyo Abe <toyoa@mvista.com>

The x86_64-mm-monotonic-clock.patch in 2.6.18-rc4-mm2 made a change to
the updating of monotonic_base. It now uses cycles_2_ns().

I suggest that a set_cyc2ns_scale() should be done prior to the setup_irq().
Because cycles_2_ns() can be called from the timer ISR right after the irq0
is enabled.

Signed-off-by: Toyo Abe <toyoa@mvista.com>
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Prasanna S.P
d28c4393a7 [PATCH] x86: error_code is not safe for kprobes
This patch moves the entry.S:error_entry to .kprobes.text section,
since code marked unsafe for kprobes jumps directly to entry.S::error_entry,
that must be marked unsafe as well.
This patch also moves all the ".previous.text" asm directives to ".previous"
for kprobes section.

AK: Following a similar i386 patch from Chuck Ebbert
AK: Also merged Jeremy's fix in.

+From: Jeremy Fitzhardinge <jeremy@goop.org>

KPROBE_ENTRY does a .section .kprobes.text, and expects its users to
do a .previous at the end of the function.

Unfortunately, if any code within the function switches sections, for
example .fixup, then the .previous ends up putting all subsequent code
into .fixup.  Worse, any subsequent .fixup code gets intermingled with
the code its supposed to be fixing (which is also in .fixup).  It's
surprising this didn't cause more havok.

The fix is to use .pushsection/.popsection, so this stuff nests
properly.  A further cleanup would be to get rid of all
.section/.previous pairs, since they're inherently fragile.

+From: Chuck Ebbert <76306.1226@compuserve.com>

Because code marked unsafe for kprobes jumps directly to
entry.S::error_code, that must be marked unsafe as well.
The easiest way to do that is to move the page fault entry
point to just before error_code and let it inherit the same
section.

Also moved all the ".previous" asm directives for kprobes
sections to column 1 and removed ".text" from them.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
be7a91709b [PATCH] Check for end of stack trace before falling back
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
c0b766f13d [PATCH] Merge stacktrace and show_trace
This unifies the standard backtracer and the new stacktrace
in memory backtracer. The standard one is converted to use callbacks
and then reimplement stacktrace using new callbacks.

The main advantage is that stacktrace can now use the new dwarf2 unwinder
and avoid false positives in many cases.

I kept it simple to make sure the standard backtracer stays reliable.

Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
b7f5e3c774 [PATCH] Don't access the APIC in safe_smp_processor_id when it is not mapped yet
Lockdep can call the dwarf2 unwinder early, and the dwarf2 code
uses safe_smp_processor_id which tries to access the local APIC page.
But that doesn't work before the APIC code has set up its fixmap.

Check for this case and always return boot cpu then.

Cc: jbeulich@novell.com
Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
5a1b3999d6 [PATCH] x86: Some preparationary cleanup for stack trace
- Remove unused all_contexts parameter
No caller used it
- Move skip argument into the structure (needed for
followon patches)

Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Muli Ben-Yehuda
4ea8a5d8b5 [PATCH] Calgary IOMMU: eradicate sole remaining 80 chars per line offender
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Muli Ben-Yehuda
4ccf4ae314 [PATCH] remove tce_cache_blast_stress()
tce_cache_blast_stress was useful during bringup to stress the IOMMU's
cache flushing. Now that we quiesce DMAs on every cache flush, using
_stress() brings the machine down to its knees once you put it under
load. Remove this debug / bringup code that isn't useful anymore
completely.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Muli Ben-Yehuda
796e4390e0 [PATCH] only verify the allocation bitmap if CONFIG_IOMMU_DEBUG is on
Introduce new function verify_bit_range(). Define two versions, one
for CONFIG_IOMMU_DEBUG enabled and one for disabled. Previously we
were checking that the bitmap was consistent every time we allocated
or freed an entry in the TCE table, which is good for debugging but
incurs an unnecessary penalty on non debug builds.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Muli Ben-Yehuda
de684652f3 [PATCH] print whether CONFIG_IOMMU_DEBUG is enabled
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Chuck Ebbert
2ade2920dc [PATCH] i386/x86-64: rename is_at_popf(), add iret to tests and fix
is_at_popf() needs to test for the iret instruction as well as
popf.  So add that test and rename it to is_setting_trap_flag().

Also change max insn length from 16 to 15 to match reality.

LAHF / SAHF can't affect TF, so the comment in x86_64 is removed.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Fernando Luis Vázquez Cao
2b94ab2fd5 [PATCH] Replace local_save_flags+local_irq_disable with
The combination of "local_save_flags" and "local_irq_disable" seems to be
equivalent to "local_irq_save" (see code snips below). Consequently, replace
occurrences of local_save_flags+local_irq_disable with local_irq_save.

* local_irq_save
#define raw_local_irq_save(flags) \
                do { (flags) = __raw_local_irq_save(); } while (0)

static inline unsigned long __raw_local_irq_save(void)
{
        unsigned long flags = __raw_local_save_flags();

        raw_local_irq_disable();

        return flags;
}

* local_save_flags
#define raw_local_save_flags(flags) \
                do { (flags) = __raw_local_save_flags(); } while (0)

Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
131cfd7bd5 [PATCH] Add sparse annotation to vsyscall.c
Fixes

linux/arch/x86_64/kernel/vsyscall.c:276:7: warning: constant 0x0f40000000000 is so big it is long
linux/arch/x86_64/kernel/vsyscall.c:80:14: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:80:14:    expected void const volatile [noderef] *addr<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:80:14:    got void *<noident>
linux/arch/x86_64/kernel/vsyscall.c:200:7: warning: incorrect type in assignment (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:200:7:    expected unsigned short [usertype] *map1
linux/arch/x86_64/kernel/vsyscall.c:200:7:    got void [noderef] *<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:203:7: warning: incorrect type in assignment (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:203:7:    expected unsigned short [usertype] *map2
linux/arch/x86_64/kernel/vsyscall.c:203:7:    got void [noderef] *<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:215:10: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:215:10:    expected void volatile [noderef] *addr<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:215:10:    got unsigned short [usertype] *map2
linux/arch/x86_64/kernel/vsyscall.c:217:10: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:217:10:    expected void volatile [noderef] *addr<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:217:10:    got unsigned short [usertype] *map1

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
3bd4d18cba [PATCH] Move e820 map into e820.c
Minor cleanup. Keep setup.c free from unrelated clutter.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
c31fbb1ad8 [PATCH] Clean up acpi_numa variable
Move it into srat.c No need to clutter up setup.c for it

And remove use in setup.c completely - it only guarded a printk
which can be done unconditionally.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
df3bb57d2c [PATCH] i386/x86-64: Move acpi_disabled variables into acpi/boot.c
Removes code duplication between i386/x86-64.

Not needed anymore in setup.c since early_param cleanup

Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
43c85c9c5d [PATCH] Remove need for early lockdep init
I think it was only needed for the printks and we can do them later.

I put in a single early_printk so that we know the kernel is alive
(early_printk doesn't need any locks)

This makes some things easier for initialization of unwind for
lockdep, which is needed by later patches.

cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Andi Kleen
2c8c0e6b8d [PATCH] Convert x86-64 to early param
Instead of hackish manual parsing

Requires earlier i386 patchkit, but also fixes i386 early_printk again.

I removed some obsolete really early parameters which didn't do anything useful.
Also made a few parameters that needed it early (mostly oops printing setup)

Also removed one panic check that wasn't visible without
early console anyways (the early console is now initialized after that
panic)

This cleans up a lot of code.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Andi Kleen
9ca33eb698 [PATCH] Use early CPU identify before early command line parsing
This makes it possible to modify CPU flags in command line
options without hacks.

And remove another copy in head64.c

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Chuck Ebbert
d4d35854a1 [PATCH] remove lock prefix from is_at_popf() tests
The lock prefix will cause an exception when used with the
popf instruction, so no need to continue searching after it's
found.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Muli Ben-Yehuda
145106e810 [PATCH] remove superflous BUG_ON's in nommu and gart
There's no need to check for invalid DMA data direction in nommu and
gart since we do it in dma-mapping.h anyway before calling the
individual dma-ops.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Eric W. Biederman
29a6c25bd6 [PATCH] Fix gdt table size in trampoline.S
Allows easier extension of the GDT by using the proper C symbol
for the size in the descriptor.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Chuck Ebbert
a752d7194c [PATCH] fix is_at_popf() for compat tasks
When testing for the REX instruction prefix, first check
for 32-bit mode because in compat mode the REX prefix is an
increment instruction.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Muli Ben-Yehuda
0577f148b5 [PATCH] Calgary IOMMU: save a bit of space in bus_info
Make translation_disabled a uchar rather than an int

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Muli Ben-Yehuda
a4fc520a0f [PATCH] Calgary IOMMU: calgary_init_one_nontraslated() can return void
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Muli Ben-Yehuda
871b17008e [PATCH] Calgary IOMMU: fix reference counting of Calgary PCI devices
The pci_get_device() API decrements the reference count on the 'from'
parameter when it continues searching. Therefore, take a ref count on
Calgary bus when we initialize them in either translated or
non-translated mode.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Muli Ben-Yehuda
b8f4fe66a5 [PATCH] Calgary IOMMU: fix error path memleak in calgary_free_tar
We were freeing the iommu_table and leaking the bitmap pages. Also
rename it to calgary_free_bus, which is more accurate.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Muli Ben-Yehuda
9f2dc46d5e [PATCH] Calgary IOMMU: break out of pci_find_device_reverse if dev not found
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Muli Ben-Yehuda
f38db651d5 [PATCH] Calgary IOMMU: consolidate per bus data structures
Move the tce_table_kva array, disabled bitmap and bus_to_phb array
into a new per bus 'struct calgary_bus_info'. Also slightly reorganize
build_tce_table and tce_table_setparms to avoid exporting bus_info to
tce.c.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Jan Beulich
3b94355c47 [PATCH] remove int_delivery_dest
The genapic field and the accessor macro weren't used anywhere.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Jan Beulich
caff0710eb [PATCH] initialize end of memory variables as early as possible
While an earlier patch already did a small step into that direction,
this patch moves initialization of all memory end variables to as
early as possible, so that dependent code doesn't need to check
whether these variables have already been set.

Also, remove a misleading (perhaps just outdated) comment, and make
static a variable only used in a single file.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Andi Kleen
44cc45267b [PATCH] Remove obsolete CVS $Id$ from assembler files in arch/x86_64/kernel/*
CVS hasn't been used for a long time for them.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Andi Kleen
fe7414a288 [PATCH] Use BUILD_BUG_ON in apic.c build sanity checking
Makes code a little shorter.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:30 +02:00
Andi Kleen
efec3b9a32 [PATCH] Fix up some non linuxy style in ACPI functions in mpparse.c
No functional changes.

Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:30 +02:00
Andi Kleen
276ec1a76a [PATCH] Remove some unneeded ACPI externs in mpparse.c
They are not used in this file so remove them. i386 didn't have them either.

Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:30 +02:00
Andi Kleen
a01fd3baff [PATCH] Remove useless wrapper in mpparse.c code
It used to contain support code for NUMAQ, but that is long gone already
on 64bit.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:30 +02:00
Andi Kleen
55f05ffaa7 [PATCH] Replace mp bus array with bitmap for bus not pci
Since we only support PCI and ISA legacy busses now there is no need to
have an full array with checking.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:30 +02:00
Andi Kleen
dfa4698c50 [PATCH] Move early chipset quirks out to new file
They did not really belong into io_apic.c. Move them into a new file
and clean it up a bit.

Also remove outdated ATI quirk that was obsolete,

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:30 +02:00
Andi Kleen
edd9652296 [PATCH] Remove MPS table APIC renumbering
The MPS table specification says that the operating system should
renumber the IO-APICs following the table as needed.  However in
ACPI this is not allowed or neeeded and all x86-64 systems are ACPI
compliant.

The code was already disabled on some systems because it caused
problems there. Remove it completely now.

CC: mdomsch@dell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:30 +02:00
Andi Kleen
eea0e11c1f [PATCH] Factor out common io apic routing entry access
The IO APIC code had lots of duplicated code to read/write 64bit
routing entries into the IO-APIC. Factor this out int common read/write
functions

In a few cases the IO APIC lock is taken more often now, but this
isn't a problem because it's all initialization/shutdown only
slow path code.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:30 +02:00
Andi Kleen
c1a58b42b4 [PATCH] i386/x86-64: Remove obsolete sanity check in mptable parsing
It apparently has never triggered in many years.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:30 +02:00
Andi Kleen
a8fcf1a24a [PATCH] Remove obsolete PIC mode
PIC mode is an outdated way to drive the APICs that was used on
some early MP boards. It is not supported in the ACPI model.

It is unlikely to be ever configured by any x86-64 system

Remove it thus.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:30 +02:00
Andi Kleen
e509913434 [PATCH] Remove leftover MCE/EISA support
No 64bit EISA or Microchannel systems ever. Remove the left over code
in the IO-APIC driver and the mptable parser

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:29 +02:00
Andi Kleen
5cb6b99928 [PATCH] Remove pirq overwrite support
This was an old workaround for broken MP-BIOS. The user could
specify overwrites on the command line.

I've never seen it being used for anything on 64bit. So get
rid of it for now.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:29 +02:00
Andi Kleen
2e91a17b35 [PATCH] Add some comments to entry.S
And remove some old obsolete ones.
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:29 +02:00
Andi Kleen
3f14c746a6 [PATCH] Remove old "focus disabled" chipset errata workaround
The new systems already use focus disabled and the comment was
completely outdated.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:29 +02:00
Andi Kleen
6c96a29f20 [PATCH] Remove apic mismatch counter
Nobody has been setting the mismatch counter and the ifdef was never
set so remove it.
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:29 +02:00
Andi Kleen
7f11d8a5ef [PATCH] Remove all ifdefs for local/io apic
IO-APIC or local APIC can only be disabled at runtime anyways and
Kconfig has forced these options on for a long time now.

The Kconfigs are kept only now for the benefit of the shared acpi
boot.c code.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:29 +02:00
Andi Kleen
5ba5891d44 [PATCH] Add some comments what tce.c actually does
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:29 +02:00
Andi Kleen
b06babac45 [PATCH] Add proper alignment to ENTRY
Previously it didn't align. Use the same one as the C compiler
in blended mode, which is good for K8 and Core2 and doesn't hurt
on P4.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:29 +02:00
Andi Kleen
31679f38d8 [PATCH] Simplify profile_pc on x86-64
Use knowledge about EFLAGS layout (bits 22:63 are always 0) to distingush
EFLAGS word and kernel address in the spin lock stack frame.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:28 +02:00
Andi Kleen
c16b63e09d [PATCH] i386/x86-64: Don't randomize stack top when no randomization personality is set
Based on patch from Frank van Maarseveen <frankvm@frankvm.com>, but
extended.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:28 +02:00
Adam Henley
d5d9ca6d88 [PATCH] A few trivial spelling and grammar fixes
A few trivial spelling and grammar mistakes picked up in
"arch/x86_64/aperture.c", "arch/x86_64/crash.c" and
"arch/x86_64/apic.c". I think all are correct fixes but am ever aware
of my fallibility :o) This is my first patch submission so all
feedback is appreciated, esp. WRT CCing to Linus, Andi and
trivial@kernel.org, is this correct? And which is the most appropriate
kernel version to diff against? If any.

Should apply cleanly to 2.6.18-rc1

Signed-off-by: Adam Henley <adamazing@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>

-  adam
2006-09-26 10:52:28 +02:00
Stephane Eranian
d3a4f48d48 [PATCH] x86-64 TIF flags for debug regs and io bitmap in ctxsw
Hello,

Following my discussion with Andi. Here is a patch that introduces
two new TIF flags to simplify the context switch code in __switch_to().
The idea is to minimize the number of cache lines accessed in the common
case, i.e., when neither the debug registers nor the I/O bitmap are used.

This patch covers the x86-64 modifications. A patch for i386 follows.

Changelog:
	- add TIF_DEBUG to track when debug registers are active
	- add TIF_IO_BITMAP to track when I/O bitmap is used
	- modify __switch_to() to use the new TIF flags

<signed-off-by>: eranian@hpl.hp.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:28 +02:00
Andi Kleen
2f766d1606 [PATCH] Clean up asm/smp.h includes
No need to include it from entry.S
Drop all the #ifdef __ASSEMBLY__

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:28 +02:00
Vojtech Pavlik
c08c820508 [PATCH] Add the vgetcpu vsyscall
This patch adds a vgetcpu vsyscall, which depending on the CPU RDTSCP
capability uses either the RDTSCP or CPUID to obtain a CPU and node
numbers and pass them to the program.

AK: Lots of changes over Vojtech's original code:
Better prototype for vgetcpu()
It's better to pass the cpu / node numbers as separate arguments
to avoid mistakes when going from SMP to NUMA.
Also add a fast time stamp based cache using a user supplied
argument to speed things more up.
Use fast method from Chuck Ebbert to retrieve node/cpu from
GDT limit instead of CPUID
Made sure RDTSCP init is always executed after node is known.
Drop printk

Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:28 +02:00
Vojtech Pavlik
a670fad0ad [PATCH] Add initalization of the RDTSCP auxilliary values
This patch adds initalization of the RDTSCP auxilliary values to CPU numbers
to time.c. If RDTSCP is available, the MSRs are written with the respective
values. It can be later used to initalize per-cpu timekeeping variables.

AK: Some cleanups. Move externs into headers and fix CPU hotplug.

Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:28 +02:00
Venkatesh Pallipadi
248dcb2fff [PATCH] x86: i386/x86-64 Add nmi watchdog support for new Intel CPUs
AK: This redoes the changes I temporarily reverted.

Intel now has support for Architectural Performance Monitoring Counters
( Refer to IA-32 Intel Architecture Software Developer's Manual
http://www.intel.com/design/pentium4/manuals/253669.htm ). This
feature is present starting from Intel Core Duo and Intel Core Solo processors.

What this means is, the performance monitoring counters and some performance
monitoring events are now defined in an architectural way (using cpuid).
And there will be no need to check for family/model etc for these architectural
events.

Below is the patch to use this performance counters in nmi watchdog driver.
Patch handles both i386 and x86-64 kernels.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:27 +02:00
Vivek Goyal
3f22c5789e [PATCH] kdump x86_64 nmi event notification fix
After a crash we should wait for NMI IPI event and not for external NMI or
NMI watchdog tick.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:27 +02:00
Andi Kleen
fac58550e8 [PATCH] Fix up panic messages for different NMI panics
When a unknown NMI happened the panic would claim a NMI watchdog timeout.
Also it would check the variable set by nmi_watchdog=panic and panic then.

Fix up the panic message to be generic
Unconditionally panic on unknown NMI when panic on unknown nmi is enabled.

Noticed by Jan Beulich

Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:27 +02:00
Shaohua Li
4038f901cf [PATCH] i386/x86-64: Fix NMI watchdog suspend/resume
Making NMI suspend/resume work with SMP. We use CPU hotplug to offline
APs in SMP suspend/resume. Only BSP executes sysdev's .suspend/.resume
method. APs should follow CPU hotplug code path.

And:

+From: Don Zickus <dzickus@redhat.com>

Makes the start/stop paths of nmi watchdog more robust to handle the
suspend/resume cases more gracefully.

AK: I merged the two patches together

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:27 +02:00
Don Zickus
c41c5cd3b2 [PATCH] x86: x86 clean up nmi panic messages
Clean up some of the output messages on the nmi error paths to make more
sense when they are displayed.  This is mainly a cosmetic fix and
shouldn't impact any normal code path.

Signed-off-by:  Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:27 +02:00
Don Zickus
8da5adda91 [PATCH] x86: Allow users to force a panic on NMI
To quote Alan Cox:

The default Linux behaviour on an NMI of either memory or unknown is to
continue operation. For many environments such as scientific computing
it is preferable that the box is taken out and the error dealt with than
an uncorrected parity/ECC error get propogated.

A small number of systems do generate NMI's for bizarre random reasons
such as power management so the default is unchanged. In other respects
the new proc/sys entry works like the existing panic controls already in
that directory.

This is separate to the edac support - EDAC allows supported chipsets to
handle ECC errors well, this change allows unsupported cases to at least
panic rather than cause problems further down the line.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:27 +02:00
Don Zickus
e33e89ab1a [PATCH] x86: Add abilty to enable/disable nmi watchdog from procfs (update)
Adds a new /proc/sys/kernel/nmi_watchdog call that will enable/disable the
nmi watchdog.

By entering a non-zero value here, a user can enable the nmi watchdog to
monitor the online cpus in the system.  By entering a zero value here, a
user can disable the nmi watchdog and free up a performance counter which
could then be utilized by the oprofile subsystem, otherwise oprofile may be
short a counter when in use.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:27 +02:00
Don Zickus
407984f1af [PATCH] x86: Add abilty to enable/disable nmi watchdog with sysctl
Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
watchdog.

Signed-off-by:  Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:27 +02:00
Don Zickus
2fbe7b25c8 [PATCH] i386/x86-64: Remove un/set_nmi_callback and reserve/release_lapic_nmi functions
Removes the un/set_nmi_callback and reserve/release_lapic_nmi functions as
they are no longer needed.  The various subsystems are modified to register
with the die_notifier instead.

Also includes compile fixes by Andrew Morton.

Signed-off-by:  Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:27 +02:00
Andi Kleen
1d001df19d [PATCH] Add TIF_RESTORE_SIGMASK
We need TIF_RESTORE_SIGMASK in order to support ppoll() and pselect()
system calls. This patch originally came from Andi, and was based
heavily on David Howells' implementation of same on i386. I fixed a typo
which was causing do_signal() to use the wrong signal mask.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:26 +02:00
Don Zickus
3adbbcce9a [PATCH] x86: Cleanup NMI interrupt path
This patch cleans up the NMI interrupt path.  Instead of being gated by if
the 'nmi callback' is set, the interrupt handler now calls everyone who is
registered on the die_chain and additionally checks the nmi watchdog,
reseting it if enabled.  This allows more subsystems to hook into the NMI if
they need to (without being block by set_nmi_callback).

Signed-off-by:  Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:26 +02:00
Don Zickus
f2802e7f57 [PATCH] Add SMP support on x86_64 to reservation framework
This patch includes the changes to make the nmi watchdog on x86_64 SMP
aware.  A bunch of code was moved around to make it simpler to read.  In
addition, it is now possible to determine if a particular NMI was the result
of the watchdog or not.  This feature allows the kernel to filter out
unknown NMIs easier.

Signed-off-by:  Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:26 +02:00
Don Zickus
828f0afda1 [PATCH] x86: Add performance counter reservation framework for UP kernels
Adds basic infrastructure to allow subsystems to reserve performance
counters on the x86 chips.  Only UP kernels are supported in this patch to
make reviewing easier.  The SMP portion makes a lot more changes.

Think of this as a locking mechanism where each bit represents a different
counter.  In addition, each subsystem should also reserve an appropriate
event selection register that will correspond to the performance counter it
will be using (this is mainly neccessary for the Pentium 4 chips as they
break the 1:1 relationship to performance counters).

This will help prevent subsystems like oprofile from interfering with the
nmi watchdog.

Signed-off-by:  Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:26 +02:00
Andi Kleen
b07f8915cd [PATCH] x86: Temporarily revert parts of the Core 2 nmi nmi watchdog support
This makes merging easier.  They are readded a few patches later.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:26 +02:00
Linus Torvalds
79e453d49b Revert mmiocfg heuristics and blacklist changes
This reverts commits 11012d419c and
40dd2d20f2, which allowed us to use the
MMIO accesses for PCI config cycles even without the area being marked
reserved in the e820 memory tables.

Those changes were needed for EFI-environment Intel macs, but broke some
newer Intel 965 boards, so for now it's better to revert to our old
2.6.17 behaviour and at least avoid introducing any new breakage.

Andi Kleen has a set of patches that work with both EFI and the broken
Intel 965 boards, which will be applied once they get wider testing.

Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Edgar Hucek <hostmaster@ed-soft.at>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-19 08:15:22 -07:00
Al Viro
55669bfa14 [PATCH] audit: AUDIT_PERM support
add support for AUDIT_PERM predicate

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-09-11 13:32:30 -04:00
Al Viro
dc104fb323 [PATCH] audit: more syscall classes added
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-09-11 13:32:27 -04:00
Keith Owens
01ebb77b31 [PATCH] x86_64: Save original IST values for checking stack addresses
The values in init_tss.ist[] can change when an IST event occurs.  Save
the original IST values for checking stack addresses when debugging or
doing stack traces.

Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-30 16:05:16 -07:00
Andi Kleen
ceee882230 [PATCH] x86_64: Recover 1MB of kernel memory
Noticed by Jan Beulich.

When the kernel was moved from 1MB to 2MB in 2.6.17 the kernel reservation
code wasn't adjusted and it still reserved starting with 1MB. This means 1MB always
were lost.

This patch fixes this by reserving only starting with _text.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-30 16:05:15 -07:00
Jan Beulich
ea424055b7 [PATCH] x86: Make backtracer fallback logic more bullet-proof
The unwinder fallback logic still had potential for falling through to
the legacy stack trace code without printing an indication (at once
serving as a separator) of this.

Further, the stack pointer retrieval for the fallback should be as
restrictive as possible (in order to avoid having the legacy stack
tracer try to access invalid memory). The patch tightens that, but
this could certainly be further improved.

Also making the call_trace command line option now conditional upon
CONFIG_STACK_UNWIND (as it's meaningless otherwise).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-30 16:05:15 -07:00
Andi Kleen
c05991ed12 [PATCH] x86_64: Add kernel thread stack frame termination for properly stopping stack unwinds.
One open question: Should these added pushes perhaps be made
conditional upon CONFIG_STACK_UNWIND or CONFIG_UNWIND_INFO?
[AK: Not needed -- these are all very slow paths]

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-30 16:05:15 -07:00
Andi Kleen
11012d419c [PATCH] x86: Revert e820 MCFG heuristics
The check for the MCFG table being reserved in the e820 map was originally
added to detect a broken BIOS in a preproduction Intel SDV. However it also
breaks the Apple x86 Macs, which can't supply this properly, but need
a working MCFG. With this patch they wouldn't use the MCFG and not work.

After some discussion I think it's best to remove the heuristic again.
It also failed on some other boxes (although it didn't cause much
problems there because old style port access for PCI config space
still works as fallback), but the preproduction SDVs can just use
pci=nommcfg. Supporting production machines properly is more
important.

Edgar Hucek did all the debugging work.

Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Edgar Hucek <hostmaster@ed-soft.at>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-30 16:05:15 -07:00
Horms
012c437d03 [PATCH] Change panic_on_oops message to "Fatal exception"
Previously the message was "Fatal exception: panic_on_oops", as introduced
in a recent patch whith removed a somewhat dangerous call to ssleep() in
the panic_on_oops path.  However, Paul Mackerras suggested that this was
somewhat confusing, leadind people to believe that it was panic_on_oops
that was the root cause of the fatal exception.  On his suggestion, this
patch changes the message to simply "Fatal exception".  A suitable oops
message should already have been displayed.

Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-14 12:54:29 -07:00
Alexey Dobriyan
825e037fb8 [PATCH] Fix more per-cpu typos
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-06 08:57:47 -07:00
Muli Ben-Yehuda
a166222cde [PATCH] x86_64: Fix CONFIG_IOMMU_DEBUG
If CONFIG_IOMMU_DEBUG is set force_iommu defaults to 1. In the case
where no HW IOMMU is present in the machine and we end up using nommu,
leaving force_iommu set to 1 causes dma_alloc_coherent to do the wrong
thing. Therefore, if we end up using nommu, make sure force_iommu is
0.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-02 20:19:54 -07:00
Andi Kleen
2699500b31 [PATCH] x86_64: Fix backtracing for interrupt stacks
Re-add backlink for old style unwinder to stack switching.  Add proper
stack frame and CFI annotations to call_softirq

This prevents a oops when backtracing with fallback through the
interrupt stack top.

Suggested by Jan Beulich and Herbert Xu wanted it in 2.6.18.

Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-02 20:19:54 -07:00
Chandra Seetharaman
be6b5a3505 [PATCH] cpu hotplug: use hotplug version of registration in late inits
Use hotplug version of register_cpu_notifier in late init functions.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-31 13:28:39 -07:00
Horms
cea6a4ba8a [PATCH] panic_on_oops: remove ssleep()
This patch is part of an effort to unify the panic_on_oops behaviour across
all architectures that implement it.

It was pointed out to me by Andi Kleen that if an oops has occured in
interrupt context, then calling sleep() in the oops path will only cause a
panic, and that it would be really better for it not to be in the path at
all.

This patch removes the ssleep() call and reworks the console message
accordinly.  I have a slght concern that the resulting console message is
too long, feedback welcome.

For powerpc it also unifies the 32bit and 64bit behaviour.

Fror x86_64, this patch only updates the console message, as ssleep() is
already not present.

Signed-off-by: Horms <horms@verge.net.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-31 13:28:39 -07:00
Eric W. Biederman
2a8a3d5b65 [PATCH] machine_kexec.c: Fix the description of segment handling
One of my original comments in machine_kexec was unclear
and this should fix it.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Horms <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-31 13:28:38 -07:00
Andi Kleen
65f87d8a8a [PATCH] x86_64: Fix swiotlb=force
It was broken before. But having it is important as possible hardware
bug workaround.

And previously there was no way to force swiotlb if there is another IOMMU.
Side effect is that iommu=force won't force swiotlb anymore even if there
isn't another IOMMU.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-29 20:59:55 -07:00
Jon Mason
d2105b10fe [PATCH] x86_64: Calgary IOMMU - Multi-Node NULL pointer dereference fix
Calgary hits a NULL pointer dereference when booting in a multi-chassis
NUMA system.  See Redhat bugzilla number 198498, found by Konrad
Rzeszutek (konradr@redhat.com).

There are many issues that had to be resolved to fix this problem.
Firstly when I originally wrote the code to handle NUMA systems, I
had a large misunderstanding that was not corrected until now.  That was
that I thought the "number of nodes online" referred to number of
physical systems connected.  So that if NUMA was disabled, there
would only be 1 node and it would only show that node's PCI bus.
In reality if NUMA is disabled, the system displays all of the
connected chassis as one node but is only ignorant of the delays
in accessing main memory.  Therefore, references to num_online_nodes()
and MAX_NUMNODES are incorrect and need to be set to the maximum
number of nodes that can be accessed (which are 8).  I created a
variable, MAX_NUM_CHASSIS, and set it to 8 to fix this.

Secondly, when walking the PCI in detect_calgary, the code only
checked the first "slot" when looking to see if a device is present.
This will work for most cases, but unfortunately it isn't always the
case.  In the NUMA MXE drawers, there are USB devices present on the
3rd slot (with slot 1 being empty).  So, to work around this, all
slots (up to 8) are scanned to see if there are any devices present.

Lastly, the bus is being enumerated on large systems in a different
way the we originally thought.  This throws the ugly logic we had
out the window.  To more elegantly handle this, I reorganized the
kva array to be sparse (which removed the need to have any bus number
to kva slot logic in tce.c) and created a secondary space array to
contain the bus number to phb mapping.

With these changes Calgary boots on an x460 with 4 nodes with and
without NUMA enabled.

Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-29 20:59:55 -07:00
Muli Ben-Yehuda
089bbbcb36 [PATCH] x86_64: Calgary IOMMU - fix off by one error
Fixed off-by-one error in detect_calgary and calgary_init which will
cause arrays to overflow.  Also, removed impossible to hit BUG_ON.

Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-29 20:59:55 -07:00
Andi Kleen
0e5f61b00c [PATCH] x86_64: On Intel systems when CPU has C3 don't use TSC
On Intel systems generally the TSC stops in C3 or deeper,
so don't use it there. Follows similar logic on i386.

This should fix problems on Meroms.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-29 20:59:55 -07:00
Andi Kleen
b13761ecd1 [PATCH] x86_64: Dump leftover backtrace entries when dwarf2 unwinder got stuck
The dwarf2 unwinder currently often gets stuck because a lot
of assembly code doesn't have proper dwarf2 annotiation yet.

This currently often happens with __down. Should fix this by
adding proper dwarf2 annotation to all inline assembly. However
until that's done we need a quick fix for 2.6.18 to avoid
incomplete backtraces.

So when this happens dump the rest of the stack with the old unwinder
instead of silently not dumping it. There was already a optional
"both" mode that dumped both, but that was too ugly.

I also clarified the headers for the different backtraces a bit.

Also add a clear error message for missing dwarf2
annotation that people can work on.

And I removed a dead variable left over from Ingo's changes.

Cc: mingo@elte.hu
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-28 19:28:00 -07:00
Andi Kleen
d5a2601734 [PATCH] i386/x86-64: Add user_mode checks to profile_pc for oprofile
Fixes a obscure user space triggerable crash during oprofiling.

Oprofile calls profile_pc from NMIs even when user_mode(regs) is not true and
the program counter is inside the kernel lock section. This opens
a race - when a user program jumps to a kernel lock address and
a NMI happens before the illegal page fault exception is raised
and the program has a unmapped esp or ebp then the kernel could
oops. NMIs have a higher priority than exceptions so that could
happen.

Add user_mode checks to i386/x86-64 profile_pc to prevent that.

Cc: John Levon <levon@movementarian.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-28 19:28:00 -07:00
Muli Ben-Yehuda
aa0a9f373e [PATCH] x86_64: Fix Calgary copyright statements per IBM guidelines
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 15:12:33 -07:00
Jacob Shin
0d2caebd56 [PATCH] x86_64: Fix hotplug problem in mce amd
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 15:12:32 -07:00
Jon Smirl
894673ee61 [PATCH] tty: Remove include of screen_info.h from tty.h
screen_info.h doesn't have anything to do with the tty layer and shouldn't be
included by tty.h.  This patches removes the include and modifies all users to
directly include screen_info.h.  struct screen_info is mainly used to
communicate with the console drivers in drivers/video/console.  Note that this
patch touches every arch and I have no way of testing it.  If there is a
mistake the worst thing that will happen is a compile error.

[akpm@osdl.org: fix arm build]
[akpm@osdl.org: fix alpha build]
Signed-off-by: Jon Smirl <jonsmir@gmail.com>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:16 -07:00
Arjan van de Ven
1454aed92b [PATCH] put a comment at register_die_notifier that the export is used
{un}register_die_notifier() is used by kdb... document this so that future
"remove dead export" rounds can skip this export.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:15 -07:00
Ingo Molnar
f86bf9b7bc [PATCH] lockdep: clean up completion initializer in smpboot.c
Clean up lockdep on-stack-completion initializer.  (This also removes the
dependency on waitqueue_lock_key.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:14 -07:00
Andrew Morton
1a91023a9f [PATCH] x86_64: e820.c needs pgtable.h
arch/x86_64/kernel/e820.c:42: error: 'MAXMEM' undeclared here (not in a function)

Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:13 -07:00
Ingo Molnar
60be6b9a41 [PATCH] lockdep: annotate on-stack completions
lockdep needs to have the waitqueue lock initialized for on-stack waitqueues
implicitly initialized by DECLARE_COMPLETION().  Annotate on-stack completions
accordingly.

Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:09 -07:00
Ingo Molnar
366c7f554e [PATCH] lockdep: annotate enable_in_hardirq()
Make use of local_irq_enable_in_hardirq() API to annotate places that enable
hardirqs in hardirq context.

Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:09 -07:00
Ingo Molnar
2148270cd2 [PATCH] lockdep: x86_64 early init
x86_64 uses spinlocks very early - earlier than start_kernel().  So call
lockdep_init() from the arch setup code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:04 -07:00
Ingo Molnar
2601e64d26 [PATCH] lockdep: irqtrace subsystem, x86_64 support
Add irqflags-tracing support to x86_64.

[akpm@osdl.org: build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:03 -07:00
Ingo Molnar
21b32bbff9 [PATCH] lockdep: stacktrace subsystem, x86_64 support
Framework to generate and save stacktraces quickly, without printing anything
to the console.  x86_64 support.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:02 -07:00
Ingo Molnar
c9ca1ba5bd [PATCH] lockdep: x86_64 document stack frame internals
Document stack frame nesting internals some more.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:02 -07:00
Ingo Molnar
3ac94932a2 [PATCH] lockdep: beautify x86_64 stacktraces
Beautify x86_64 stacktraces to be more readable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:02 -07:00
Thomas Gleixner
b1e05aa230 [PATCH] irq-flags: x86_64: Use the new IRQF_ constants
Use the new IRQF_ constants and remove the SA_INTERRUPT define

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-02 13:58:49 -07:00
Al Viro
b915543b46 [PATCH] audit syscall classes
Allow to tie upper bits of syscall bitmap in audit rules to kernel-defined
sets of syscalls.  Infrastructure, a couple of classes (with 32bit counterparts
for biarch targets) and actual tie-in on i386, amd64 and ia64.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-07-01 07:44:10 -04:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Adrian Bunk
5bba17127e [NET]: make skb_release_data() static
skb_release_data() no longer has any users in other files.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:58:30 -07:00
Ingo Molnar
c0ad90a32f [PATCH] genirq: add ->retrigger() irq op to consolidate hw_irq_resend()
Add ->retrigger() irq op to consolidate hw_irq_resend() implementations.
(Most architectures had it defined to NOP anyway.)

NOTE: ia64 needs testing. i386 and x86_64 tested.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29 10:26:23 -07:00
Ingo Molnar
a53da52fd7 [PATCH] genirq: cleanup: merge irq_affinity[] into irq_desc[]
Consolidation: remove the irq_affinity[NR_IRQS] array and move it into the
irq_desc[NR_IRQS].affinity field.

[akpm@osdl.org: sparc64 build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29 10:26:22 -07:00
Ingo Molnar
d1bef4ed5f [PATCH] genirq: rename desc->handler to desc->chip
This patch-queue improves the generic IRQ layer to be truly generic, by adding
various abstractions and features to it, without impacting existing
functionality.

While the queue can be best described as "fix and improve everything in the
generic IRQ layer that we could think of", and thus it consists of many
smaller features and lots of cleanups, the one feature that stands out most is
the new 'irq chip' abstraction.

The irq-chip abstraction is about describing and coding and IRQ controller
driver by mapping its raw hardware capabilities [and quirks, if needed] in a
straightforward way, without having to think about "IRQ flow"
(level/edge/etc.) type of details.

This stands in contrast with the current 'irq-type' model of genirq
architectures, which 'mixes' raw hardware capabilities with 'flow' details.
The patchset supports both types of irq controller designs at once, and
converts i386 and x86_64 to the new irq-chip design.

As a bonus side-effect of the irq-chip approach, chained interrupt controllers
(master/slave PIC constructs, etc.) are now supported by design as well.

The end result of this patchset intends to be simpler architecture-level code
and more consolidation between architectures.

We reused many bits of code and many concepts from Russell King's ARM IRQ
layer, the merging of which was one of the motivations for this patchset.

This patch:

rename desc->handler to desc->chip.

Originally i did not want to do this, because it's a big patch.  But having
both "desc->handler", "desc->handle_irq" and "action->handler" caused a
large degree of confusion and made the code appear alot less clean than it
truly is.

I have also attempted a dual approach as well by introducing a
desc->chip alias - but that just wasnt robust enough and broke
frequently.

So lets get over with this quickly.  The conversion was done automatically
via scripts and converts all the code in the kernel.

This renaming patch is the first one amongst the patches, so that the
remaining patches can stay flexible and can be merged and split up
without having some big monolithic patch act as a merge barrier.

[akpm@osdl.org: build fix]
[akpm@osdl.org: another build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29 10:26:21 -07:00
Andrew Morton
d9a5685436 [PATCH] x86_64: oprofile build fix
WARNING: "unset_nmi_callback" [arch/x86_64/oprofile/oprofile.ko] undefined!
WARNING: "set_nmi_callback" [arch/x86_64/oprofile/oprofile.ko] undefined!

Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28 14:59:07 -07:00
Andrew Morton
a052b68b1e [PATCH] x86: do_IRQ(): check irq number
We recently changed x86 to handle more than 256 IRQs.  Add a check in do_IRQ()
just to make sure that nothing went wrong with that implementation.

[chrisw@sous-sol.org: do x86_64 too]
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28 14:59:02 -07:00
Siddha, Suresh B
5c45bf279d [PATCH] sched: mc/smt power savings sched policy
sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in
/sys/devices/system/cpu/ control the MC/SMT power savings policy for the
scheduler.

Based on the values (1-enable, 0-disable) for these controls, sched groups
cpu power will be determined for different domains.  When power savings
policy is enabled and under light load conditions, scheduler will minimize
the physical packages/cpu cores carrying the load and thus conserving
power(with a perf impact based on the workload characteristics...  see OLS
2005 CMP kernel scheduler paper for more details..)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Con Kolivas <kernel@kolivas.org>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:45 -07:00
Chandra Seetharaman
74b85f3790 [PATCH] cpu hotplug: make cpu_notifier related notifier blocks __cpuinit only
Make notifier_blocks associated with cpu_notifier as __cpuinitdata.

__cpuinitdata makes sure that the data is init time only unless
CONFIG_HOTPLUG_CPU is defined.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:41 -07:00
Chandra Seetharaman
9c7b216d23 [PATCH] cpu hotplug: revert init patch submitted for 2.6.17
In 2.6.17, there was a problem with cpu_notifiers and XFS.  I provided a
band-aid solution to solve that problem.  In the process, i undid all the
changes you both were making to ensure that these notifiers were available
only at init time (unless CONFIG_HOTPLUG_CPU is defined).

We deferred the real fix to 2.6.18.  Here is a set of patches that fixes the
XFS problem cleanly and makes the cpu notifiers available only at init time
(unless CONFIG_HOTPLUG_CPU is defined).

If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
time.

This patch reverts the notifier_call changes made in 2.6.17

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:40 -07:00
Rusty Russell
19eadf98c8 [PATCH] x86: increase interrupt vector range
Remove the limit of 256 interrupt vectors by changing the value stored in
orig_{e,r}ax to be the complemented interrupt vector.  The orig_{e,r}ax
needs to be < 0 to allow the signal code to distinguish between return from
interrupt and return from syscall.  With this change applied, NR_IRQS can
be > 256.

Xen extends the IRQ numbering space to include room for dynamically
allocated virtual interrupts (in the range 256-511), which requires a more
permissive interface to do_IRQ.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:37 -07:00
Linus Torvalds
da206c9e68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  typo fixes
  Clean up 'inline is not at beginning' warnings for usb storage
  Storage class should be first
  i386: Trivial typo fixes
  ixj: make ixj_set_tone_off() static
  spelling fixes
  fix paniced->panicked typos
  Spelling fixes for Documentation/atomic_ops.txt
  move acknowledgment for Mark Adler to CREDITS
  remove the bouncing email address of David Campbell
2006-06-26 13:33:14 -07:00
Linus Torvalds
972d19e837 Merge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  [CRYPTO] tcrypt: Forbid tcrypt from being built-in
  [CRYPTO] aes: Add wrappers for assembly routines
  [CRYPTO] tcrypt: Speed benchmark support for digest algorithms
  [CRYPTO] tcrypt: Return -EAGAIN from module_init()
  [CRYPTO] api: Allow replacement when registering new algorithms
  [CRYPTO] api: Removed const from cra_name/cra_driver_name
  [CRYPTO] api: Added cra_init/cra_exit
  [CRYPTO] api: Fixed incorrect passing of context instead of tfm
  [CRYPTO] padlock: Rearrange context structure to reduce code size
  [CRYPTO] all: Pass tfm instead of ctx to algorithms
  [CRYPTO] digest: Remove unnecessary zeroing during init
  [CRYPTO] aes-i586: Get rid of useless function wrappers
  [CRYPTO] digest: Add alignment handling
  [CRYPTO] khazad: Use 32-bit reads on key
2006-06-26 11:03:29 -07:00
Linus Torvalds
81a07d7588 Merge branch 'x86-64'
* x86-64: (83 commits)
  [PATCH] x86_64: x86_64 stack usage debugging
  [PATCH] x86_64: (resend) x86_64 stack overflow debugging
  [PATCH] x86_64: msi_apic.c build fix
  [PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs
  [PATCH] x86_64: Avoid broadcasting NMI IPIs
  [PATCH] x86_64: fix apic error on bootup
  [PATCH] x86_64: enlarge window for stack growth
  [PATCH] x86_64: Minor string functions optimizations
  [PATCH] x86_64: Move export symbols to their C functions
  [PATCH] x86_64: Standardize i386/x86_64 handling of NMI_VECTOR
  [PATCH] x86_64: Fix modular pc speaker
  [PATCH] x86_64: remove sys32_ni_syscall()
  [PATCH] x86_64: Do not use -ffunction-sections for modules
  [PATCH] x86_64: Add cpu_relax to apic_wait_icr_idle
  [PATCH] x86_64: adjust kstack_depth_to_print default
  [PATCH] i386/x86-64: adjust /proc/interrupts column headings
  [PATCH] x86_64: Fix race in cpu_local_* on preemptible kernels
  [PATCH] x86_64: Fix fast check in safe_smp_processor_id
  [PATCH] x86_64: x86_64 setup.c - printing cmp related boottime information
  [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
  ...

Manual resolve of trivial conflict in arch/i386/kernel/Makefile
2006-06-26 10:51:09 -07:00
Eric Sandeen
4961f10e22 [PATCH] x86_64: (resend) x86_64 stack overflow debugging
Take two, now without spurious whitespace :(  Applies to git & 2.6.17-rc6

CONFIG_DEBUG_STACKOVERFLOW existed for x86_64 in 2.4, but seems to have gone AWOL in 2.6.

I've pretty much just copied this over from the 2.4 code, with
appropriate tweaks for the 2.6 kernel, plus a bugfix.  I'd personally
rather see it printed out the way other arches do it, i.e.
bytes-remaining-until-overflow, rather than having to do the subtraction
yourself.  Also, only 128 bytes remaining seems awfully late to issue a
warning.  But I'll start here :)

Signed-off-by: Eric Sandeen <sandeen@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:22 -07:00
Venkatesh Pallipadi
0080e66755 [PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs
Intel now has support for Architectural Performance Monitoring Counters
( Refer to IA-32 Intel Architecture Software Developer's Manual
http://www.intel.com/design/pentium4/manuals/253669.htm ). This
feature is present starting from Intel Core Duo and Intel Core Solo processors.

What this means is, the performance monitoring counters and some performance
monitoring events are now defined in an architectural way (using cpuid).
And there will be no need to check for family/model etc for these architectural
events.

Below is the patch to use this performance counters in nmi watchdog driver.
Patch handles both i386 and x86-64 kernels.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:22 -07:00
Keith Owens
e77deacb7b [PATCH] x86_64: Avoid broadcasting NMI IPIs
On some i386/x86_64 systems, sending an NMI IPI as a broadcast will
reset the system.  This seems to be a BIOS bug which affects machines
where one or more cpus are not under OS control.  It occurs on HT
systems with a version of the OS that is not compiled without HT
support.  It also occurs when a system is booted with max_cpus=n where
2 <= n < cpus known to the BIOS.  The fix is to always send NMI IPI as
a mask instead of as a broadcast.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:22 -07:00
Siddha, Suresh B
704fc59e1d [PATCH] x86_64: fix apic error on bootup
Appended patch fixes the "APIC error on CPUX: 00(40)" observed during bootup.

From SDM Vol-3A "Valid Interrupt Vectors" section:
	"When an illegal vector value (0-15) is written to an LVT entry
	and the delivery mode is Fixed, the APIC may signal an illegal
	vector error, with out regard to whether the mask bit is set
	or whether an interrupt is actually seen on input."

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:22 -07:00
Andi Kleen
2ee60e1789 [PATCH] x86_64: Move export symbols to their C functions
Only exports for assembler files are left in x8664_ksyms.c

Originally inspired by a patch from Al Viro
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:22 -07:00
Keith Owens
45486f81c9 [PATCH] x86_64: Standardize i386/x86_64 handling of NMI_VECTOR
x86_64 and i386 behave inconsistently when sending an IPI on vector 2
(NMI_VECTOR).  Make both behave the same, so IPI 2 is sent as NMI.

The crash code was abusing send_IPI_allbutself() by passing a code
instead of a vector, it only worked because crash knew about the
internal code of send_IPI_allbutself().  Change crash to use NMI_VECTOR
instead, and remove the comment about how crash was abusing the function.

This patch is a pre-requisite for fixing the problem where sending an
IPI as NMI would reboot some Dell Xeon systems.  I cannot fix that
problem while crash continus to abuse send_IPI_allbutself().

It also removes the inconsistency between i386 and x86_64 for
NMI_VECTOR.  That will simplify all the RAS code that needs to bring
all the cpus to a clean stop, even when one or more cpus are spinning
disabled.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:22 -07:00
Piotr Kaczuba
9c63f87387 [PATCH] x86_64: Fix modular pc speaker
It turned out that the following change is needed when the speaker is
compiled as a module.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:21 -07:00
Jan Beulich
cab093b9d4 [PATCH] x86_64: adjust kstack_depth_to_print default
Defaulting to a value not evenly divisible by four makes little sense,
as four values are displayed per line (and hence the rest of the line
would otherwise be wasted).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:21 -07:00
Jan Beulich
bdbdaa791f [PATCH] i386/x86-64: adjust /proc/interrupts column headings
With (significantly) more than 10 CPUs online, the column headings
drifted off the positions of the column contents with growing CPU
numbers.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:21 -07:00
Andi Kleen
75bd665cc9 [PATCH] x86_64: Fix fast check in safe_smp_processor_id
The APIC ID returned by hard_smp_processor_id can be beyond
NR_CPUS and then overflow the x86_cpu_to_apic[] array.

Add a check for overflow. If it happens then the slow loop below
will catch.

Bug pointed out by Doug Thompson

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:21 -07:00
Rohit Seth
e42f943737 [PATCH] x86_64: x86_64 setup.c - printing cmp related boottime information
Getting phys_proc_id and cpu_core_id information to be printed at boot
time for AMD processors.  Also matching the Node related boot time
information that gets printed for Intel and AMD processors for NUMA
configurations.

Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:21 -07:00
Andi Kleen
495ab9c045 [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
During some profiling I noticed that default_idle causes a lot of
memory traffic. I think that is caused by the atomic operations
to clear/set the polling flag in thread_info. There is actually
no reason to make this atomic - only the idle thread does it
to itself, other CPUs only read it. So I moved it into ti->status.

Converted i386/x86-64/ia64 for now because that was the easiest
way to fix ACPI which also manipulates these flags in its idle
function.

Cc: Nick Piggin <npiggin@novell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:21 -07:00
Andi Kleen
d9005b52de [PATCH] x86_64: Remove bogus RED-PEN comment in signal.c
No red zone possible/needed on the alternative stack.

It caused confusion.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:20 -07:00
Andrew Morton
8fa3d6fc5e [PATCH] x86_64: check_addr() cleanups
- Use DMA_32BIT_MASK

 - Use %z for size_t

 - 80-cols

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:20 -07:00
Andi Kleen
b633237e9c [PATCH] x86_64: Mark mce_amd cpu notifier __cpuinit/__cpuinitdata
Cc: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:20 -07:00
Jacob Shin
2903ee85ce [PATCH] x86_64: mce_amd cleanup
Clean up mce_amd.c for readability and remove code no
longer needed.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:20 -07:00
Jacob Shin
9526866439 [PATCH] x86_64: mce_amd support for family 0x10 processors
Add support for mce threshold registers found in future
AMD family 0x10 processors.  Backwards compatible with
family 0xF hardware.

AK: fixed build on !SMP

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:20 -07:00
Jacob Shin
fff2e89f11 [PATCH] x86_64: mce_amd relocate sysfs files
Get rid of /sys/devices/system/threshold directory and move
mce_amd thresholding files into the machine sysfs directory --
/sys/devices/system/machinecheck.

AK: Fixed warning

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:20 -07:00
Jacob Shin
17fc14ff1b [PATCH] x86_64: apic support for extended apic interrupt
Add support for extended APIC LVT found in future AMD processors.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:20 -07:00
Vojtech Pavlik
2f82bde472 [PATCH] x86_64: Update copyright in time.c
Update my copyright dates in arch/x86-64/kernel/time.c

Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:19 -07:00
Vojtech Pavlik
b2df3ddb68 [PATCH] x86_64: Explain why HPET T0_CMP register is written twice
After writing the CFG register, the first value written to the T0_CMP
register is the value at which next interrupt should be triggered, every
value after that sets the period of the interrupt. For that reason, the code
needs to write the value twice - to set both the phase and period.

[AK: I had already figured it out by myself, but it's still useful
to have a comment for this.]

Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:19 -07:00
Vojtech Pavlik
4221133845 [PATCH] x86_64: Make use of the *PER* constants in time.c
This patch makes use of the newly added conversion constants
in time.h to x86-64 time.c. The code gets significantly easier
to understand.

Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:19 -07:00
Vojtech Pavlik
e30db3e699 [PATCH] x86_64: Remove hack to manually enable HPET on AMD8111 southbridges
Remove #ifdefed code to manually enable HPET on AMD8111, where the
BIOS doesn't have ACPI HPET tables and doesn't enable it for us.

Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:19 -07:00
Vojtech Pavlik
7b0e850125 [PATCH] x86_64: Add X86_FEATURE_RDTSCP, fix rdtscp in /proc/cpuinfo
This patch adds the X86_FEATURE_RDTSCP #define, so that kernel code can
check for the feature easily and also fixes the location of the "rdtscp"
string in the cpuinfo tables.

Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:19 -07:00
Vojtech Pavlik
f8bf3c65a9 [PATCH] x86_64: Rename oem_force_hpet_timer to apic_is_clustered_box
Rename oem_force_hpet_timer to apic_is_clustered_box, to give the
function a better fitting name - it really isn't at all about HPET.

Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:19 -07:00
Rohit Seth
f3fa8ebc25 [PATCH] x86_64: moving phys_proc_id and cpu_core_id to cpuinfo_x86
Most of the fields of cpuinfo are defined in cpuinfo_x86 structure.
This patch moves the phys_proc_id and cpu_core_id for each processor to
cpuinfo_x86 structure as well.

Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:19 -07:00
Jon Mason
e465058d55 [PATCH] x86_64: Calgary IOMMU - Calgary specific bits
This patch hooks Calgary into the build, the x86-64 IOMMU
initialization paths, and introduces the Calgary specific bits.  The
implementation draws inspiration from both PPC (which has support for
the same chip but requires firmware support which we don't have on
x86-64) and gart. Calgary is different from gart in that it support a
translation table per PHB, as opposed to the single gart aperture.

Changes from previous version:
 * Addition of boot-time disablement for bus-level translation/isolation
   (e.g, enable userspace DMA for things like X)
 * Usage of newer IOMMU abstraction functions

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:19 -07:00
Jon Mason
0dc243ae10 [PATCH] x86_64: Calgary IOMMU - IOMMU abstractions
This patch creates a new interface for IOMMUs by adding a centralized
location for IOMMU allocation (for translation tables/apertures) and
IOMMU initialization.  In creating these, code was moved around for
abstraction, uniformity, and consiceness.

Take note of the move of the iommu_setup bootarg parsing code to
__setup.  This is enabled by moving back the location of the aperture
allocation/detection to mem init (which while ugly, was already the
location of the swiotlb_init).

While a slight departure from the previous patch, I belive this provides
the true intention of the previous versions of the patch which changed
this code.  It also makes the addition of the upcoming calgary code much
cleaner than previous patches.

[AK: Removed one broken change. iommu_setup still has to be called
early]

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00
Jon Mason
8d4f6b93a4 [PATCH] x86_64: Calgary IOMMU - introduce iommu_detected
swiotlb relies on the gart specific iommu_aperture variable to know if
we discovered a hardware IOMMU before swiotlb initialization.  Introduce
iommu_detected to do the same thing, but in a HW IOMMU neutral manner,
in preparation for adding the Calgary HW IOMMU.

Signed-Off-By: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-Off-By: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00
Rohit Seth
2bbc419f9d [PATCH] x86_64: Change assembly to use regular cpuid_count macro
Minor cleanup patch:

Replacing the asm statement with cpuid_count macro(which already
provides the same functionality).

Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00
Jan Beulich
46d13a384b [PATCH] x86_64: use halt() instead of raw inline assembly
Use abstractions whenever possible.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00
Jan Beulich
c33bd9aac0 [PATCH] i386/x86-64: fall back to old-style call trace if no unwinding
If no unwinding is possible at all for a certain exception instance,
fall back to the old style call trace instead of not showing any trace
at all.

Also, allow setting the stack trace mode at the command line.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00
Jan Beulich
dffead4e42 [PATCH] x86_64: reliable stack trace support (x86-64 syscall
Adjust the CFA offset for 64- and 32-bit syscall entries so that the five
slots pre-subtracted from the stack pointer do not appear to reside outside
of the current frame.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:17 -07:00