linux/arch/x86
Frederic Weisbecker a2bbe75089 x86: Don't use frame pointer to save old stack on irq entry
rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
in order to restore it later in ret_from_intr.

It is convenient because we save its value in the irq regs
and it's easily restored using the leave instruction.

However this is a kind of abuse of the frame pointer which
role is to help unwinding the kernel by chaining frames
together, each node following the return address to the
previous frame.

But although we are breaking the frame by changing the stack
pointer, there is no preceding return address before the new
frame. Hence using the frame pointer to link the two stacks
breaks the stack unwinders that find a random value instead of
a return address here.

There is no workaround that can work in every case. We are using
the fixup_bp_irq_link() function to dereference that abused frame
pointer in the case of non nesting interrupt (which means stack
changed).
But that doesn't fix the case of interrupts that don't change the
stack (but we still have the unconditional frame link), which is
the case of hardirq interrupting softirq. We have no way to detect
this transition so the frame irq link is considered as a real frame
pointer and the return address is dereferenced but it is still a
spurious one.

There are two possible results of this: either the spurious return
address, a random stack value, luckily belongs to the kernel text
and then the unwinding can continue and we just have a weird entry
in the stack trace. Or it doesn't belong to the kernel text and
unwinding stops there.

This is the reason why stacktraces (including perf callchains) on
irqs that interrupted softirqs don't work very well.

To solve this, we don't save the old stack pointer on rbp anymore
but we save it to a scratch register that we push on the new
stack and that we pop back later on irq return.

This preserves the whole frame chain without spurious return addresses
in the middle and drops the need for the horrid fixup_bp_irq_link()
workaround.

And finally irqs that interrupt softirq are sanely unwinded.

Before:

    99.81%         perf  [kernel.kallsyms]  [k] perf_pending_event
                   |
                   --- perf_pending_event
                       irq_work_run
                       smp_irq_work_interrupt
                       irq_work_interrupt
                      |
                      |--41.60%-- __read
                      |          |
                      |          |--99.90%-- create_worker
                      |          |          bench_sched_messaging
                      |          |          cmd_bench
                      |          |          run_builtin
                      |          |          main
                      |          |          __libc_start_main
                      |           --0.10%-- [...]

After:

     1.64%  swapper  [kernel.kallsyms]  [k] perf_pending_event
            |
            --- perf_pending_event
                irq_work_run
                smp_irq_work_interrupt
                irq_work_interrupt
               |
               |--95.00%-- arch_irq_work_raise
               |          irq_work_queue
               |          __perf_event_overflow
               |          perf_swevent_overflow
               |          perf_swevent_event
               |          perf_tp_event
               |          perf_trace_softirq
               |          __do_softirq
               |          call_softirq
               |          do_softirq
               |          irq_exit
               |          |
               |          |--73.68%-- smp_apic_timer_interrupt
               |          |          apic_timer_interrupt
               |          |          |
               |          |          |--96.43%-- amd_e400_idle
               |          |          |          cpu_idle
               |          |          |          start_secondary

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
2011-07-02 18:06:36 +02:00
..
boot x86, setup: When probing memory with e801, use ax/bx as a pair 2011-04-25 14:52:37 -07:00
configs cgroup: remove the ns_cgroup 2011-05-26 17:12:34 -07:00
crypto crypto: aesni-intel - fix aesni build on i386 2011-05-18 09:03:34 +10:00
ia32 ns: Wire up the setns system call 2011-05-28 10:48:39 -07:00
include/asm x86: Save stack pointer in perf live regs savings 2011-07-02 18:04:03 +02:00
kernel x86: Don't use frame pointer to save old stack on irq entry 2011-07-02 18:06:36 +02:00
kvm Merge branch 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm 2011-06-07 19:06:28 -07:00
lguest lguest: fix timer interrupt setup 2011-05-30 11:14:10 +09:30
lib Merge branches 'x86-apic-for-linus', 'x86-asm-for-linus' and 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 17:49:35 -07:00
math-emu
mm x86: Move do_page_fault()'s error path under unlikely() 2011-05-26 13:54:03 +02:00
net net: filter: Just In Time compiler for x86-64 2011-04-27 23:05:08 -07:00
oprofile Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent 2011-06-08 15:49:03 +02:00
pci Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 2011-05-23 15:39:34 -07:00
platform x86, efi: Retain boot service code until after switching to virtual mode 2011-05-25 17:03:53 -07:00
power
tools
vdso x86: vdso: Remove unused variable 2011-05-26 13:17:35 +02:00
video
xen xen: off by one errors in multicalls.c 2011-06-03 16:04:02 -04:00
.gitignore
Kbuild net: filter: Just In Time compiler for x86-64 2011-04-27 23:05:08 -07:00
Kconfig arch: remove CONFIG_GENERIC_FIND_{NEXT_BIT,BIT_LE,LAST_BIT} 2011-05-26 17:12:38 -07:00
Kconfig.cpu x86, cpu: Move AMD Elan Kconfig under "Processor family" 2011-04-08 13:01:25 -07:00
Kconfig.debug lib: consolidate DEBUG_STACK_USAGE option 2011-05-25 08:39:54 -07:00
Makefile
Makefile_32.cpu x86, cpu: Move AMD Elan Kconfig under "Processor family" 2011-04-08 13:01:25 -07:00