linux/arch/ia64/kernel
Alex Chiang 66db2e6331 [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs"
This reverts commit e7b140365b.

Commit e7b14036 removes the CPU being disabled from cpu_online_map
only after the calls to migrate_platform_irqs and fixup_irqs.

Paul McKenney states that the reasoning behind the patch was to
prevent irq handlers from running on CPUs marked offline because:

	RCU happily ignores CPUs that don't have their bits set in
	cpu_online_map, so if there are RCU read-side critical sections
	in the irq handlers being run, RCU will ignore them.  If the
	other CPUs were running, they might sequence through the RCU
	state machine, which could result in data structures being
	yanked out from under those irq handlers, which in turn could
	result in oopses or worse.
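
The hazard Paul describes can be sketched with a toy model. This is not the kernel's RCU implementation: the function name and the two bitmask parameters below are hypothetical, and the only assumption is that a grace period waits on exactly the CPUs whose bits are set in the online mask.

```c
#include <stdbool.h>

/*
 * Toy model only; all names here are hypothetical and do not exist in
 * the kernel.  Bit n of 'online' means CPU n is marked online.  Bit n
 * of 'quiescent' means CPU n has left every read-side critical section
 * since the grace period began.
 */
static bool toy_grace_period_done(unsigned long online,
                                  unsigned long quiescent)
{
    /* Only CPUs in the online mask can hold the grace period open. */
    return (online & ~quiescent) == 0;
}
```

With 8 CPUs and CPU 3 still inside a handler's read-side critical section, toy_grace_period_done(0xffUL, 0xf7UL) is false while CPU 3 is marked online, but toy_grace_period_done(0xf7UL, 0xf7UL) is true once its bit is cleared, so in this model the grace period ends underneath the handler.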

Unfortunately, both ia64 functions above look at cpu_online_map to find
a new CPU to migrate interrupts onto. This means we can potentially
migrate an interrupt off ourself back to... ourself. Uh oh.
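
A minimal user-space sketch of the self-migration problem follows. The toy_* names are hypothetical stand-ins, not the ia64 functions, and the sketch assumes the target is simply the lowest-numbered CPU in the online mask.

```c
/*
 * Toy stand-in for cpu_online_map: bit n set means CPU n is online.
 * All toy_* names are hypothetical and exist only for illustration.
 */
typedef unsigned long toy_cpumask;

/* Return the lowest-numbered CPU in the mask, or -1 if the mask is empty. */
static int toy_first_online(toy_cpumask mask)
{
    int cpu;

    for (cpu = 0; cpu < (int)(8 * sizeof(mask)); cpu++)
        if (mask & (1UL << cpu))
            return cpu;
    return -1;
}

/*
 * Pick a CPU to migrate an interrupt onto.  Like the code described
 * above, this consults only the online mask and has no idea which CPU
 * is going away.
 */
static int toy_pick_target(toy_cpumask online)
{
    return toy_first_online(online);
}
```

If CPU 0 is being disabled on an 8-way system and its bit is still set, toy_pick_target(0xffUL) returns 0, the CPU that is going away. If the bit is cleared first, toy_pick_target(0xffUL & ~1UL) returns 1.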

This causes an oops when we finally try to process pending interrupts on
the CPU we want to disable. The oops results from calling __do_IRQ with
a NULL pt_regs:

Unable to handle kernel NULL pointer dereference (address 0000000000000040)
Call Trace:
 [<a000000100016930>] show_stack+0x50/0xa0
                                sp=e0000009c922fa00 bsp=e0000009c92214d0
 [<a0000001000171a0>] show_regs+0x820/0x860
                                sp=e0000009c922fbd0 bsp=e0000009c9221478
 [<a00000010003c700>] die+0x1a0/0x2e0
                                sp=e0000009c922fbd0 bsp=e0000009c9221438
 [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
                                sp=e0000009c922fbd0 bsp=e0000009c92213d8
 [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000009c922fc60 bsp=e0000009c92213d8
 [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
                                sp=e0000009c922fe30 bsp=e0000009c9221398
 [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
                                sp=e0000009c922fe30 bsp=e0000009c9221330
 [<a00000010013a800>] handle_IRQ_event+0x80/0x120
                                sp=e0000009c922fe30 bsp=e0000009c92212f8
 [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
                                sp=e0000009c922fe30 bsp=e0000009c9221290
 [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
                                sp=e0000009c922fe30 bsp=e0000009c9221208
 [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
                                sp=e0000009c922fe30 bsp=e0000009c92211a8
 [<a00000010005bd80>] __cpu_disable+0x140/0x240
                                sp=e0000009c922fe30 bsp=e0000009c9221168
 [<a0000001006c5870>] take_cpu_down+0x50/0xa0
                                sp=e0000009c922fe30 bsp=e0000009c9221148
 [<a000000100122610>] stop_cpu+0xd0/0x200
                                sp=e0000009c922fe30 bsp=e0000009c92210f0
 [<a0000001000e0440>] kthread+0xc0/0x140
                                sp=e0000009c922fe30 bsp=e0000009c92210c8
 [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
                                sp=e0000009c922fe30 bsp=e0000009c92210a0
 [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                sp=e0000009c922fe30 bsp=e0000009c92210a0

I don't like this revert because it is fragile. ia64 is getting lucky
because we seem to only ever process timer interrupts in this path, but
if we ever race with an IPI here, we definitely use RCU and could
hit the oops that Paul describes above.

Patching ia64's timer_interrupt() to check for NULL pt_regs is
insufficient though, as we still hit the above oops.

As a short term solution, I do think that this revert is the right
answer. The revert holds up under repeated testing (24+ hour test runs)
with this setup:

	- 8-way rx6600
	- randomly toggling CPU online/offline state every 2 seconds
	- running CPU exercisers, memory hog, disk exercisers, and
	  network stressors
	- average system load around 160

In the long term, we really need to figure out why we set pt_regs = NULL
in ia64_process_pending_intr(). If it turns out that it is unnecessary
to do so, then we could safely re-introduce e7b14036 (along with some
other logic to be smarter about migrating interrupts).

One final note: x86 also removes the disabled CPU from cpu_online_map
and then re-enables interrupts for 1ms, presumably to handle any pending
interrupts:

arch/x86/kernel/irq_32.c (and irq_64.c):
cpu_disable_common:
	[remove cpu from cpu_online_map]

	fixup_irqs():
		for_each_irq:
			[break CPU affinities]

		local_irq_enable();
		mdelay(1);
		local_irq_disable();

So they are doing implicitly what ia64 is doing explicitly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:32:26 -08:00
..
cpufreq [IA64] improper printk format in acpi-cpufreq 2008-07-17 11:11:17 -07:00
.gitignore [IA64] Cleanup generated file not ignored by .gitignore 2008-08-04 11:06:16 -07:00
Makefile Pull vtd-iommu into release branch 2008-10-17 13:52:22 -07:00
acpi-ext.c
acpi-processor.c ACPI: Set _PSD ACPI_PDC_SMP_T_SWCOORD 2008-02-02 02:22:43 -05:00
acpi.c Merge branch 'linus' into release 2009-01-09 03:39:43 -05:00
asm-offsets.c ia64/pv_ops/xen: elf note based xen startup. 2008-10-17 10:02:21 -07:00
audit.c [PATCH] audit signal recipients 2007-05-11 05:38:25 -04:00
brl_emu.c
crash.c [IA64] simplify notify hooks in mca.c 2008-04-22 08:56:38 -07:00
crash_dump.c kdump: make elfcorehdr_addr independent of CONFIG_PROC_VMCORE 2008-10-20 08:52:39 -07:00
cyclone.c cyclone.c: silly use of volatile, __iomem fixes 2007-07-26 11:11:57 -07:00
efi.c always reserve elfcore header memory in crash kernel 2008-10-20 08:52:40 -07:00
efi_stub.S
entry.S [CVE-2009-0029] Remove __attribute__((weak)) from sys_pipe/sys_pipe2 2009-01-14 14:15:15 +01:00
entry.h
err_inject.c sysdev: Pass the attribute to the low level sysdev show/store function 2008-07-21 21:55:02 -07:00
esi.c
esi_stub.S
fsys.S Pull miscellaneous into release branch 2008-04-17 10:14:51 -07:00
fsyscall_gtod_data.h [IA64] generalize attribute of fsyscall_gtod_data 2008-02-04 15:36:36 -08:00
gate-data.S
gate.S [IA64] Stop bit for brl instruction 2007-07-09 13:37:44 -07:00
gate.lds.S [IA64] increase .data.patch offset 2007-12-07 14:28:02 -08:00
head.S [IA64] Rationalize kernel mode alignment checking 2008-11-20 13:27:12 -08:00
ia64_ksyms.c Generic semaphore implementation 2008-04-17 10:42:34 -04:00
init_task.c take init_fs to saner place 2008-12-31 18:07:42 -05:00
iosapic.c cpumask: IA64: Introduce cpumask_of_{node,pcibus} to replace {node,pcibus}_to_cpumask 2008-12-26 22:23:40 +10:30
irq.c ia64: cpumask fix for is_affinity_mask_valid() 2009-01-04 15:39:24 +01:00
irq_ia64.c [IA64] pvops: add hooks, pv_irq_ops, to paravirtualized irq related operations. 2008-05-27 15:11:10 -07:00
irq_lsapic.c [IA64] spelling fixes: arch/ia64/ 2007-05-11 14:55:43 -07:00
ivt.S ia64/pv_ops: fix paravirtualization of ivt.S with CONFIG_SMP=n 2008-10-17 09:50:09 -07:00
jprobes.S [IA64] Move include/asm-ia64 to arch/ia64/include/asm 2008-08-01 10:21:21 -07:00
kprobes.c kprobes: check CONFIG_FREEZER instead of CONFIG_PM 2009-01-16 14:32:17 -05:00
machine_kexec.c vmcoreinfo: fix the configuration dependencies 2008-02-07 08:42:25 -08:00
machvec.c [IA64] Ensure that machvec is set up takes place before serial console 2007-07-25 11:12:47 -07:00
mca.c [IA64] use mprintk instead of printk, in ia64_mca_modify_original_stack 2008-11-20 13:31:10 -08:00
mca_asm.S [IA64] Add API for allocating Dynamic TR resource. 2008-04-03 11:02:58 -07:00
mca_drv.c CRED: Wrap task credential accesses in the IA64 arch 2008-11-14 10:38:37 +11:00
mca_drv.h [IA64] mca style cleanup 2008-02-04 15:42:06 -08:00
mca_drv_asm.S [IA64] mca style cleanup 2008-02-04 15:42:06 -08:00
minstate.h [IA64] pvops: paravirtualize minstate.h. 2008-05-27 15:02:17 -07:00
module.c [IA64] fix compile failure with non modular builds 2008-09-10 10:46:32 -07:00
msi_ia64.c cpumask: make irq_set_affinity() take a const struct cpumask 2008-12-13 21:20:26 +10:30
nr-irqs.c ia64/pv_ops/xen: define the nubmer of irqs which xen needs. 2008-10-17 10:06:59 -07:00
numa.c [IA64] Minimize per_cpu reservations. 2008-04-08 13:51:35 -07:00
pal.S
palinfo.c smp_call_function: get rid of the unused nonatomic/retry argument 2008-06-26 11:24:35 +02:00
paravirt.c [IA64] ia64/pv_ops/pv_cpu_ops: fix _IA64_REG_IP case. 2008-11-20 13:41:20 -08:00
paravirt_inst.h ia64/pv_ops: paravirtualized instruction checker. 2008-10-17 10:12:54 -07:00
paravirtentry.S [IA64] pvops: paravirtualize entry.S 2008-05-27 15:08:01 -07:00
patch.c [IA64] Workaround for RSE issue 2008-05-27 13:24:39 -07:00
pci-dma.c IA64: struct device - replace bus_id with dev_name(), dev_set_name() 2009-01-06 10:44:40 -08:00
pci-swiotlb.c [IA64] Add Variable Page Size and IA64 Support in Intel IOMMU 2008-10-17 12:14:13 -07:00
perfmon.c CRED: Use RCU to access another task's creds and to release a task's own creds 2008-11-14 10:39:19 +11:00
perfmon_default_smpl.c [IA64] remove remaining __FUNCTION__ occurrences 2008-03-06 09:19:27 -08:00
perfmon_generic.h
perfmon_itanium.h
perfmon_mckinley.h [IA64] spelling fixes: arch/ia64/ 2007-05-11 14:55:43 -07:00
perfmon_montecito.h
process.c Pull pv_ops-xen into release branch 2008-10-17 13:51:28 -07:00
ptrace.c [IA64] utrace use generic trace hook 2008-10-06 10:43:06 -07:00
relocate_kernel.S [IA64] Removal of percpu TR cleanup in kexec code 2007-05-08 10:00:28 -07:00
sal.c [IA64] Update check_sal_cache_flush to use platform_send_ipi() 2008-06-11 16:40:33 -07:00
salinfo.c ia64: use non-racy method for proc entries creation 2008-04-29 08:06:21 -07:00
setup.c [IA64] Reserve elfcorehdr memory in CONFIG_CRASH_DUMP 2008-11-07 09:51:55 -08:00
sigframe.h [IA64] Add TIF_RESTORE_SIGMASK 2007-05-08 14:51:59 -07:00
signal.c CRED: Wrap task credential accesses in the IA64 arch 2008-11-14 10:38:37 +11:00
smp.c [IA64] Shrink shadow_flush_counts to a short array to save 8k of per_cpu area. 2008-08-18 15:39:48 -07:00
smpboot.c [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs" 2009-02-19 11:32:26 -08:00
sys_ia64.c [CVE-2009-0029] Remove __attribute__((weak)) from sys_pipe/sys_pipe2 2009-01-14 14:15:15 +01:00
time.c [PATCH] idle cputime accounting 2008-12-31 15:11:46 +01:00
topology.c cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers. 2008-12-13 21:20:25 +10:30
traps.c [IA64] honor notify_die() returning NOTIFY_STOP 2008-02-05 08:26:44 -08:00
unaligned.c [IA64] dump stack on kernel unaligned warnings 2009-01-15 10:38:56 -08:00
uncached.c smp_call_function: get rid of the unused nonatomic/retry argument 2008-06-26 11:24:35 +02:00
unwind.c [IA64] remove remaining __FUNCTION__ occurrences 2008-03-06 09:19:27 -08:00
unwind_decoder.c
unwind_i.h
vmlinux.lds.S [IA64] Put the space for cpu0 per-cpu area into .data section 2008-09-29 16:39:19 -07:00