linux/kernel
Paul Jackson 3077a260e9 [PATCH] cpuset release ABBA deadlock fix
Fix possible cpuset_sem ABBA deadlock if 'notify_on_release' set.

For a particular usage pattern, creating and destroying cpusets fairly
frequently using notify_on_release, on a very large system, this deadlock
can be seen every few days.  If you are not using the cpuset
notify_on_release feature, you will never see this deadlock.

The existing code, on task exit (or cpuset deletion) did:

  get cpuset_sem
  if cpuset marked notify_on_release and is ready to release:
    compute cpuset path relative to /dev/cpuset mount point
    call_usermodehelper() forks /sbin/cpuset_release_agent with path
  drop cpuset_sem

Unfortunately, the fork in call_usermodehelper can allocate memory, and
allocating memory can require cpuset_sem, if the mems_generation values
changed in the interim.  This results in an ABBA deadlock, trying to obtain
cpuset_sem when it is already held by the current task.

To fix this, I put the cpuset path (which must be computed while holding
cpuset_sem) in a temporary buffer, to be used in the call_usermodehelper
call of /sbin/cpuset_release_agent only _after_ dropping cpuset_sem.

So the new logic is:

  get cpuset_sem
  if cpuset marked notify_on_release and is ready to release:
    compute cpuset path relative to /dev/cpuset mount point
    stash path in kmalloc'd buffer
  drop cpuset_sem
  call_usermodehelper() forks /sbin/cpuset_release_agent with path
  free path

The sharp eyed reader might notice that this patch does not contain any
calls to kmalloc.  The existing code in the check_for_release() routine was
already kmalloc'ing a buffer to hold the cpuset path.  In the old code, it
just held the buffer for a few lines, over the cpuset_release_agent() call
that in turn invoked call_usermodehelper().  In the new code, with the
application of this patch, it returns that buffer via the new char
**ppathbuf parameter, for later use and freeing in cpuset_release_agent(),
which is called after cpuset_sem is dropped.  Whereas the old code has just
one call to cpuset_release_agent(), right in the check_for_release()
routine, the new code has three calls to cpuset_release_agent(), from the
various places that a cpuset can be released.

This patch has been build and booted on SN2, and passed a stress test that
previously hit the deadlock within a few seconds.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-09 12:08:22 -07:00
..
irq [PATCH] irqpoll 2005-06-28 21:20:35 -07:00
power [PATCH] Address BUG: using smp_processor_id() in preemptible [00000001] code 2005-07-27 16:25:50 -07:00
Kconfig.hz [PATCH] i386: Selectable Frequency of the Timer Interrupt 2005-06-23 09:45:10 -07:00
Kconfig.preempt [PATCH] sched: voluntary kernel preemption 2005-06-25 16:24:45 -07:00
Makefile [PATCH] kdump: Routines for copying dump pages 2005-06-25 16:24:53 -07:00
acct.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
audit.c AUDIT: Unify auid reporting, put arch before syscall number 2005-05-23 21:35:28 +01:00
auditsc.c AUDIT: Record working directory when syscall arguments are pathnames 2005-05-27 12:17:28 +01:00
capability.c [PATCH] kernel/capability.c: add kerneldoc 2005-07-27 16:26:06 -07:00
compat.c [PATCH] Fix get_compat_sigevent() 2005-04-16 15:24:01 -07:00
configs.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
cpu.c [PATCH] i386 CPU hotplug 2005-06-25 16:24:29 -07:00
cpuset.c [PATCH] cpuset release ABBA deadlock fix 2005-08-09 12:08:22 -07:00
crash_dump.c [PATCH] kernel/crash_dump.c: add kerneldoc 2005-07-27 16:26:06 -07:00
dma.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
exec_domain.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
exit.c [PATCH] revert "timer exit cleanup" 2005-08-04 16:57:49 -07:00
extable.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
fork.c [PATCH] lower VM_DONTCOPY total_vm 2005-07-12 16:00:58 -07:00
futex.c [PATCH] convert that currently tests _NSIG directly to use valid_signal() 2005-05-01 08:59:14 -07:00
intermodule.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
itimer.c [PATCH] itimer fixes 2005-07-27 16:25:51 -07:00
kallsyms.c [PATCH] ppc32: platform-specific functions missing from kallsyms. 2005-05-05 16:36:31 -07:00
kexec.c [PATCH] kexec: fix sparse warnings 2005-06-28 14:53:40 -07:00
kfifo.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
kmod.c [PATCH] Keys: Pass session keyring to call_usermodehelper() 2005-06-24 00:05:18 -07:00
kprobes.c [PATCH] kprobes: fix namespace problem and sparc64 build 2005-07-05 19:19:00 -07:00
ksysfs.c [PATCH] Kdump: Export crash notes section address through sysfs 2005-06-25 16:24:51 -07:00
kthread.c [PATCH] use smp_mb/wmb/rmb where possible 2005-05-01 08:58:47 -07:00
module.c [PATCH] Module per-cpu alignment cannot always be met 2005-08-01 21:38:01 -07:00
panic.c [PATCH] Call emergency_reboot from panic 2005-07-26 14:35:43 -07:00
params.c [PATCH] sysfs: (rest) if show/store is missing return -EIO 2005-06-20 15:15:03 -07:00
pid.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
posix-cpu-timers.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
posix-timers.c [PATCH] revert "timer exit cleanup" 2005-08-04 16:57:49 -07:00
printk.c [PATCH] CPU hotplug printk fix 2005-06-25 16:24:34 -07:00
profile.c [PATCH] mostly_read data section 2005-07-07 18:23:46 -07:00
ptrace.c [PATCH] convert that currently tests _NSIG directly to use valid_signal() 2005-05-01 08:59:14 -07:00
rcupdate.c [PATCH] Deprecate synchronize_kernel, GPL replacement 2005-05-01 08:59:04 -07:00
resource.c [PATCH] Use ALIGN to remove duplicate code 2005-06-25 16:25:02 -07:00
sched.c [PATCH] fix MAX_USER_RT_PRIO and MAX_RT_PRIO 2005-07-26 15:40:00 -07:00
seccomp.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
signal.c [PATCH] Cleanup patch for process freezing 2005-06-25 17:10:13 -07:00
softirq.c [PATCH] revert bogus softirq changes 2005-07-30 10:49:59 -07:00
spinlock.c [PATCH] spin_unlock_bh() and preempt_check_resched() 2005-05-21 10:46:48 -07:00
stop_machine.c [PATCH] smp_processor_id() cleanup 2005-06-21 18:46:13 -07:00
sys.c [PATCH] Remove suspend() calls from shutdown path 2005-08-04 08:20:47 -07:00
sys_ni.c [PATCH] remove sys_set_zone_reclaim() 2005-08-01 10:03:56 -07:00
sysctl.c [PATCH] s390: spin lock retry 2005-07-27 16:26:04 -07:00
time.c [PATCH] clean up inline static vs static inline 2005-07-27 16:26:20 -07:00
timer.c [PATCH] kernel/timer: fix msleep_interruptible() comment 2005-06-25 16:24:58 -07:00
uid16.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
user.c [PATCH] inotify 2005-07-12 20:38:38 -07:00
wait.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
workqueue.c [PATCH] re-export cancel_rearming_delayed_workqueue 2005-04-16 15:23:59 -07:00