linux/kernel
Thomas Gleixner cdf71a10c7 futex: Prevent stale futex owner when interrupted/timeout
Roland Westrelin did a great analysis of a long standing thinko in the
return path of futex_lock_pi.

While we fixed the lock steal case long ago, which was easy to trigger,
we never had a test case which exposed this problem and stupidly never
thought about the reverse lock stealing scenario and the return to user
space with a stale state.

When a blocked tasks returns from rt_mutex_timed_locked without holding
the rt_mutex (due to a signal or timeout) and at the same time the task
holding the futex is releasing the futex and assigning the ownership of
the futex to the returning task, then it might happen that a third task
acquires the rt_mutex before the final rt_mutex_trylock() of the
returning task happens under the futex hash bucket lock. The returning
task returns to user space with ETIMEOUT or EINTR, but the user space
futex value is assigned to this task. The task which acquired the
rt_mutex fixes the user space futex value right after the hash bucket
lock has been released by the returning task, but for a short period of
time the user space value is wrong.

Detailed description is available at:

   https://bugzilla.redhat.com/show_bug.cgi?id=400541

The fix for this is the same as we do when the rt_mutex was acquired by
a higher priority task via lock stealing from the designated new owner.
In that case we already fix the user space value and the internal
pi_state up before we return. This mechanism can be used to fixup the
above corner case as well. When the returning task, which failed to
acquire the rt_mutex, notices that it is the designated owner of the
futex, then it fixes up the stale user space value and the pi_state,
before returning to user space. This happens with the futex hash bucket
lock held, so the task which acquired the rt_mutex is guaranteed to be
blocked on the hash bucket lock. We can access the rt_mutex owner, which
gives us the pid of the new owner, safely here as the owner is not able
to modify (release) it while waiting on the hash bucket lock.

Rename the "curr" argument of fixup_pi_state_owner() to "newowner" to
avoid confusion with current and add the check for the stale state into
the failure path of rt_mutex_trylock() in the return path of
unlock_futex_pi(). If the situation is detected use
fixup_pi_state_owner() to assign everything to the owner of the
rt_mutex.

Pointed-out-and-tested-by: Roland Westrelin <roland.westrelin@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-08 16:21:39 -08:00
..
irq genirq: revert lazy irq disable for simple irqs 2007-12-18 18:05:58 +01:00
power hibernate: fix lockdep report 2007-11-14 18:45:43 -08:00
time clockevents: fix reprogramming decision in oneshot broadcast 2007-12-18 18:05:58 +01:00
.gitignore
acct.c acct: real_parent ppid 2008-01-07 14:55:37 -08:00
audit.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit.h [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit_tree.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
auditfilter.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
auditsc.c auditsc: fix kernel-doc param warnings 2007-10-22 19:40:02 -07:00
capability.c Uninline find_pid etc set of functions 2007-10-19 11:53:41 -07:00
cgroup.c Improve cgroup printks 2007-11-14 18:45:37 -08:00
cgroup_debug.c Task Control Groups: simple task cgroup debug info subsystem 2007-10-19 11:53:36 -07:00
compat.c Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt 2007-10-18 15:12:41 -07:00
configs.c
cpu.c CPU HOTPLUG: avoid hotadd when proper possible_map isn't specified 2007-10-19 11:53:44 -07:00
cpuset.c hotplug cpu: migrate a task within its cpuset 2007-10-19 11:53:44 -07:00
delayacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
dma.c whitespace fixes: DMA channel allocator 2007-10-18 14:37:24 -07:00
exec_domain.c whitespace fixes: execution domains 2007-10-18 14:37:26 -07:00
exit.c wait_task_stopped(): pass correct exit_code to wait_noreap_copyout() 2007-11-29 09:24:55 -08:00
extable.c
fork.c fix clone(CLONE_NEWPID) 2007-12-05 09:21:18 -08:00
futex.c futex: Prevent stale futex owner when interrupted/timeout 2008-01-08 16:21:39 -08:00
futex_compat.c [FUTEX] Fix address computation in compat code. 2007-11-09 16:13:08 -08:00
hrtimer.c hrtimers: avoid overflow for large relative timeouts 2007-12-07 19:16:17 +01:00
itimer.c whitespace fixes: interval timers 2007-10-18 14:37:26 -07:00
kallsyms.c FRV: fix the extern declaration of kallsyms_num_syms 2007-11-29 09:24:54 -08:00
Kconfig.hz
Kconfig.instrumentation Tiny clean-up of OPROFILE/KPROBES configuration 2007-12-06 09:41:12 -08:00
Kconfig.preempt Move PREEMPT_NOTIFIERS into an always-included Kconfig 2007-10-17 08:42:55 -07:00
kexec.c vmcoreinfo: add the array length of "free_list" for filtering free pages 2008-01-08 16:10:36 -08:00
kfifo.c is_power_of_2: kernel/kfifo.c 2007-07-16 09:05:50 -07:00
kmod.c Restore call_usermodehelper_pipe() behaviour 2007-09-11 17:21:20 -07:00
kprobes.c kprobes: support kretprobe blacklist 2007-10-16 09:43:10 -07:00
ksysfs.c add-vmcore: cleanup the coding style according to Andrew's comments 2007-10-17 08:42:54 -07:00
kthread.c kthread: silence bogus section mismatch warning 2007-07-31 15:39:42 -07:00
latency.c
lockdep.c lockdep: make cli/sti annotation warnings clearer 2007-12-07 19:02:47 +01:00
lockdep_internals.h
lockdep_proc.c lockdep: Avoid /proc/lockdep & lock_stat infinite output 2007-10-11 22:11:11 +02:00
Makefile revert "Task Control Groups: example CPU accounting subsystem" 2007-11-14 18:45:40 -08:00
marker.c Linux Kernel Markers: fix marker mutex not taken upon module load 2007-11-14 18:45:40 -08:00
module.c module: fix and elaborate comments 2007-11-19 11:20:43 +11:00
mutex-debug.c
mutex-debug.h
mutex.c lockdep: fixup mutex annotations 2007-10-11 22:11:12 +02:00
mutex.h
notifier.c Add kernel/notifier.c 2007-10-19 11:53:34 -07:00
ns_cgroup.c cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
nsproxy.c pid namespaces: allow cloning of new namespace 2007-10-19 11:53:39 -07:00
panic.c debug: add end-of-oops marker 2007-12-20 15:01:17 +01:00
params.c Modules: fix memory leak of module names 2007-12-22 23:09:05 -08:00
pid.c pidns: Place under CONFIG_EXPERIMENTAL 2007-11-14 18:45:43 -08:00
posix-cpu-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
posix-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
printk.c [SERIAL]: Fix section mismatches in Sun serial console drivers. 2007-12-29 01:19:49 -08:00
profile.c sched: document profile=sleep requiring CONFIG_SCHEDSTATS 2007-10-24 18:23:50 +02:00
ptrace.c Fix kernel/ptrace.c compile problem (missing "may_attach()") 2008-01-02 13:48:27 -08:00
rcupdate.c Clean up duplicate includes in kernel/ 2007-10-17 08:42:48 -07:00
rcutorture.c Make rcutorture RNG use temporal entropy 2007-10-17 08:42:53 -07:00
relay.c whitespace fixes: relayfs 2007-10-18 14:37:24 -07:00
resource.c Add IORESOUCE_BUSY flag for System RAM 2007-11-14 18:45:39 -08:00
rtmutex-debug.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex-debug.h
rtmutex-tester.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
rtmutex.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex.h
rtmutex_common.h
rwsem.c sched: mark rwsem functions as __sched for wchan/profiling 2007-12-18 15:21:13 +01:00
sched.c sched: touch softlockup watchdog after idling 2007-12-18 15:21:13 +01:00
sched_debug.c sched: fix gcc warnings 2007-12-30 17:24:35 +01:00
sched_fair.c sched: do not hurt SCHED_BATCH on wakeup 2007-12-18 15:21:13 +01:00
sched_idletask.c sched: isolate SMP balancing code a bit more 2007-10-24 18:23:51 +02:00
sched_rt.c sched: rt: account the cpu time during the tick 2007-12-20 15:01:17 +01:00
sched_stats.h sched: clean up kernel/sched_stat.h 2007-11-28 15:52:56 +01:00
seccomp.c
signal.c sigwait eats blocked default-ignore signals 2007-11-12 16:05:23 -08:00
softirq.c [KERNEL]: Unexport raise_softirq_irqoff 2007-10-10 16:49:18 -07:00
softlockup.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
spinlock.c lockstat: hook into spinlock_t, rwlock_t, rwsem and mutex 2007-07-19 10:04:49 -07:00
srcu.c
stacktrace.c
stop_machine.c
sys.c x86: ignore the sys_getcpu() tcache parameter 2007-11-17 16:27:00 +01:00
sys_ni.c [COMPAT]: Fix build on COMPAT platforms when CONFIG_NET is disabled. 2007-10-30 21:29:56 -07:00
sysctl.c sched: sysctl, proc_dointvec_minmax() expects int values for 2007-12-18 15:21:13 +01:00
sysctl_check.c sysctl: fix ax25 checks 2007-12-17 19:28:17 -08:00
taskstats.c kernel/taskstats.c: fix bogus nlmsg_free() 2007-11-14 18:45:44 -08:00
time.c whitespace fixes: time syscalls 2007-10-18 14:37:24 -07:00
timer.c timer: kernel/timer.c section fixes 2007-12-18 18:05:58 +01:00
tsacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
uid16.c
user.c sched: don't forget to unlock uids_mutex on error paths 2007-11-26 21:21:49 +01:00
user_namespace.c Fix user namespace exiting OOPs 2007-09-19 11:24:18 -07:00
utsname.c Fix UTS corruption during clone(CLONE_NEWUTS) 2007-09-19 11:24:17 -07:00
utsname_sysctl.c Isolate the UTS namespace's domainname and hostname back 2007-11-29 09:24:53 -08:00
wait.c
workqueue.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00