linux/kernel/sched
Frederic Weisbecker 20ab65e33f rcu: Exit RCU extended QS on user preemption
When exceptions or irq are about to resume userspace, if
the task needs to be rescheduled, the arch low level code
calls schedule() directly.

If we call it, it is because we have the TIF_RESCHED flag:

- It can be set after random local calls to set_need_resched()
(RCU, drm, ...)

- A wake up happened and the CPU needs preemption. This can
  happen in several ways:

    * Remotely: the remote waking CPU has set TIF_RESCHED and send the
      wakee an IPI to schedule the new task.
    * Remotely enqueued: the remote waking CPU sends an IPI to the target
      and the wake up is made by the target.
    * Locally: waking CPU == wakee CPU and the wakeup is done locally.
      set_need_resched() is called without IPI.

In the case of local and remotely enqueued wake ups, the tick can
be restarted when we enqueue the new task and RCU can exit the
extended quiescent state at the same time. Then by the time we reach
irq exit path and we call schedule, we are not in RCU user mode.

But if we call schedule() only because something called set_need_resched(),
RCU may still be in user mode when we reach schedule.

Also if a wake up is done remotely, the CPU might see the TIF_RESCHED
flag and call schedule while the IPI has not yet happen to restart the
tick and exit RCU user mode.

We need to manually protect against these corner cases.

Create a new API schedule_user() that calls schedule() inside
rcu_user_exit()-rcu_user_enter() in order to protect it. Archs
will need to rely on it now to implement user preemption safely.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26 15:47:11 +02:00
..
Makefile init_task: Create generic init_task instance 2012-05-05 13:00:21 +02:00
auto_group.c sched: Clean up parameter passing of proc_sched_autogroup_set_nice() 2012-03-02 12:23:49 +01:00
auto_group.h
clock.c
core.c rcu: Exit RCU extended QS on user preemption 2012-09-26 15:47:11 +02:00
cpupri.c sched: Fix minor code style issues 2012-07-26 11:47:00 +02:00
cpupri.h
debug.c sched/debug: Fix printing large integers on 32-bit platforms 2012-05-14 15:05:28 +02:00
fair.c Revert "sched: Improve scalability via 'CPU buddies', which withstand random perturbations" 2012-09-16 12:29:43 -07:00
features.h sched: Fix more load-balancing fallout 2012-04-26 12:54:52 +02:00
idle_task.c sched/nohz: Rewrite and fix load-avg computation -- again 2012-07-05 20:58:13 +02:00
rt.c sched: Unthrottle rt runqueues in __disable_runtime() 2012-09-04 14:30:30 +02:00
sched.h sched: Unthrottle rt runqueues in __disable_runtime() 2012-09-04 14:30:30 +02:00
stats.c sched: Remove sched_switch 2012-01-27 13:28:53 +01:00
stats.h
stop_task.c sched: Fix migration thread runtime bogosity 2012-08-13 18:41:55 +02:00