linux/kernel
David Rientjes 89e8a244b9 cpusets: avoid looping when storing to mems_allowed if one node remains set
{get,put}_mems_allowed() exist so that general kernel code may locklessly
access a task's set of allowable nodes without having the chance that a
concurrent write will cause the nodemask to be empty on configurations
where MAX_NUMNODES > BITS_PER_LONG.

This could incur a significant delay, however, especially in low memory
conditions because the page allocator is blocking and reclaim requires
get_mems_allowed() itself.  It is not atypical to see writes to
cpuset.mems take over 2 seconds to complete, for example.  In low memory
conditions, this is problematic because it's one of the most imporant
times to change cpuset.mems in the first place!

The only way a task's set of allowable nodes may change is through cpusets
by writing to cpuset.mems and when attaching a task to a generic code is
not reading the nodemask with get_mems_allowed() at the same time, and
then clearing all the old nodes.  This prevents the possibility that a
reader will see an empty nodemask at the same time the writer is storing a
new nodemask.

If at least one node remains unchanged, though, it's possible to simply
set all new nodes and then clear all the old nodes.  Changing a task's
nodemask is protected by cgroup_mutex so it's guaranteed that two threads
are not changing the same task's nodemask at the same time, so the
nodemask is guaranteed to be stored before another thread changes it and
determines whether a node remains set or not.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Paul Menage <paul@paulmenage.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:00 -07:00
..
debug kgdb: follow rename pack_hex_byte() to hex_byte_pack() 2011-10-31 17:30:56 -07:00
events mm: distinguish between mlocked and pinned pages 2011-10-31 17:30:46 -07:00
gcov gcov: disable CONSTRUCTORS for UML 2011-07-26 16:49:45 -07:00
irq Merge branch 'next/dt' of git://git.linaro.org/people/arnd/arm-soc 2011-11-01 21:02:35 -07:00
power Merge branch 'devel-stable' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm 2011-10-28 12:02:27 -07:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-10-26 17:15:03 +02:00
trace Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-10-26 17:03:38 +02:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks arch:Kconfig.locks Remove unused config option. 2011-04-10 17:01:05 +02:00
Kconfig.preempt sched: Isolate preempt counting in its own config option 2011-06-10 15:15:40 +02:00
Makefile Merge branch 'devel-stable' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm 2011-10-28 12:02:27 -07:00
acct.c
async.c async: uninitialized warning corrections 2011-09-15 14:22:28 +02:00
audit.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
audit.h
audit_tree.c audit_tree,rcu: Convert call_rcu(__put_tree) to kfree_rcu() 2011-07-20 14:10:11 -07:00
audit_watch.c kill path_lookup() 2011-03-14 09:15:23 -04:00
auditfilter.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
auditsc.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c Merge branch 'master' into next 2011-05-19 18:51:57 +10:00
cgroup.c cgroups: don't attach task to subsystem if migration failed 2011-11-02 16:06:59 -07:00
cgroup_freezer.c cgroups: add per-thread subsystem callbacks 2011-05-26 17:12:34 -07:00
compat.c Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 2011-07-30 00:08:53 -07:00
configs.c kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n 2011-07-25 20:57:15 -07:00
cpu.c Fix common misspellings 2011-03-31 11:26:23 -03:00
cpu_pm.c cpu_pm: call notifiers during suspend 2011-09-23 12:05:29 +05:30
cpuset.c cpusets: avoid looping when storing to mems_allowed if one node remains set 2011-11-02 16:07:00 -07:00
crash_dump.c [S390] kdump: Add size to elfcorehdr kernel parameter 2011-10-30 15:16:41 +01:00
cred.c Merge branch 'next' of git://selinuxproject.org/~jmorris/linux-security 2011-10-25 09:45:31 +02:00
delayacct.c KVM: Steal time implementation 2011-07-14 12:59:14 +03:00
dma.c
elfcore.c
exec_domain.c
exit.c oom: remove oom_disable_count 2011-10-31 17:30:45 -07:00
extable.c extable, core_kernel_data(): Make sure all archs define _sdata 2011-05-20 08:56:56 +02:00
fork.c oom: remove oom_disable_count 2011-10-31 17:30:45 -07:00
freezer.c PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too 2011-10-16 23:30:37 +02:00
futex.c Merge branch 'master' into for-next 2011-09-15 15:08:18 +02:00
futex_compat.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
groups.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
hrtimer.c hrtimers: Fix typo causing erratic timers 2011-05-25 15:31:58 -07:00
hung_task.c watchdog, hung_task_timeout: Add Kconfig configurable default 2011-04-28 09:13:17 +02:00
irq_work.c llist: Add llist_next() 2011-10-04 12:43:53 +02:00
itimer.c
jump_label.c jump_label: Fix jump_label update for modules 2011-06-29 09:59:17 -04:00
kallsyms.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
kexec.c [S390] kdump: Add infrastructure for unmapping crashkernel memory 2011-10-30 15:16:42 +01:00
kfifo.c
kmod.c kmod: prevent kmod_loop_msg overflow in __request_module() 2011-10-26 13:10:39 +10:30
kprobes.c locking, kprobes: Annotate the hash locks and kretprobe.lock as raw 2011-09-13 11:11:45 +02:00
ksysfs.c kernel/ksysfs.c: expose file_caps_enabled in sysfs 2011-04-19 16:45:51 -07:00
kthread.c cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed 2011-05-28 17:02:57 +02:00
latencytop.c locking, latencytop: Annotate latency_lock as raw 2011-09-13 11:12:02 +02:00
lockdep.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-10-26 16:26:53 +02:00
lockdep_internals.h
lockdep_proc.c lockdep: Remove unused 'factor' variable from lockdep_stats_show() 2011-03-23 13:54:47 +01:00
lockdep_states.h
module.c Tracepoint: Dissociate from module mutex 2011-08-10 20:38:14 -04:00
mutex-debug.c mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c lockdep, mutex: provide mutex_lock_nest_lock 2011-05-25 08:39:17 -07:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c notifiers: sys: move reboot notifiers into reboot.h 2011-07-25 20:57:14 -07:00
nsproxy.c make sure that nsproxy_cache is initialized early enough 2011-07-20 01:44:07 -04:00
padata.c Fix common misspellings 2011-03-31 11:26:23 -03:00
panic.c panic: panic=-1 for immediate reboot 2011-07-26 16:49:45 -07:00
params.c params: make dashes and underscores in parameter names truly equal 2011-10-26 13:10:39 +10:30
pid.c rcu: Restore checks for blocking in RCU read-side critical sections 2011-09-28 21:36:37 -07:00
pid_namespace.c pidns: call pid_ns_prepare_proc() from create_pid_namespace() 2011-03-23 19:46:58 -07:00
posix-cpu-timers.c Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-10-26 16:17:32 +02:00
posix-timers.c posix-timers: RCU conversion 2011-05-24 12:10:51 +02:00
printk.c printk: remove bounds checking for log_prefix 2011-10-31 17:30:53 -07:00
profile.c kernel/profile.c: remove some duplicate code from profile_hits() 2011-05-26 17:12:37 -07:00
ptrace.c ptrace: PTRACE_LISTEN forgets to unlock ->siglock 2011-09-25 11:02:00 -07:00
range.c
rcu.h rcu: Add grace-period, quiescent-state, and call_rcu trace events 2011-09-28 21:38:21 -07:00
rcupdate.c rcu: Add event-tracing for RCU callback invocation 2011-09-28 21:38:12 -07:00
rcutiny.c rcu: Add grace-period, quiescent-state, and call_rcu trace events 2011-09-28 21:38:21 -07:00
rcutiny_plugin.h rcu: Make TINY_RCU also use softirq for RCU_BOOST=n 2011-09-28 21:38:20 -07:00
rcutorture.c rcu: Make rcu_torture_boost() exit loops at end of test 2011-09-28 21:38:46 -07:00
rcutree.c rcu: Move propagation of ->completed from rcu_start_gp() to rcu_report_qs_rsp() 2011-09-28 21:38:49 -07:00
rcutree.h rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states 2011-09-28 21:38:48 -07:00
rcutree_plugin.h rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states 2011-09-28 21:38:48 -07:00
rcutree_trace.c rcu: Simplify quiescent-state accounting 2011-09-28 21:38:22 -07:00
relay.c
res_counter.c memcg: res_counter_read_u64(): fix potential races on 32-bit machines 2011-03-23 19:46:22 -07:00
resource.c Resource: fix wrong resource window calculation 2011-09-29 20:04:34 -07:00
rtmutex-debug.c rtmutex: Add missing rcu_read_unlock() in debug_rt_mutex_print_deadlock() 2011-10-05 13:20:24 +02:00
rtmutex-debug.h
rtmutex-tester.c rtmutex: tester: Remove the remaining BKL leftovers 2011-02-22 22:07:22 +01:00
rtmutex.c rcu: Permit rt_mutex_unlock() with irqs disabled 2011-09-28 21:38:43 -07:00
rtmutex.h
rtmutex_common.h
rwsem.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
sched.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-10-26 17:08:43 +02:00
sched_autogroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sched_autogroup.h sched: Skip autogroup when looking for all rt sched groups 2011-07-01 10:39:08 +02:00
sched_clock.c
sched_cpupri.c sched/cpupri: Remove cpupri->pri_active 2011-08-14 12:01:11 +02:00
sched_cpupri.h sched/cpupri: Remove cpupri->pri_active 2011-08-14 12:01:11 +02:00
sched_debug.c sched: Get rid of lock_depth 2011-04-24 13:18:38 +02:00
sched_fair.c sched: Wrap scheduler p->cpus_allowed access 2011-10-06 12:46:56 +02:00
sched_features.h sched: Kill WAKEUP_PREEMPT 2011-08-14 12:00:41 +02:00
sched_idletask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched_rt.c sched: Warn on rt throttling 2011-10-06 12:47:04 +02:00
sched_stats.h locking, sched: Annotate thread_group_cputimer as raw 2011-09-13 11:11:55 +02:00
sched_stoptask.c sched: Implement hierarchical task accounting for SCHED_OTHER 2011-08-14 12:01:13 +02:00
seccomp.c
semaphore.c locking, semaphores: Annotate inner lock as raw 2011-09-13 11:11:57 +02:00
signal.c user namespace: usb: make usb urbs user namespace aware (v2) 2011-09-29 13:13:08 -07:00
smp.c generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts 2011-06-17 10:17:12 +02:00
softirq.c softirq,rcu: Inform RCU of irq_exit() activity 2011-07-20 10:50:12 -07:00
spinlock.c
srcu.c
stacktrace.c stack_trace: Add weak save_stack_trace_regs() 2011-06-14 22:48:52 -04:00
stop_machine.c stop_machine: make stop_machine safe and efficient to call early 2011-10-31 17:30:53 -07:00
sys.c Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2011-10-24 18:18:09 -04:00
sys_ni.c Cross Memory Attach 2011-10-31 17:30:44 -07:00
sysctl.c Merge branch 'akpm' (Andrew's incoming) 2011-10-31 17:46:07 -07:00
sysctl_binary.c ipv4: NET_IPV4_ROUTE_GC_INTERVAL removal 2011-10-03 14:13:01 -04:00
sysctl_check.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
taskstats.c Make TASKSTATS require root access 2011-09-19 17:04:37 -07:00
test_kprobes.c
time.c time: Change jiffies_to_clock_t() argument type to unsigned long 2011-09-21 10:28:51 +02:00
timeconst.pl
timer.c timers: Consider slack value in mod_timer() 2011-06-03 15:02:32 +02:00
tracepoint.c Tracepoint: Dissociate from module mutex 2011-08-10 20:38:14 -04:00
tsacct.c Make taskstats round statistics down to nearest 1k bytes/events 2011-09-19 17:10:57 -07:00
uid16.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
up.c
user-return-notifier.c Fix common misspellings 2011-03-31 11:26:23 -03:00
user.c userns: add a user_namespace as creator/owner of uts_namespace 2011-03-23 19:46:59 -07:00
user_namespace.c
utsname.c ns proc: Add support for the uts namespace 2011-05-10 14:35:35 -07:00
utsname_sysctl.c sysctl kernel: Remove binary sysctl logic 2009-11-12 02:04:55 -08:00
wait.c Fix common misspellings 2011-03-31 11:26:23 -03:00
watchdog.c watchdog: move watchdog_*_all_cpus under CONFIG_SYSCTL 2011-10-31 17:30:53 -07:00
workqueue.c workqueue: lock cwq access in drain_workqueue 2011-09-14 18:09:38 -07:00
workqueue_sched.h