linux

Commit Graph

Author	SHA1	Message	Date
Eric W. Biederman	49d769d52e	Change reparent_to_init to reparent_to_kthreadd When a kernel thread calls daemonize, instead of reparenting the thread to init reparent the thread to kthreadd next to the threads created by kthread_create. This is really just a stop gap until daemonize goes away, but it does ensure no kernel threads are under init and they are all in one place that is easy to find. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Eric W. Biederman	73c279927f	kthread: don't depend on work queues Currently there is a circular reference between work queue initialization and kthread initialization. This prevents the kthread infrastructure from initializing until after work queues have been initialized. We want the properties of tasks created with kthread_create to be as close as possible to the init_task and to not be contaminated by user processes. The later we start our kthreadd that creates these tasks the harder it is to avoid contamination from user processes and the more of a mess we have to clean up because the defaults have changed on us. So this patch modifies the kthread support to not use work queues but to instead use a simple list of structures, and to have kthreadd start from init_task immediately after our kernel thread that execs /sbin/init. By being a true child of init_task we only have to change those process settings that we want to have different from init_task, such as our process name, the cpus that are allowed, blocking all signals and setting SIGCHLD to SIG_IGN so that all of our children are reaped automatically. By being a true child of init_task we also naturally get our ppid set to 0 and do not wind up as a child of PID == 1. Ensuring that tasks generated by kthread_create will not slow down the functioning of the wait family of functions. [akpm@linux-foundation.org: use interruptible sleeps] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Oleg Nesterov	c93465181f	____call_usermodehelper: don't flush_signals() ____call_usermodehelper() has no reason for flush_signals(). It is a fresh forked process which is going to exec a user-space application or exit on failure. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Oleg Nesterov	28e53bddf8	unify flush_work/flush_work_keventd and rename it to cancel_work_sync flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq (this was possible from the very beginnig, I missed this). So we can unify flush_work_keventd and flush_work. Also, rename flush_work() to cancel_work_sync() and fix all callers. Perhaps this is not the best name, but "flush_work" is really bad. (akpm: this is why the earlier patches bypassed maintainers) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Auke Kok <auke-jan.h.kok@intel.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Oleg Nesterov	a4798833d2	zap_other_threads: remove unneeded ->exit_signal change We already depend on fact that all sub-threads have ->exit_signal == -1, no need to set it in zap_other_threads(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Oleg Nesterov	85f4186af9	worker_thread: fix racy try_to_freeze() usage worker_thread() can miss freeze_process()->signal_wake_up() if it happens between try_to_freeze() and prepare_to_wait(). We should check freezing() before entering schedule(). This race was introduced by me in [PATCH 1/1] workqueue: don't migrate pending works from the dead CPU Looks like mm/vmscan.c:kswapd() has the same race. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Oleg Nesterov	b9aac8e0d3	worker_thread: don't play with signals worker_thread() doesn't need to "Block and flush all signals", this was already done by its caller, kthread(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Oleg Nesterov	23b2e5991a	workqueue: kill NOAUTOREL works We don't have any users, and it is not so trivial to use NOAUTOREL works correctly. It is better to simplify API. Delete NOAUTOREL support and rename work_release to work_clear_pending to avoid a confusion. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	1634c48f8b	make cancel_rearming_delayed_work() work on any workqueue, not just keventd_wq cancel_rearming_delayed_workqueue(wq, dwork) doesn't need the first parameter. We don't hang on un-queued dwork any longer, and work->data doesn't change its type. This means we can always figure out "wq" from dwork when it is needed. Remove this parameter, and rename the function to cancel_rearming_delayed_work(). Re-create an inline "obsolete" cancel_rearming_delayed_workqueue(wq) which just calls cancel_rearming_delayed_work(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	a848e3b67c	workqueue: introduce wq_per_cpu() helper Cleanup. A number of per_cpu_ptr(wq->cpu_wq, cpu) users have to check that cpu is valid for this wq. Make a simple helper. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	63bc036252	unify queue_delayed_work() and queue_delayed_work_on() Change queue_delayed_work() to use queue_delayed_work_on() to avoid the code duplication (saves 133 bytes). Q: queue_delayed_work() enqueues &dwork->work directly when delay == 0, why? [jirislaby@gmail.com: oops fix] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	ed7c0feede	make queue_delayed_work() friendly to flush_fork() Currently typeof(delayed_work->work.data) is "struct workqueue_struct" when the timer is pending "struct cpu_workqueue_struct" whe the work is queued This makes impossible to use flush_fork(delayed_work->work) in addition to cancel_delayed_work/cancel_rearming_delayed_work, not good. Change queue_delayed_work/delayed_work_timer_fn to use cwq, not wq. This complicates (and uglifies) these functions a little bit, but alows us to use flush_fork(dwork) and imho makes the whole code more consistent. Also, document the fact that cancel_rearming_delayed_work() doesn't garantee the completion of work->func() upon return. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	06ba38a9a0	workqueues: shift kthread_bind() from CPU_UP_PREPARE to CPU_ONLINE CPU_UP_PREPARE binds cwq->thread to the new CPU. So CPU_UP_CANCELED tries to wake up the task which is bound to the failed CPU. With this patch we don't bind cwq->thread until CPU becomes online. The first wake_up() after kthread_create() is a bit special, make a simple helper for that. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	c12920d190	workqueue: make init_workqueues() __init The only caller of init_workqueues() is do_basic_setup(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	cce1a1656c	workqueue: introduce workqueue_struct->singlethread Add explicit workqueue_struct->singlethread flag. This lessens .text a little, but most importantly this allows us to manipulate wq->list without changine the meaning of is_single_threaded(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	b1f4ec172f	workqueue: introduce cpu_singlethread_map The code like if (is_single_threaded(wq)) do_something(singlethread_cpu); else { for_each_cpu_mask(cpu, cpu_populated_map) do_something(cpu); } looks very annoying. We can add "static cpumask_t cpu_singlethread_map" and simplify the code. Lessens .text a bit, and imho makes the code more readable. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	dfb4b82e1c	workqueue: make cancel_rearming_delayed_workqueue() work on idle dwork cancel_rearming_delayed_workqueue(dwork) will hang forever if dwork was not scheduled, because in that case cancel_delayed_work()->del_timer_sync() never returns true. I don't know if there are any callers which may have problems, but this is not so convenient, and the fix is very simple. Q: looks like we don't need "struct workqueue_struct *wq" parameter. If the timer was aborted successfully, get_wq_data() == wq. Is it worth to add the new function? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	f293ea9200	workqueue: don't save interrupts in run_workqueue() work->func() may sleep, it's a bug to call run_workqueue() with irqs disabled. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	7097a87afe	workqueue: kill run_scheduled_work() Because it has no callers. Actually, I think the whole idea of run_scheduled_work() was not right, not good to mix "unqueue this work and execute its ->func()" in one function. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	3af24433ef	workqueue: don't migrate pending works from the dead CPU Currently CPU_DEAD uses kthread_stop() to stop cwq->thread and then transfers cwq->worklist to another CPU. However, it is very unlikely that worker_thread() will notice kthread_should_stop() before flushing cwq->worklist. It is only possible if worker_thread() was preempted after run_workqueue(cwq), a new work_struct was added, and CPU_DEAD happened before cwq->thread has a chance to run. This means that take_over_work() mostly adds unneeded complications. Note also that kthread_stop() is not good per se, wake_up_process() may confuse work->func() if it sleeps waiting for some event. Remove take_over_work() and migrate_sequence complications. CPU_DEAD sets the cwq->should_stop flag (introduced by this patch) and waits for cwq->thread to flush cwq->worklist and exit. Because the dead CPU is not on cpu_online_map, no more works can be added to that cwq. cpu_populated_map was introduced to optimize for_each_possible_cpu(), it is not strictly needed, and it is more a documentation in fact. Saves 418 bytes. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	36aa9dfc39	workqueue: don't clear cwq->thread until it exits Pointed out by Srivatsa Vaddagiri. cleanup_workqueue_thread() sets cwq->thread = NULL and does kthread_stop(). This breaks the "if (cwq->thread == current)" logic in flush_cpu_workqueue() and leads to deadlock. Kill the thead first, then clear cwq->thread. workqueue_mutex protects us from create_workqueue_thread() so we don't need cwq->lock. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	d721304dce	workqueue: fix flush_workqueue() vs CPU_DEAD race Many thanks to Srivatsa Vaddagiri for the helpful discussion and for spotting the bug in my previous attempt. work->func() (and thus flush_workqueue()) must not use workqueue_mutex, this leads to deadlock when CPU_DEAD does kthread_stop(). However without this mutex held we can't detect CPU_DEAD in progress, which can move pending works to another CPU while the dead one is not on cpu_online_map. Change flush_workqueue() to use for_each_possible_cpu(). This means that flush_cpu_workqueue() may hit CPU which is already dead. However in that case !list_empty(&cwq->worklist) \|\| cwq->current_work != NULL means that CPU_DEAD in progress, it will do kthread_stop() + take_over_work() so we can proceed and insert a barrier. We hold cwq->lock, so we are safe. Also, add migrate_sequence incremented by take_over_work() under cwq->lock. If take_over_work() happened before we checked this CPU, we should see the new value after spin_unlock(). Further possible changes: remove CPU_DEAD handling (along with take_over_work, migrate_sequence) from workqueue.c. CPU_DEAD just sets cwq->please_exit_after_flush flag. CPU_UP_PREPARE->create_workqueue_thread() clears this flag, and creates the new thread if cwq->thread == NULL. This way the workqueue/cpu-hotplug interaction is almost zero, workqueue_mutex just protects "workqueues" list, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE go away. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:52 -07:00
Oleg Nesterov	319c2a986e	workqueue: fix freezeable workqueues implementation Currently ->freezeable is per-cpu, this is wrong. CPU_UP_PREPARE creates cwq->thread which is not freezeable. Move ->freezeable to workqueue_struct. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Heiko Carstens	e7407dcc69	call cpu_chain with CPU_DOWN_FAILED if CPU_DOWN_PREPARE failed This makes cpu hotplug symmetrical: if CPU_UP_PREPARE fails we get CPU_UP_CANCELED, so we can undo what ever happened on PREPARE. The same should happen for CPU_DOWN_PREPARE. [akpm@linux-foundation.org: fix for reduce-size-of-task_struct-on-64-bit-machines] Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Gautham R Shenoy	5be9361cdf	Eliminate lock_cpu_hotplug in kernel/schedc Eliminate lock_cpu_hotplug from kernel/sched.c and use sched_hotcpu_mutex instead to postpone a hotplug event. In the migration_call hotcpu callback function, take sched_hotcpu_mutex while handling the event CPU_LOCK_ACQUIRE and release it while handling CPU_LOCK_RELEASE event. [akpm@linux-foundation.org: fix deadlock] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Gautham R Shenoy	baaca49f41	Define and use new events,CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE This is an attempt to provide an alternate mechanism for postponing a hotplug event instead of using a global mechanism like lock_cpu_hotplug. The proposal is to add two new events namely CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE. The notification for these two events would be sent out before and after a cpu_hotplug event respectively. During the CPU_LOCK_ACQUIRE event, a cpu-hotplug-aware subsystem is supposed to acquire any per-subsystem hotcpu mutex ( Eg. workqueue_mutex in kernel/workqueue.c ). During the CPU_LOCK_RELEASE release event the cpu-hotplug-aware subsystem is supposed to release the per-subsystem hotcpu mutex. The reasons for defining new events as opposed to reusing the existing events like CPU_UP_PREPARE/CPU_UP_FAILED/CPU_ONLINE for locking/unlocking of per-subsystem hotcpu mutexes are as follow: - CPU_LOCK_ACQUIRE: All hotcpu mutexes are taken before subsystems start handling pre-hotplug events like CPU_UP_PREPARE/CPU_DOWN_PREPARE etc, thus ensuring a clean handling of these events. - CPU_LOCK_RELEASE: The hotcpu mutexes will be released only after all subsystems have handled post-hotplug events like CPU_DOWN_FAILED, CPU_DEAD,CPU_ONLINE etc thereby ensuring that there are no subsequent clashes amongst the interdependent subsystems after a cpu hotplugs. This patch also uses __raw_notifier_call chain in _cpu_up to take care of the dependency between the two consequetive calls to raw_notifier_call_chain. [akpm@linux-foundation.org: fix a bug] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Gautham R Shenoy	6f7cc11aa6	Extend notifier_call_chain to count nr_calls made Since 2.6.18-something, the community has been bugged by the problem to provide a clean and a stable mechanism to postpone a cpu-hotplug event as lock_cpu_hotplug was badly broken. This is another proposal towards solving that problem. This one is along the lines of the solution provided in kernel/workqueue.c Instead of having a global mechanism like lock_cpu_hotplug, we allow the subsytems to define their own per-subsystem hot cpu mutexes. These would be taken(released) where ever we are currently calling lock_cpu_hotplug(unlock_cpu_hotplug). Also, in the per-subsystem hotcpu callback function,we take this mutex before we handle any pre-cpu-hotplug events and release it once we finish handling the post-cpu-hotplug events. A standard means for doing this has been provided in [PATCH 2/4] and demonstrated in [PATCH 3/4]. The ordering of these per-subsystem mutexes might still prove to be a problem, but hopefully lockdep should help us get out of that muddle. The patch set to be applied against linux-2.6.19-rc5 is as follows: [PATCH 1/4] : Extend notifier_call_chain with an option to specify the number of notifications to be sent and also count the number of notifications actually sent. [PATCH 2/4] : Define events CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE and send out notifications for these in _cpu_up and _cpu_down. This would help us standardise the acquire and release of the subsystem locks in the hotcpu callback functions of these subsystems. [PATCH 3/4] : Eliminate lock_cpu_hotplug from kernel/sched.c. [PATCH 4/4] : In workqueue_cpu_callback function, acquire(release) the workqueue_mutex while handling CPU_LOCK_ACQUIRE(CPU_LOCK_RELEASE). If the per-subsystem-locking approach survives the test of time, we can expect a slow phasing out of lock_cpu_hotplug, which has not yet been eliminated in these patches :) This patch: Provide notifier_call_chain with an option to call only a specified number of notifiers and also record the number of call to notifiers made. The need for this enhancement was identified in the post entitled "Slab - Eliminate lock_cpu_hotplug from slab" (http://lkml.org/lkml/2006/10/28/92) by Ravikiran G Thirumalai and Andrew Morton. This patch adds two additional parameters to notifier_call_chain API namely - int nr_to_calls : Number of notifier_functions to be called. The don't care value is -1. - unsigned int *nr_calls : Records the total number of notifier_funtions called by notifier_call_chain. The don't care value is NULL. [michal.k.k.piotrowski@gmail.com: build fix] Credit: Andrew Morton <akpm@osdl.org> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Tom Zanussi	7c9cb38302	relay: use plain timer instead of delayed work relay doesn't need to use schedule_delayed_work() for waking readers when a simple timer will do. Signed-off-by: Tom Zanussi <zanussi@comcast.net> Cc: Satyam Sharma <satyam.sharma@gmail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Oleg Nesterov	83c22520c5	flush_cpu_workqueue: don't flush an empty ->worklist Now when we have ->current_work we can avoid adding a barrier and waiting for its completition when cwq's queue is empty. Note: this change is also useful if we change flush_workqueue() to also check the dead CPUs. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Andrew Morton	edab2516a6	flush_workqueue(): use preempt_disable to hold off cpu hotplug Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Oleg Nesterov	b89deed32c	implement flush_work() A basic problem with flush_scheduled_work() is that it blocks behind _all_ presently-queued works, rather than just the work whcih the caller wants to flush. If the caller holds some lock, and if one of the queued work happens to want that lock as well then accidental deadlocks can occur. One example of this is the phy layer: it wants to flush work while holding rtnl_lock(). But if a linkwatch event happens to be queued, the phy code will deadlock because the linkwatch callback function takes rtnl_lock. So we implement a new function which will flush a single work - just the one which the caller wants to free up. Thus we avoid the accidental deadlocks which can arise from unrelated subsystems' callbacks taking shared locks. flush_work() non-blockingly dequeues the work_struct which we want to kill, then it waits for its handler to complete on all CPUs. Add ->current_work to the "struct cpu_workqueue_struct", it points to currently running "struct work_struct". When flush_work(work) detects ->current_work == work, it inserts a barrier at the _head_ of ->worklist (and thus right _after_ that work) and waits for completition. This means that the next work fired on that CPU will be this barrier, or another barrier queued by concurrent flush_work(), so the caller of flush_work() will be woken before any "regular" work has a chance to run. When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect against CPU hotplug), CPU may go away. But in that case take_over_work() will move a barrier we queued to another CPU, it will be fired sometime, and wait_on_work() will be woken. Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before take_over_work(), so cwq->thread should complete its ->worklist (and thus the barrier), because currently we don't check kthread_should_stop() in run_workqueue(). But even if we did, everything should be ok. [akpm@osdl.org: cleanup] [akpm@osdl.org: add flush_work_keventd() wrapper] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:50 -07:00
Oleg Nesterov	fc2e4d7041	reimplement flush_workqueue() Remove ->remove_sequence, ->insert_sequence, and ->work_done from struct cpu_workqueue_struct. To implement flush_workqueue() we can queue a barrier work on each CPU and wait for its completition. The barrier is queued under workqueue_mutex to ensure that per cpu wq->cpu_wq is alive, we drop this mutex before going to sleep. If CPU goes down while we are waiting for completition, take_over_work() will move the barrier on another CPU, and the handler will wake up us eventually. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:50 -07:00
Andrew Morton	e18f3ffb9c	schedule_on_each_cpu(): use preempt_disable() We take workqueue_mutex in there to keep CPU hotplug away. But preempt_disable() will suffice for that. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:50 -07:00
David Miller	9b04bd2756	Fix printk format warnings in timer_list.c u64 and s64 are not necessarily 'long long' on some 64-bit platforms, so explicit the type to kill the compiler warnings. Also consistently use '%Lu' which is unsigned. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:50 -07:00
Daniel Walker	35c35d1afa	clocksource: spelling error in watchdog code There's more that need fixing, and fix my own subject spelling error too. Signed-off-by: Daniel Walker <dwalker@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:49 -07:00
Roland McGrath	55c0d1f83e	Move sig_kernel_* et al macros to linux/signal.h This patch moves the sig_kernel_* and related macros from kernel/signal.c to linux/signal.h, and cleans them up slightly. I need the sig_kernel_* macros for default signal behavior in the utrace code, and want to avoid duplication or overhead to share the knowledge. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:49 -07:00
Akinobu Mita	85badbdf51	use simple_read_from_buffer in kernel/ Cleanup using simple_read_from_buffer() for /dev/cpuset/tasks and /proc/config.gz. Cc: Paul Jackson <pj@sgi.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:49 -07:00
Jeff Dike	cb0c78cc94	Fix Linuxdoc comment A linuxdoc comment had fallen out of date - it refers to an argument which no longer exists. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:48 -07:00
Rafael J. Wysocki	a3d25c275d	PM: Separate hibernation code from suspend code [ With Johannes Berg <johannes@sipsolutions.net> ] Separate the hibernation (aka suspend to disk code) from the other suspend code. In particular: * Remove the definitions related to hibernation from include/linux/pm.h * Introduce struct hibernation_ops and a new hibernate() function to hibernate the system, defined in include/linux/suspend.h * Separate suspend code in kernel/power/main.c from hibernation-related code in kernel/power/disk.c and kernel/power/user.c (with the help of hibernation_ops) * Switch ACPI (the only user of pm_ops.pm_disk_mode) to hibernation_ops Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:48 -07:00
Andrew Morton	d60846c4d1	swsusp: clean up printk Remove an inexplicable / Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:48 -07:00
John Anthony Kazos Jr	f42df9e658	general: convert "kernel" subdirectory to UTF-8 Convert the "kernel" subdirectory of the tree to UTF-8. The only file modified is <kernel/sys.c>. Signed-off-by: John Anthony Kazos Jr. <jakj@j-a-k-j.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 08:58:20 +02:00
Michael Opdenacker	59c51591a0	Fix occurrences of "the the " Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 08:57:56 +02:00
Johannes Berg	940d67f6b9	[POWERPC] swsusp: Introduce register_nosave_region_late This patch introduces a new register_nosave_region_late function that can be called from initcalls when register_nosave_region can no longer be used because it uses bootmem. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-09 16:34:56 +10:00
Robert P. J. Day	02a3e59a08	Fix minor typoes in kernel/module.c Fix minor (comment) typoes in kernel/module.c. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 07:26:28 +02:00
David Sterba	3dde6ad8fc	Fix trivial typos in Kconfig* files Fix several typos in help text in Kconfig* files. Signed-off-by: David Sterba <dave@jikos.cz> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 07:12:20 +02:00
Andrew Morton	d5f9f942c6	revert 'sched: redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array' Revert commit `bd53f96ca5`. Con says: This is no good, sorry. The one I saw originally was with the staircase deadline cpu scheduler in situ and was different. #define TASK_PREEMPTS_CURR(p, rq) \ ((p)->prio < (rq)->curr->prio) (((p)->prio < (rq)->curr->prio) && ((p)->array == (rq)->active)) This will fail to wake up a runqueue for a task that has been migrated to the expired array of a runqueue which is otherwise idle which can happen with smp balancing, Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: Con Kolivas <kernel@kolivas.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 20:41:15 -07:00
Linus Torvalds	df6d3916f3	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (77 commits) [POWERPC] Abolish powerpc_flash_init() [POWERPC] Early serial debug support for PPC44x [POWERPC] Support for the Ebony 440GP reference board in arch/powerpc [POWERPC] Add device tree for Ebony [POWERPC] Add powerpc/platforms/44x, disable platforms/4xx for now [POWERPC] MPIC U3/U4 MSI backend [POWERPC] MPIC MSI allocator [POWERPC] Enable MSI mappings for MPIC [POWERPC] Tell Phyp we support MSI [POWERPC] RTAS MSI implementation [POWERPC] PowerPC MSI infrastructure [POWERPC] Rip out the existing powerpc msi stubs [POWERPC] Remove use of 4level-fixup.h for ppc32 [POWERPC] Add powerpc PCI-E reset API implementation [POWERPC] Holly bootwrapper [POWERPC] Holly DTS [POWERPC] Holly defconfig [POWERPC] Add support for 750CL Holly board [POWERPC] Generalize tsi108 PCI setup [POWERPC] Generalize tsi108 PHY types ... Fixed conflict in include/asm-powerpc/kdebug.h manually Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:50:19 -07:00
Bernhard Walle	d85a60d85e	Add IRQF_IRQPOLL flag (common code) irqpoll is broken on some architectures that don't use the IRQ 0 for the timer interrupt like IA64. This patch adds a IRQF_IRQPOLL flag. Each architecture is handled in a separate pach. As I left the irq == 0 as condition, this should not break existing architectures that use timer_irq == 0 and that I did't address with that patch (because I don't know). This patch: This patch adds a IRQF_IRQPOLL flag that the interrupt registration code could use for the interrupt it wants to use for IRQ polling. Because this must not be the timer interrupt, an additional flag was added instead of re-using the IRQF_TIMER constant. Until all architectures will have an IRQF_IRQPOLL interrupt, irq == 0 will stay as alternative as it should not break anything. Also, note_interrupt() is called on CPU-specific interrupts to be used as interrupt source for IRQ polling. Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: Grant Grundler <grundler@google.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:22 -07:00
Ananth N Mavinakayanahalli	bf8f6e5b3e	Kprobes: The ON/OFF knob thru debugfs This patch provides a debugfs knob to turn kprobes on/off o A new file /debug/kprobes/enabled indicates if kprobes is enabled or not (default enabled) o Echoing 0 to this file will disarm all installed probes o Any new probe registration when disabled will register the probe but not arm it. A message will be printed out in such a case. o When a value 1 is echoed to the file, all probes (including ones registered in the intervening period) will be enabled o Unregistration will happen irrespective of whether probes are globally enabled or not. o Update Documentation/kprobes.txt to reflect these changes. While there also update the doc to make it current. We are also looking at providing sysrq key support to tie to the disabling feature provided by this patch. [akpm@linux-foundation.org: Use bool like a bool!] [akpm@linux-foundation.org: add printk facility levels] [cornelia.huck@de.ibm.com: Add the missing arch_trampoline_kprobe() for s390] Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:19 -07:00
Christoph Hellwig	4c4308cb93	kprobes: kretprobes simplifications - consolidate duplicate code in all arch_prepare_kretprobe instances into common code - replace various odd helpers that use hlist_for_each_entry to get the first elemenet of a list with either a hlist_for_each_entry_save or an opencoded access to the first element in the caller - inline add_rp_inst into it's only remaining caller - use kretprobe_inst_table_head instead of opencoding it Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:19 -07:00
Christoph Hellwig	6f716acd5f	kprobes: codingstyle cleanups Remove superflous braces and fix indentation aswell as comments. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:19 -07:00
Christoph Hellwig	b0bb501651	kprobes: use hlist_for_each_entry Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:19 -07:00
Josh Triplett	ade5fb818f	rcutorture: Remove redundant assignment to cur_ops in for loop The for loop in rcutorture_init uses the condition cur_ops = torture_ops[i], cur_ops but then makes the same assignment to cur_ops inside the loop. Remove the redundant assignment inside the loop, and remove now-unnecessary braces. Signed-off-by: Josh Triplett <josh@kernel.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:17 -07:00
Josh Triplett	c8e5b16310	rcutorture: style cleanup: avoid != NULL in boolean tests Signed-off-by: Josh Triplett <josh@kernel.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:17 -07:00
Ahmed S. Darwish	788e770eb2	rcutorture: Use ARRAY_SIZE macro when appropriate Use ARRAY_SIZE macro already defined in kernel.h Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:17 -07:00
Siddha, Suresh B	c3396620ca	sched: align rq to cacheline boundary Align the per cpu runqueue to the cacheline boundary. This will minimize the number of cachelines touched during remote wakeup. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Ravikiran G Thirumalai <kiran@scalex86.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:17 -07:00
Dmitry Adamushko	bd53f96ca5	sched: redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array - Make TASK_PREEMPTS_CURR(task, rq) return "true" only if the task's prio is higher than the current's one and the task is in the "active" array. This ensures we don't make redundant resched_task() calls when the task is in the "expired" array (as may happen now in set_user_prio(), rt_mutex_setprio() and pull_task() ) ; - generalise conditions for a call to resched_task() in set_user_nice(), rt_mutex_setprio() and sched_setscheduler() Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:17 -07:00
Siddha, Suresh B	4953198b6c	sched: optimize siblings status check logic in wake_idle() When a logical cpu 'x' already has more than one process running, then most likely the siblings of that cpu 'x' must be busy. Otherwise the idle siblings would have likely(in most of the scenarios) picked up the extra load making the load on 'x' atmost one. Use this logic to eliminate the siblings status check and minimize the cache misses encountered on a heavily loaded system. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:17 -07:00
Eric Dumazet	5517d86bea	Speed up divides by cpu_power in scheduler I noticed expensive divides done in try_to_wakeup() and find_busiest_group() on a bi dual core Opteron machine (total of 4 cores), moderatly loaded (15.000 context switch per second) oprofile numbers : CPU: AMD64 processors, speed 2600.05 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 50000 samples % symbol name ... 613914 1.0498 try_to_wake_up 834 0.0013 :ffffffff80227ae1: div %rcx 77513 0.1191 :ffffffff80227ae4: mov %rax,%r11 608893 1.0413 find_busiest_group 1841 0.0031 :ffffffff802260bf: div %rdi 140109 0.2394 :ffffffff802260c2: test %sil,%sil Some of these divides can use the reciprocal divides we introduced some time ago (currently used in slab AFAIK) We can assume a load will fit in a 32bits number, because with a SCHED_LOAD_SCALE=128 value, its still a theorical limit of 33554432 When/if we reach this limit one day, probably cpus will have a fast hardware divide and we can zap the reciprocal divide trick. Ingo suggested to rename cpu_power to __cpu_power to make clear it should not be modified without changing its reciprocal value too. I did not convert the divide in cpu_avg_load_per_task(), because tracking nr_running changes may be not worth it ? We could use a static table of 32 reciprocal values but it would add a conditional branch and table lookup. [akpm@linux-foundation.org: !SMP build fix] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:17 -07:00
Siddha, Suresh B	46cb4b7c88	sched: dynticks idle load balancing Fix the process idle load balancing in the presence of dynticks. cpus for which ticks are stopped will sleep till the next event wakes it up. Potentially these sleeps can be for large durations and during which today, there is no periodic idle load balancing being done. This patch nominates an owner among the idle cpus, which does the idle load balancing on behalf of the other idle cpus. And once all the cpus are completely idle, then we can stop this idle load balancing too. Checks added in fast path are minimized. Whenever there are busy cpus in the system, there will be an owner(idle cpu) doing the system wide idle load balancing. Open items: 1. Intelligent owner selection (like an idle core in a busy package). 2. Merge with rcu's nohz_cpu_mask? Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:17 -07:00
Siddha, Suresh B	bdecea3a92	sched: fix idle load balancing in softirqd context Periodic load balancing in recent kernels happen in the softirq. In certain -rt configurations, these softirqs are handled in softirqd context. And hence the check for idle processor was always returning busy (as nr_running > 1). This patch captures the idle information at the tick and passes this info to softirq context through an element 'idle_at_tick' in rq. [kernel@kolivas.org: Fix reverse idle at tick logic] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:17 -07:00
Stas Sergeev	6bdb6b620e	export hrtimer_forward Other symbols of the hrtimers API are already exported. Signed-off-by: Stas Sergeev <stsp@aknet.ru> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:15 -07:00
David Rientjes	6f7f02e78a	cpusets: allow empty {cpus,mems}_allowed to be set for unpopulated cpuset You currently cannot remove all cpus or mems from cpus_allowed or mems_allowed of a cpuset. We now allow both if there are no attached tasks. Acked-by: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Paul Menage <menage@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:14 -07:00
Jarek Poplawski	4ff773bbde	lockdep: removed unused ip argument in mark_lock & mark_held_locks It looks like a remainder from designing... Signed-off-by: Jarek Poplawski <jarkao@o2.pl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:13 -07:00
Adrian Bunk	35bab756b4	The scheduled -EINVAL for invalid timevals in setitimer As scheduled, do_setitimer() now returns -EINVAL for invalid timeval. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:13 -07:00
Tom Alsberg	9926e4c743	CPU time limit patch / setrlimit(RLIMIT_CPU, 0) cheat fix As discovered here today, the change in Kernel 2.6.17 intended to inhibit users from setting RLIMIT_CPU to 0 (as that is equivalent to unlimited) by "cheating" and setting it to 1 in such a case, does not make a difference, as the check is done in the wrong place (too late), and only applies to the profiling code. On all systems I checked running kernels above 2.6.17, no matter what the hard and soft CPU time limits were before, a user could escape them by issuing in the shell (sh/bash/zsh) "ulimit -t 0", and then the user's process was not ever killed. Attached is a trivial patch to fix that. Simply moving the check to a slightly earlier location (specifically, before the line that actually assigns the limit - *old_rlim = new_rlim), does the trick. Do note that at least the zsh (but not ash, dash, or bash) shell has the problem of "caching" the limits set by the ulimit command, so when running zsh the fix will not immediately be evident - after entering "ulimit -t 0", "ulimit -a" will show "-t: cpu time (seconds) 0", even though the actual limit as returned by getrlimit(...) will be 1. It can be verified by opening a subshell (which will not have the values of the parent shell in cache) and checking in it, or just by running a CPU intensive command like "echo '65536^1048576' \| bc" and verifying that it dumps core after one second. Regardless of whether that is a misfeature in the shell, perhaps it would be better to return -EINVAL from setrlimit in such a case instead of cheating and setting to 1, as that does not really reflect the actual state of the process anymore. I do not however know what the ground for that decision was in the original 2.6.17 change, and whether there would be any "backward" compatibility issues, so I preferred not to touch that right now. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:12 -07:00
Pavel Emelianov	b5e618181a	Introduce a handy list_first_entry macro There are many places in the kernel where the construction like foo = list_entry(head->next, struct foo_struct, list); are used. The code might look more descriptive and neat if using the macro list_first_entry(head, type, member) \ list_entry((head)->next, type, member) Here is the macro itself and the examples of its usage in the generic code. If it will turn out to be useful, I can prepare the set of patches to inject in into arch-specific code, drivers, networking, etc. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:11 -07:00
Jarek Poplawski	9e860d000a	lockdep: lookup_chain_cache comment errata Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:11 -07:00
Thomas Gleixner	d3ed782458	highres/dyntick: prevent xtime lock contention While the !highres/!dyntick code assigns the duty of the do_timer() call to one specific CPU, this was dropped in the highres/dyntick part during development. Steven Rostedt discovered the xtime lock contention on highres/dyntick due to several CPUs trying to update jiffies. Add the single CPU assignement back. In the dyntick case this needs to be handled carefully, as the CPU which has the do_timer() duty must drop the assignement and let it be grabbed by another CPU, which is active. Otherwise the do_timer() calls would not happen during the long sleep. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Acked-by: Mark Lord <mlord@pobox.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:10 -07:00
Martin Peschke	5a0c6a0d1a	kallsyms: cleanup: use seq_release_private() where appropriate We can save some lines of code by using seq_release_private(). Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
Robert P. J. Day	039b6b3ed8	audit: add spaces on either side of case "..." operator. Following the programming advice laid down in the gcc manual, make sure the case "..." operator has spaces on either side. According to: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Case-Ranges.html#Case-Ranges: "Be careful: Write spaces around the ..., for otherwise it may be parsed wrong when you use it with integer values." Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
Ravikiran G Thirumalai	e729aa16b1	Pad irq_desc to internode cacheline size We noticed a drop in n/w performance due to the irq_desc being cacheline aligned rather than internode aligned. We see 50% of expected performance when two e1000 nics local to two different nodes have consecutive irq descriptors allocated, due to false sharing. Note that this patch does away with cacheline padding for the UP case, as it does not seem useful for UP configurations. Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
Pavel Emelianov	428e6ce023	Lockdep treats down_write_trylock like regular down_write This causes constructions like down_write(&mm1->mmap_sem); if (down_write_trylock(&mm2->mmap_sem)) { ... up_write(&mm2->mmap_sem); } up_write(&mm1->mmap_sem); generate a lockdep warning about circular locking dependence. Call rwsem_acquire() with trylock set to 1. Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
Bert Wesarg	9730b5b06f	kernel/params.c: fix lying comment for param_array() This fixes the comment for the function param_array. Which lies that it only temporarily mangle the input string @val. Signed-off-by: Bert Wesarg <wesarg@informatik.uni-halle.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
Alexey Dobriyan	a5c43dae7a	Fix race between cat /proc/slab_allocators and rmmod Same story as with cat /proc/*/wchan race vs rmmod race, only /proc/slab_allocators want more info than just symbol name. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
Alexey Dobriyan	9d65cb4a17	Fix race between cat /proc//wchan and rmmod et al kallsyms_lookup() can go iterating over modules list unprotected which is OK for emergency situations (oops), but not OK for regular stuff like /proc//wchan. Introduce lookup_symbol_name()/lookup_module_symbol_name() which copy symbol name into caller-supplied buffer or return -ERANGE. All copying is done with module_mutex held, so... Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
Alexey Dobriyan	ffb4512276	Simplify kallsyms_lookup() Several kallsyms_lookup() pass dummy arguments but only need, say, module's name. Make kallsyms_lookup() accept NULLs where possible. Also, makes picture clearer about what interfaces are needed for all symbol resolving business. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
Alexey Dobriyan	ea07890a68	Fix race between rmmod and cat /proc/kallsyms module_get_kallsym() leaks "struct module *" outside of module_mutex which is no-no, because module can dissapear right after mutex unlock. Copy all needed information from inside module_mutex into caller-supplied space. [bunk@stusta.de: is_exported() can now become static] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
Alexey Dobriyan	ae84e32470	Simplify module_get_kallsym() by dropping length arg module_get_kallsym() could in theory truncate module symbol name to fit in buffer, but nobody does this. Always use KSYM_NAME_LEN + 1 bytes for name. Suggested by lg^WRusty. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
Jan Engelhardt	b73a7e76c1	Fix kevent's childs priority greediness Fix kevent's childs priority greediness. Such tasks were always scheduled at nice level -5 and, at that time, udev stole us the CPU time with -5. Already posted at http://lkml.org/lkml/2005/1/10/85 [akpm@linux-foundation.org: add comment] Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Simon Horman	6672f76a5a	kdump/kexec: calculate note size at compile time Currently the size of the per-cpu region reserved to save crash notes is set by the per-architecture value MAX_NOTE_BYTES. Which in turn is currently set to 1024 on all supported architectures. While testing ia64 I recently discovered that this value is in fact too small. The particular setup I was using actually needs 1172 bytes. This lead to very tedious failure mode where the tail of one elf note would overwrite the head of another if they ended up being alocated sequentially by kmalloc, which was often the case. It seems to me that a far better approach is to caclculate the size that the area needs to be. This patch does just that. If a simpler stop-gap patch for ia64 to be squeezed into 2.6.21(.X) is needed then this should be as easy as making MAX_NOTE_BYTES larger in arch/asm-ia64/kexec.h. Perhaps 2048 would be a good choice. However, I think that the approach in this patch is a much more robust idea. Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Randy Dunlap	e63340ae6b	header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Jeremy Fitzhardinge	04c9167f91	add touch_all_softlockup_watchdogs() Add touch_all_softlockup_watchdogs() to allow the softlockup watchdog timers on all cpus to be updated. This is used to prevent sysrq-t from generating a spurious watchdog message when generating lots of output. Softlockup watchdogs use sched_clock() as its timebase, which is inherently per-cpu (at least, when it is measuring unstolen time). Because of this, it isn't possible for one CPU to directly update the other CPU's timers, but it is possible to tell the other CPUs to do update themselves appropriately. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Acked-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:06 -07:00
Jeremy Fitzhardinge	966812dc98	Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:06 -07:00
john stultz	8524070b79	Move timekeeping code to timekeeping.c Move the timekeeping code out of kernel/timer.c and into kernel/time/timekeeping.c. I made no cleanups or other changes in transit. [akpm@linux-foundation.org: build fix] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:06 -07:00
Ahmed S. Darwish	f75d222b83	IRQ: check for PERCPU flag only when adding first irqaction An irqaction structure won't be added to an IRQ descriptor irqaction list if it doesn't agree with other irqactions on the IRQF_PERCPU flag. Don't check for this flag to change IRQ descriptor `status' for every irqaction added to the list, Doing the check only for the first irqaction added is enough. Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:06 -07:00
Venki Pallipadi	6e453a6751	Add support for deferrable timers Introduce a new flag for timers - deferrable: Timers that work normally when system is busy. But, will not cause CPU to come out of idle (just to service this timer), when CPU is idle. Instead, this timer will be serviced when CPU eventually wakes up with a subsequent non-deferrable timer. The main advantage of this is to avoid unnecessary timer interrupts when CPU is idle. If the routine currently called by a timer can wait until next event without any issues, this new timer can be used to setup timer event for that routine. This, with dynticks, allows CPUs to be lazy, allowing them to stay in idle for extended period of time by reducing unnecesary wakeup and thereby reducing the power consumption. This patch: Builds this new timer on top of existing timer infrastructure. It uses last bit in 'base' pointer of timer_list structure to store this deferrable timer flag. __next_timer_interrupt() function skips over these deferrable timers when CPU looks for next timer event for which it has to wake up. This is exported by a new interface init_timer_deferrable() that can be called in place of regular init_timer(). [akpm@linux-foundation.org: Privatise a #define] Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:05 -07:00
Dmitry Adamushko	d2d9433a4c	kernel/irq/proc.c: unprotected iteration over the IRQ action list in name_unique() setup_irq() releases a desc->lock before calling register_handler_proc(), so the iteration over the IRQ action list is not protected. (akpm: the check itself is still racy, but at least it probably won't oops now). Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:05 -07:00
Srivatsa Vaddagiri	dd9037a26a	Fix race between attach_task and cpuset_exit Currently cpuset_exit() changes the exiting task's ->cpuset pointer w/o taking task_lock(). This can lead to ugly races between attach_task and cpuset_exit. Details of the races are described at http://lkml.org/lkml/2007/3/24/132. Patch below closes those races. Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:05 -07:00
Christoph Hellwig	1eeb66a1bb	move die notifier handling to common code This patch moves the die notifier handling to common code. Previous various architectures had exactly the same code for it. Note that the new code is compiled unconditionally, this should be understood as an appel to the other architecture maintainer to implement support for it aswell (aka sprinkling a notify_die or two in the proper place) arm had a notifiy_die that did something totally different, I renamed it to arm_notify_die as part of the patch and made it static to the file it's declared and used at. avr32 used to pass slightly less information through this interface and I brought it into line with the other architectures. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix vmalloc_sync_all bustage] [bryan.wu@analog.com: fix vmalloc_sync_all in nommu] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:04 -07:00
Randy Dunlap	e386979299	kprobes: fix sparse NULL warning Fix sparse NULL warnings: kernel/kprobes.c:915:49: warning: Using plain integer as NULL pointer Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:04 -07:00
Gerd Hoffmann	69331af79c	Fixes and cleanups for earlyprintk aka boot console The console subsystem already has an idea of a boot console, using the CON_BOOT flag. The implementation has some flaws though. The major problem is that presence of a boot console makes register_console() ignore any other console devices (unless explicitly specified on the kernel command line). This patch fixes the console selection code to not consider a boot console a full-featured one, so the first non-boot console registering will become the default console instead. This way the unregister call for the boot console in the register_console() function actually triggers and the handover from the boot console to the real console device works smoothly. Added a printk for the handover, so you know which console device the output goes to when the boot console stops printing messages. The disable_early_printk() call is obsolete with that patch, explicitly disabling the early console isn't needed any more as it works automagically with that patch. I've walked through the tree, dropped all disable_early_printk() instances found below arch/ and tagged the consoles with CON_BOOT if needed. The code is tested on x86, sh (thanks to Paul) and mips (thanks to Ralf). Changes to last version: Rediffed against -rc3, adapted to mips cleanups by Ralf, fixed "udbg-immortal" cmd line arg on powerpc. Signed-off-by: Gerd Hoffmann <kraxel@exsuse.de> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Andi Kleen <ak@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:04 -07:00
Nick Piggin	72c1bbf308	futex: restartable futex_wait LTP test sigaction_16_24 fails, because it expects sem_wait to be restarted if SA_RESTART is set. sem_wait is implemented with futex_wait, that currently doesn't support being restarted. Ulrich confirms that the call should be restartable. Implement a restart_block method to handle the relative timeout, and allow restarts. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:03 -07:00
Rusty Russell	9adef58b1d	futex: get_futex_key, get_key_refs and drop_key_refs lguest uses the convenient futex infrastructure for inter-domain I/O, so expose get_futex_key, get_key_refs (renamed get_futex_key_refs) and drop_key_refs (renamed drop_futex_key_refs). Also means we need to expose the union that these use. No code changes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:03 -07:00
Kees Cook	5096add84b	proc: maps protection The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive information about the memory location and usage of processes. Issues: - maps should not be world-readable, especially if programs expect any kind of ASLR protection from local attackers. - maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc check the maps when %n is in a *printf call, and a setuid(getuid()) process wouldn't be able to read its own maps file. (For reference see http://lkml.org/lkml/2006/1/22/150) - a system-wide toggle is needed to allow prior behavior in the case of non-root applications that depend on access to the maps contents. This change implements a check using "ptrace_may_attach" before allowing access to read the maps contents. To control this protection, the new knob /proc/sys/kernel/maps_protect has been added, with corresponding updates to the procfs documentation. [akpm@linux-foundation.org: build fixes] [akpm@linux-foundation.org: New sysctl numbers are old hat] Signed-off-by: Kees Cook <kees@outflux.net> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:02 -07:00
Eric Dumazet	753e9c5cd9	Optimize timespec_trunc() The first thing done by timespec_trunc() is : if (gran <= jiffies_to_usecs(1) * 1000) This should really be a test against a constant known at compile time. Alas, it isnt. jiffies_to_usec() was unilined so C compiler emits a function call and a multiply to compute : a CONSTANT. mov $0x1,%edi mov %rbx,0xffffffffffffffe8(%rbp) mov %r12,0xfffffffffffffff0(%rbp) mov %edx,%ebx mov %rsi,0xffffffffffffffc8(%rbp) mov %rsi,%r12 callq ffffffff80232010 <jiffies_to_usecs> imul $0x3e8,%eax,%eax cmp %ebx,%eax This patch reorders kernel/time.c a bit so that jiffies_to_usecs() is defined before timespec_trunc() so that compiler now generates : cmp $0x3d0900,%edx (HZ=250 on my machine) This gives a better code (timespec_trunc() becoming a leaf function), and shorter kernel size as well. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:01 -07:00
Josh Triplett	6f8bc500a1	rcutorture: Mark rcu_torture_init as __init The corresponding rcu_torture_cleanup cannot get marked as __exit, because rcu_torture_init uses it to clean up if init fails. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Badari Pulavarty	e3222c4ecc	Merge sys_clone()/sys_unshare() nsproxy and namespace handling sys_clone() and sys_unshare() both makes copies of nsproxy and its associated namespaces. But they have different code paths. This patch merges all the nsproxy and its associated namespace copy/clone handling (as much as possible). Posted on container list earlier for feedback. - Create a new nsproxy and its associated namespaces and pass it back to caller to attach it to right process. - Changed all copy__ns() routines to return a new copy of namespace instead of attaching it to task->nsproxy. - Moved the CAP_SYS_ADMIN checks out of copy__ns() routines. - Removed unnessary !ns checks from copy__ns() and added BUG_ON() just incase. - Get rid of all individual unshare__ns() routines and make use of copy_*_ns() instead. [akpm@osdl.org: cleanups, warning fix] [clg@fr.ibm.com: remove dup_namespaces() declaration] [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval] [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <containers@lists.osdl.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Prarit Bhargava	ee527cd3a2	Use stop_machine_run in the Intel RNG driver Replace call_smp_function with stop_machine_run in the Intel RNG driver. CPU A has done read_lock(&lock) CPU B has done write_lock_irq(&lock) and is waiting for A to release the lock. A third CPU calls call_smp_function and issues the IPI. CPU A takes CPU C's IPI. CPU B is waiting with interrupts disabled and does not see the IPI. CPU C is stuck waiting for CPU B to respond to the IPI. Deadlock. The solution is to use stop_machine_run instead of call_smp_function (call_smp_function should not be called in situations where the CPUs may be suspended). [haruo.tomita@toshiba.co.jp: fix a typo in mod_init()] [haruo.tomita@toshiba.co.jp: fix memory leak] Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: "Tomita, Haruo" <haruo.tomita@toshiba.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Pekka Enberg	6d4f9c5500	module: use krealloc This converts an open-coded krealloc() to use the shiny new API. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Oleg Nesterov	02fb6149f7	softlockup: s/99/MAX_RT_PRIO/ Don't use hardcoded 99 value, use MAX_RT_PRIO. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:59 -07:00
Oleg Nesterov	1065d130dd	freezer: task->exit_state should be treated as bolean Except for BUG_ON() checks, we should not use EXIT_XXXX defines outside of exit/wait paths. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:58 -07:00
Christoph Hellwig	ab1b6f03a1	simplify the stacktrace code Simplify the stacktrace code: - remove the unused task argument to save_stack_trace, it's always current - remove the all_contexts flag, it's alwasy 0 Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andi Kleen <ak@suse.de> Cc: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:58 -07:00
Paul Mackerras	02bbc0f09c	Merge branch 'linux-2.6'	2007-05-08 13:37:51 +10:00
Rafael J. Wysocki	56f99bcb52	swsusp: free more memory Move the definition of PAGES_FOR_IO to kernel/power/power.h and introduce SPARE_PAGES representing the number of pages that should be freed by the swsusp's memory shrinker in addition to PAGES_FOR_IO so that device drivers can allocate some memory (up to 1 MB total) in their .suspend() routines without causing the suspend to fail. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:59 -07:00
Rafael J. Wysocki	9b95e43763	swsusp: fix snapshot_release Remove the leftover enable_nonboot_cpus() from snapshot_release(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:59 -07:00
David Brownell	a7ee2e5f5b	kconfig: mention 'hibernation' not just swsusp Clarify that "software suspend" is what's called "hibernation" in most user interfaces, shrinking a terminology gap. (Examples include Gnome and MS-Windows.) Also provide a more succinct description of what it does, so you won't have to read the whole novel in Kconfig; and highlights just why the lack of BIOS requirements for swsusp are a big deal. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:59 -07:00
Johannes Berg	f0ced9b229	power management: change /sys/power/disk display Change /sys/power/disk to display all valid modes as well as the currently selected one in a fashion known from the LED subsystem. This changes userspace API, but it is apparently not used much (we asked some userspace developers) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:59 -07:00
Johannes Berg	ab3bfca7ab	remove software_suspend() Remove software_suspend() and all its users since pm_suspend(PM_SUSPEND_DISK) should be equivalent and there's no point in having two interfaces for the same thing. The patch also changes the valid_state function to return 0 (false) for PM_SUSPEND_DISK when SOFTWARE_SUSPEND is not configured instead of accepting it and having the whole thing fail later. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:59 -07:00
Rafael J. Wysocki	d1d241cc2c	swsusp: use rbtree for tracking allocated swap Make swsusp use extents instead of a bitmap to trace swap pages allocated for saving the image (the tracking is only needed in case there's an error, so that the allocated swap pages can be released). This should allow us to reduce the memory usage, practically always, and improve performance. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:59 -07:00
Rafael J. Wysocki	0709db6072	swsusp: use GFP_KERNEL for creating basic data structures Make swsusp call create_basic_memory_bitmaps() before processes are frozen, so that GFP_KERNEL allocations can be made in it. Additionally, ensure that the swsusp's userland interface won't be used while either pm_suspend_disk() or software_resume() is being executed. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:59 -07:00
Rafael J. Wysocki	1525a2ad76	swsusp: fix error paths in snapshot_open We forget to increase device_available if there's an error in snapshot_open(), so the snapshot device cannot be open at all after snapshot_open() has returned an error. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:59 -07:00
Rafael J. Wysocki	74dfd666de	swsusp: do not use page flags Make swsusp use memory bitmaps instead of page flags for marking 'nosave' and free pages. This allows us to 'recycle' two page flags that can be used for other purposes. Also, the memory needed to store the bitmaps is allocated when necessary (ie. before the suspend) and freed after the resume which is more reasonable. The patch is designed to minimize the amount of changes and there are some nice simplifications and optimizations possible on top of it. I am going to implement them separately in the future. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:59 -07:00
Rafael J. Wysocki	7be9823491	swsusp: use inline functions for changing page flags Replace direct invocations of SetPageNosave(), SetPageNosaveFree() etc. with calls to inline functions that can be changed in subsequent patches without modifying the code calling them. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:58 -07:00
Oleg Nesterov	433ecb4ab3	fix refrigerator() vs thaw_process() race refrigerator() can miss a wakeup, "wait event" loop needs a proper memory ordering. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:58 -07:00
Roland McGrath	7324328446	Return EPERM not ECHILD on security_task_wait failure wait* syscalls return -ECHILD even when an individual PID of a live child was requested explicitly, when security_task_wait denies the operation. This means that something like a broken SELinux policy can produce an unexpected failure that looks just like a bug with wait or ptrace or something. This patch makes do_wait return -EACCES (or other appropriate error returned from security_task_wait() instead of -ECHILD if some children were ruled out solely because security_task_wait failed. [jmorris@namei.org: switch error code to EACCES] Signed-off-by: Roland McGrath <roland@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Christoph Lameter	50953fe9e0	slab allocators: Remove SLAB_DEBUG_INITIAL flag I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Christoph Lameter	0a31bd5f2b	KMEM_CACHE(): simplify slab cache creation This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of test slab. If it fails then we panic. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
David Rientjes	c596d9f320	cpusets: allow TIF_MEMDIE threads to allocate anywhere OOM killed tasks have access to memory reserves as specified by the TIF_MEMDIE flag in the hopes that it will quickly exit. If such a task has memory allocations constrained by cpusets, we may encounter a deadlock if a blocking task cannot exit because it cannot allocate the necessary memory. We allow tasks that have the TIF_MEMDIE flag to allocate memory anywhere, including outside its cpuset restriction, so that it can quickly die regardless of whether it is __GFP_HARDWALL. Cc: Andi Kleen <ak@suse.de> Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
Christoph Lameter	476f35348e	Safer nr_node_ids and nr_node_ids determination and initial values The nr_cpu_ids value is currently only calculated in smp_init. However, it may be needed before (SLUB needs it on kmem_cache_init!) and other kernel components may also want to allocate dynamically sized per cpu array before smp_init. So move the determination of possible cpus into sched_init() where we already loop over all possible cpus early in boot. Also initialize both nr_node_ids and nr_cpu_ids with the highest value they could take. If we have accidental users before these values are determined then the current valud of 0 may cause too small per cpu and per node arrays to be allocated. If it is set to the maximum possible then we only waste some memory for early boot users. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Johannes Berg	543b9fd352	[POWERPC] powermac: Suspend to disk on G5 Powermac G5 suspend to disk implementation. The code is platform agnostic but only tested on powermac, no other 64-bit powerpc machines. Because nvidiafb still breaks suspend I have marked it EXPERIMENTAL on powermac and because I can't test it and some lowlevel code will need changes it is BROKEN on all other 64-bit platforms. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-07 20:31:14 +10:00
Linus Torvalds	ea62ccd00f	Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits) [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall [PATCH] i386: type may be unused [PATCH] i386: Some additional chipset register values validation. [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split. [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu [PATCH] i386: white space fixes in i387.h [PATCH] i386: Drop noisy e820 debugging printks [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems [PATCH] x86-64: Share identical video.S between i386 and x86-64 [PATCH] x86-64: Remove CONFIG_REORDER [PATCH] x86-64: Print type and size correctly for unknown compat ioctls [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0) [PATCH] i386: Little cleanups in smpboot.c [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible [PATCH] i386: Add X86_FEATURE_RDTSCP [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 [PATCH] i386: Implement alternative_io for i386 ... Fix up trivial conflict in include/linux/highmem.h manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-05 14:55:20 -07:00
Linus Torvalds	5b33991576	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: remove "struct subsystem" as it is no longer needed sysfs: printk format warning DOC: Fix wrong identifier name in Documentation/driver-model/devres.txt platform: reorder platform_device_del Driver core: fix show_uevent from taking up way too much stack	2007-05-04 18:04:48 -07:00
Michael Ellerman	7fe3730de7	MSI: arch must connect the irq and the msi_desc set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call it at some point in their setup routine, and then the generic code sets up the reverse mapping from the msi_desc back to the irq. set_irq_msi() should do both connections, making it the one and only call required to connect an irq with it's MSI desc and vice versa. The arch code MUST call set_irq_msi(), and it must do so only once it's sure it's not going to fail the irq allocation. Given that there's no need for the arch to return the irq anymore, the return value from the arch setup routine just becomes 0 for success and anything else for failure. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:38 -07:00
Greg Kroah-Hartman	823bccfc40	remove "struct subsystem" as it is no longer needed We need to work on cleaning up the relationship between kobjects, ksets and ktypes. The removal of 'struct subsystem' is the first step of this, especially as it is not really needed at all. Thanks to Kay for fixing the bugs in this patch. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 18:57:59 -07:00
Jeremy Fitzhardinge	d6dd61c831	[PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction Add hooks to allow a paravirt implementation to track the lifetime of an mm. Paravirtualization requires three hooks, but only two are needed in common code. They are: arch_dup_mmap, which is called when a new mmap is created at fork arch_exit_mmap, which is called when the last process reference to an mm is dropped, which typically happens on exit and exec. The third hook is activate_mm, which is called from the arch-specific activate_mm() macro/function, and so doesn't need stub versions for other architectures. It's called when an mm is first used. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: linux-arch@vger.kernel.org Cc: James Bottomley <James.Bottomley@SteelEye.com> Acked-by: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	b6e3590f81	[PATCH] x86: Allow percpu variables to be page-aligned Let's allow page-alignment in general for per-cpu data (wanted by Xen, and Ingo suggested KVM as well). Because larger alignments can use more room, we increase the max per-cpu memory to 64k rather than 32k: it's getting a little tight. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	b00742d399	[PATCH] x86-64: Account for module percpu space separately from kernel percpu Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as the sum of kernel_percpu + PERCPU_MODULE_RESERVE. This is now common to all architectures; if an architecture wants to set PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is the only one which does). Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:11 +02:00
Vivek Goyal	1b29c1643c	[PATCH] x86-64: do not use virt_to_page on kernel data address o virt_to_page() call should be used on kernel linear addresses and not on kernel text and data addresses. Swsusp code uses it on kernel data (statically allocated swsusp_header). o Allocate swsusp_header dynamically so that virt_to_page() can be used safely. o I am changing this because in next few patches, __pa() on x86_64 will no longer support kernel text and data addresses and hibernation breaks. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	49c3df6aaa	[PATCH] x86: Move swsusp __pa() dependent code to arch portion o __pa() should be used only on kernel linearly mapped virtual addresses and not on kernel text and data addresses. o Hibernation code needs to determine the physical address associated with kernel symbol to mark a section boundary which contains pages which don't have to be saved and restored during hibernate/resume operation. o Move this piece of code in arch dependent section. So that architectures which don't have kernel text/data mapped into kernel linearly mapped region can come up with their own ways of determining physical addresses associated with a kernel text. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Johannes Berg	9684e51cd1	power management: force pm_ops.valid callback to be assigned This patch changes the docs and behaviour from "all states valid" to "no states valid" if no .valid callback is assigned. Users of pm_ops that only need mem sleep can assign pm_valid_only_mem without any overhead, others will require more elaborate callbacks. Now that all users of pm_ops have a .valid callback this is a safe thing to do and prevents things from getting messy again as they were before. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Looks-okay-to: Rafael J. Wysocki <rjw@sisk.pl> Cc: <linux-pm@lists.linux-foundation.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Johannes Berg	e8c9c50269	power management: implement pm_ops.valid for everybody Almost all users of pm_ops only support mem sleep, don't check in .valid and don't reject any others in .prepare so users can be confused if they check /sys/power/state, especially when new states are added (these would then result in s-t-r although they're supposed to be something different). This patch implements a generic pm_valid_only_mem function that is then exported for users and puts it to use in almost all existing pm_ops. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: David Brownell <david-b@pacbell.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: linux-pm@lists.linux-foundation.org Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Johannes Berg	11d77d0c01	power management: remove firmware disk mode This patch removes the firmware disk suspend mode which is the wrong approach, it is supposed to be used for implementing firmware-based disk suspend but cannot actually be used for that. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <linux-pm@lists.linux-foundation.org> Cc: David Brownell <david-b@pacbell.net> Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Johannes Berg	fe0c935a6c	rework pm_ops pm_disk_mode, kill misuse This patch series cleans up some misconceptions about pm_ops. Some users of the pm_ops structure attempt to use it to stop the user from entering suspend to disk, this, however, is not possible since the user can always use "shutdown" in /sys/power/disk and then the pm_ops are never invoked. Also, platforms that don't support suspend to disk simply should not allow configuring SOFTWARE_SUSPEND (read the help text on it, it only selects suspend to disk and nothing else, all the other stuff depends on PM). The pm_ops structure is actually intended to provide a way to enter platform-defined sleep states (currently supported states are "standby" and "mem" (suspend to ram)) and additionally (if SOFTWARE_SUSPEND is configured) allows a platform to support a platform specific way to enter low-power mode once everything has been saved to disk. This is currently only used by ACPI (S4). This patch: The pm_ops.pm_disk_mode is used in totally bogus ways since nobody really seems to understand what it actually does. This patch clarifies the pm_disk_mode description. It also removes all the arm and sh users that think they can veto suspend to disk via pm_ops; not so since the user can always do echo shutdown > /sys/power/disk, they need to find a better way involving Kconfig or such. ACPI is the only user left with a non-zero pm_disk_mode. The patch also sets the default mode to shutdown again, but when a new pm_ops is registered its pm_disk_mode is selected as default, that way the default stays for ACPI where it is apparently required. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: David Brownell <david-b@pacbell.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <linux-pm@lists.linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Robert Peterson	42e380832a	Extend print_symbol capability Today's print_symbol function dumps a kernel symbol with printk. This patch extends the functionality of kallsyms.c so that the symbol lookup function may be used without the printk. This is useful for modules that want to dump symbols elsewhere, for example, to debugfs. I intend to use the new function call in the GFS2 file system (which will be a separate patch). [akpm@linux-foundation.org: build fix] [clameter@sgi.com: sprint_symbol should return length of string like sprintf] Signed-off-by: Robert Peterson <rpeterso@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:39 -07:00
Jeff Garzik	8cdfb29c0c	libata/IDE: remove combined mode quirk Both old-IDE and libata should be able handle all controllers and devices found using normal resource reservation methods. This eliminates the awful, low-performing split-driver configuration where old-IDE drove the PATA portion of a PCI device, in PIO-only mode, and libata drove the SATA portion of the /same/ PCI device, in DMA mode. Typically vendors would ship SATA hard drive / PATA optical configuration, which would lend itself to slow (PIO-only) CD-ROM performance. For Intel users running in combined mode, it is now wholly dependent on your driver choice (potentially link order, if you compile both drivers in) whether old-IDE or libata will drive your hardware. In either case, you will get full performance from both SATA and PATA ports now, without having to pass a kernel command line parameter. Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-04-28 14:15:59 -04:00
David Howells	b8b8fd2dc2	[NET]: Fix networking compilation errors Fix miscellaneous networking compilation errors. () Export ktime_add_ns() for modules. () wext_proc_init() should have an ANSI declaration. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-27 15:31:24 -07:00
Linus Torvalds	d868772fff	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (46 commits) dev_dbg: check dev_dbg() arguments drivers/base/attribute_container.c: use mutex instead of binary semaphore mod_sysfs_setup() doesn't return errno when kobject_add_dir() failure occurs s2ram: add arch irq disable/enable hooks define platform wakeup hook, use in pci_enable_wake() security: prevent permission checking of file removal via sysfs_remove_group() device_schedule_callback() needs a module reference s390: cio: Delay uevents for subchannels sysfs: bin.c printk fix Driver core: use mutex instead of semaphore in DMA pool handler driver core: bus_add_driver should return an error if no bus debugfs: Add debugfs_create_u64() the overdue removal of the mount/umount uevents kobject: Comment and warning fixes to kobject.c Driver core: warn when userspace writes to the uevent file in a non-supported way Driver core: make uevent-environment available in uevent-file kobject core: remove rwsem from struct subsystem qeth: Remove usage of subsys.rwsem PHY: remove rwsem use from phy core IEEE1394: remove rwsem use from ieee1394 core ...	2007-04-27 12:58:54 -07:00
Akinobu Mita	240936e18b	mod_sysfs_setup() doesn't return errno when kobject_add_dir() failure occurs mod_sysfs_setup() doesn't return an errno when kobject_add_dir() for module "holders" directory fails. So caller of mod_sysfs_setup() will keep going and get oops. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:34 -07:00
Johannes Berg	a53c46dc82	s2ram: add arch irq disable/enable hooks After some more discussion this patch replaces it: From: Johannes Berg <johannes@sipsolutions.net> Subject: suspend: add arch irq disable/enable hooks For powermac, we need to do some things between suspending devices and device_power_off, for example setting the decrementer. This patch allows architectures to define arch_s2ram_{en,dis}able_irqs in their asm/suspend.h to have control over this step. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:33 -07:00
Ingo Molnar	39bc89fd40	make SysRq-T show all tasks again show_state() (SysRq-T) developed the buggy habbit of not showing TASK_RUNNING tasks. This was due to the mistaken belief that state_filter == -1 would be a pass-through filter - while in reality it did not let TASK_RUNNING == 0 p->state values through. Fix this by restoring the original '!state_filter means all tasks' special-case i had in the original version. Test-built and test-booted on i686, SysRq-T now works as intended. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-27 10:46:51 -07:00
David Howells	e19dff1fdd	[AF_RXRPC]: Make it possible to merely try to cancel timers from a module Export try_to_del_timer_sync() for use by the AF_RXRPC module. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:46:56 -07:00
Patrick McHardy	af65bdfce9	[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:03 -07:00
Stephen Hemminger	85795d64ed	[TCP] tcp_probe: improvements for net-2.6.22 Change tcp_probe to use ktime (needed to add one export). Add option to only get events when cwnd changes - from Doug Leith Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:10 -07:00
Arnaldo Carvalho de Melo	b529ccf279	[NETLINK]: Introduce nlmsg_hdr() helper For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data and for consistency with all the other cast skb member helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:34 -07:00
Arnaldo Carvalho de Melo	27a884dc3c	[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:28 -07:00
Patrick McHardy	641b9e0e8b	[NET_SCHED]: Use ktime as clocksource Get rid of the manual clock source selection mess and use ktime. Also use a scalar representation, which allows to clean up pkt_sched.h a bit more and results in less ktime_to_ns() calls in most cases. The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite inefficient by this patch, following patches will convert all qdiscs to hrtimers and get rid of them entirely. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:04 -07:00
Eric Dumazet	b7aa0bf70c	[NET]: convert network timestamps to ktime_t We currently use a special structure (struct skb_timeval) and plain 'struct timeval' to store packet timestamps in sk_buffs and struct sock. This has some drawbacks : - Fixed resolution of micro second. - Waste of space on 64bit platforms where sizeof(struct timeval)=16 I suggest using ktime_t that is a nice abstraction of high resolution time services, currently capable of nanosecond resolution. As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte shrink of this structure on 64bit architectures. Some other structures also benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...) Once this ktime infrastructure adopted, we can more easily provide nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or SO_TIMESTAMPNS/SCM_TIMESTAMPNS) Note : this patch includes a bug correction in compat_sock_get_timestamp() where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 0) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> CC: John find <linux.kernel@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:34 -07:00
Bastian Blank	91fcd412e9	Allow reading tainted flag as user The commit `34f5a39899` restricted reading of the tainted value. The attached patch changes this back to a write-only check and restores the read behaviour of older versions. Signed-off-by: Bastian Blank <bastian@waldi.eu.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 08:23:08 -07:00
Randy Dunlap	fe20e581a7	[PATCH] fix kernel oops with badly formatted module option Catch malformed kernel parameter usage of "param = value". Spaces are not supported, but don't cause a kernel fault on such usage, just report an error. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-12 15:31:41 -07:00
Linus Torvalds	d354d2f4a6	sched.c: Remove unused variable 'relative' Getting rid of the p->children printout in show_task() left behind an unused variable. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-07 10:18:33 -07:00
Ingo Molnar	35f6f753b7	[PATCH] sched: get rid of p->children use in show_task() the p->parent PID printout gives us all the information about the task tree that we need - the eldest_child()/older_sibling()/ younger_sibling() printouts are mostly historic and i do not remember ever having used those fields. (IMO in fact they confuse the SysRq-T output.) So remove them. This code has sentimental value though, those fields and printouts are one of the oldest ones still surviving from Linux v0.95's kernel/sched.c: if (p->p_ysptr \|\| p->p_osptr) printk(" Younger sib=%d, older sib=%d\n\r", p->p_ysptr ? p->p_ysptr->pid : -1, p->p_osptr ? p->p_osptr->pid : -1); else printk("\n\r"); written 15 years ago, in early 1992. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus 'snif' Torvalds <torvalds@linux-foundation.org>	2007-04-07 10:06:51 -07:00
Tejun Heo	7f30e49ee1	[PATCH] irq-devres: fix failure path of devm_request_irq() devres should be deallocated with devres_free() not kfree(). This bug corrupts slab on IRQ request failure. Fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-07 10:05:21 -07:00
Ingo Molnar	995f054f2a	[PATCH] high-res timers: resume fix Soeren Sonnenburg reported that upon resume he is getting this backtrace: [<c0119637>] smp_apic_timer_interrupt+0x57/0x90 [<c0142d30>] retrigger_next_event+0x0/0xb0 [<c0104d30>] apic_timer_interrupt+0x28/0x30 [<c0142d30>] retrigger_next_event+0x0/0xb0 [<c0140068>] __kfifo_put+0x8/0x90 [<c0130fe5>] on_each_cpu+0x35/0x60 [<c0143538>] clock_was_set+0x18/0x20 [<c0135cdc>] timekeeping_resume+0x7c/0xa0 [<c02aabe1>] __sysdev_resume+0x11/0x80 [<c02ab0c7>] sysdev_resume+0x47/0x80 [<c02b0b05>] device_power_up+0x5/0x10 it turns out that on resume we mistakenly re-enable interrupts too early. Do the timer retrigger only on the current CPU. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Soeren Sonnenburg <kernel@nn7.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-07 10:03:43 -07:00
john stultz	98de9e3ba2	[PATCH] fix jiffies clocksource inittime In debugging a problem w/ the -rt tree, I noticed that on systems that mark the tsc as unstable before it is registered, the TSC would still be selected and used for a short period of time. Digging in it looks to be a result of the mix of the clocksource list changes and my clocksource initialization changes. With the -rt tree, using a bad TSC, even for a short period of time can results in a hang at boot. I was not able to reproduce this hang w/ mainline, but I'm not completely certain that someone won't trip on it. This patch resolves the issue by initializing the jiffies clocksource earlier so a bad TSC won't get selected just because nothing else is yet registered. Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-04 21:12:47 -07:00
Rafael J. Wysocki	c75fd0ee6e	[PATCH] swsusp: fix memory shrinker Fix a bug in the swsusp's memory shrinker that causes some systems using highmem to refuse to suspend to disk if image_size is set above 1/2 of available RAM. Special thanks to Jiri Slaby for reporting the problem and assistance in debugging it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-04 21:12:47 -07:00
Thomas Bittermann	456a09dce9	[PATCH] kernel/time.c: add missing symbol exports This patch adds 2 missing symbol exports: jiffies_to_timeval() and timeval_to_jiffies(). The (not yet merged) dm-raid4-5 module will need them, and they used to be indirectly exported by virtue of being inline functions. Commit `8b9365d753` ("[PATCH] Uninline jiffies.h functions") uninlined them, and thus modules now need them explicitly exported to use them. Signed-off-by: Thomas Bittermann <t.bittermann@online.de> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: john stultz <johnstul@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-04 17:35:53 -07:00
Rafael J. Wysocki	1d64b9cb1d	[PATCH] Fix microcode-related suspend problem Fix the regression resulting from the recent change of suspend code ordering that causes systems based on Intel x86 CPUs using the microcode driver to hang during the resume. The problem occurs since the microcode driver uses request_firmware() in its CPU hotplug notifier, which is called after tasks has been frozen and hangs. It can be fixed by telling the microcode driver to use the microcode stored in memory during the resume instead of trying to load it from disk. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Adrian Bunk <bunk@stusta.de> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Pavel Machek <pavel@ucw.cz> Cc: Maxim <maximlevitsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-02 10:06:09 -07:00
Kay Sievers	0c84ce268b	[PATCH] driver core: fix built-in drivers sysfs links built-in drivers had broken sysfs links that caused bootup hangs for certain driver unregistry sequences. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-02 10:06:09 -07:00
Eric W. Biederman	14e9d5730a	[PATCH] pid: Properly detect orphaned process groups in exit_notify In commit `0475ac0845` when converting the orphaned process group handling to use struct pid I made a small mistake. I accidentally replaced an == with a !=. Besides just being a dumb thing to do apparently this has a bad side effect. The improper orphaned process group detection causes kwin to die after a suspend/resume cycle. I'm amazed this patch has been around as long as it has without anyone else noticing something funny going on. And the following people deserve credit for spotting and helping to reproduce this. Thanks to: Sid Boyce <g3vbv@blueyonder.co.uk> Thanks to: "Michael Wu" Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-29 08:16:23 -07:00
Ingo Molnar	935c631db8	[PATCH] hrtimers: fix reprogramming SMP race hrtimer_start() incorrectly set the 'reprogram' flag to enqueue_hrtimer(), which should only be 1 if the hrtimer is queued to the current CPU. Doing otherwise could result in a reprogramming of the current CPU's clockevents device, with a timer that is not queued to it - resulting in a bogus next expiry value. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-28 13:44:31 -07:00
Rafael J. Wysocki	436ce71638	[PATCH] Revert "swsusp: disable nonboot CPUs before entering platform suspend" This reverts commit `94985134b7` and insteads removes the WARN_ON() that caused that commit in the first place. The problem is that we call disable_nonboot_cpus() in swsusp before powering down the system in order to avoid triggering the WARN_ON() in arch/x86_64/kernel/acpi/sleep.c:init_low_mapping() and this doesn't work well on Thomas' system. So instead, remove the WARN_ON() in arch/x86_64/kernel/acpi/sleep.c: init_low_mapping(), which triggers every time during the suspend to disk in the platform mode, as the potential problem it is related to doesn't seem to occur in practice. [ I think we might want to disallow the case of multiple users of that mm, or something. Normally, playing with the current process page tables on the current CPU should be fine as long as we don't have other threads using those tables at the same time.. Anyway, not pretty, but better than the warning or the lockup - Linus ] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-27 09:20:03 -07:00
john stultz	d62ac21aa0	[PATCH] ntp: avoid time_offset overflows I've been seeing some odd NTP behavior recently on a few boxes and finally narrowed it down to time_offset overflowing when converted to SHIFT_UPDATE units (which was a side effect from my HZfreeNTP patch). This patch converts time_offset from a long to a s64 which resolves the issue. [tglx@linutronix.de: signedness fixes] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-27 09:05:15 -07:00
Thomas Gleixner	291bc047e1	[PATCH] clockevents: remove bad designed sysfs support for now The current sysfs support of clockevents does not obey the "only one value per file" rule. The real fix is not 2.6.21 material. Therefor remove the sysfs support for now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-26 14:07:23 -07:00
Thomas Gleixner	948ac6d71c	[PATCH] clocksource: Fix thinko in watchdog selection The watchdog implementation excludes low res / non continuous clocksources from being selected as a watchdog reference unintentionally. Allow using jiffies/PIT as a watchdog reference as long as no better clocksource is available. This is necessary to detect TSC breakage on systems, which have no pmtimer/hpet. The main goal of the initial patch (preventing to switch to highres/nohz when no reliable fallback clocksource is available) is still guaranteed by the checks in clocksource_watchdog(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-25 14:57:34 -07:00
Thomas Gleixner	9501b6cf55	[PATCH] dynticks: fix hrtimer rounding error in next_timer_interrupt The rework of next_timer_interrupt() fixed the timer wheel bugs, but invented a rounding error versus the next hrtimer event. This is caused by the conversion of the hrtimer internal representation to relative jiffies. This causes bug #8100: http://bugzilla.kernel.org/show_bug.cgi?id=8100 next_timer_interrupt() returns "now" in such a case and causes the code in tick_nohz_stop_sched_tick() to trigger the timer softirq, which is bogus as no timer is due for expiry. This results in an endless context switching between idle and ksoftirqd until a timer is due for expiry. Modify the hrtimer evaluation so that, it returns now + 1, when the conversion results in a delta < 1 jiffie. It's confirmed to resolve bug #8100 Reported-by: Emil Karlson <jkarlson@cc.hut.fi> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-25 14:57:34 -07:00
James Morris	0444b3035e	[PATCH] time: fix formatting in /proc/timer_list Fix the print formatting of three unsigned long fields in /proc/timer_list, which are currently being formatted as signed long. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-23 11:01:21 -07:00
Jarek Poplawski	9c35dd7f8b	[PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks lockdep's data shouldn't be used when debug_locks == 0 because it's not updated after this, so it's more misleading than helpful. PS: probably lockdep's current-> fields should be reset after it turns debug_locks off: so, after printing a bug report, but before return from exported functions, but there are really a lot of these possibilities (e.g. after DEBUG_LOCKS_WARN_ON), so, something could be missed. (Of course direct use of this fields isn't recommended either.) Reported-by: Folkert van Heusden <folkert@vanheusden.com> Inspired-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-22 19:39:06 -07:00
Pavel Machek	058560fbd7	[PATCH] fix extra BIOS invocation during resume It causes extra moon icons blinking on x60, and breaks at least two other systems. During resume, we do not know that "reboot"/"shutdown" method was used, so we assume "plaform" and call BIOS, anyway... This is 2.6.21 material, and should fix 2 or 3 regressions from 2.6.20. Signed-off-by: Pavel Machek <pavel@suse.cz> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-22 19:39:06 -07:00
Rafael J. Wysocki	93c9a7ff50	[PATCH] swsusp: Fix SNAPSHOT_S2RAM ioctl The SNAPSHOT_S2RAM ioctl does not disable the nonboot CPUs before entering the suspend, although it should do this. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-22 19:39:05 -07:00
Thomas Gleixner	cd05a1f818	[PATCH] clockevents: Fix suspend/resume to disk hangs I finally found a dual core box, which survives suspend/resume without crashing in the middle of nowhere. Sigh, I never figured out from the code and the bug reports what's going on. The observed hangs are caused by a stale state transition of the clock event devices, which keeps the RCU synchronization away from completion, when the non boot CPU is brought back up. The suspend/resume in oneshot mode needs the similar care as the periodic mode during suspend to RAM. My assumption that the state transitions during the different shutdown/bringups of s2disk would go through the periodic boot phase and then switch over to highres resp. nohz mode were simply wrong. Add the appropriate suspend / resume handling for the non periodic modes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:35:25 -07:00
Zilvinas Valinskas	e29e175b0f	[PATCH] initialise pi_lock if CONFIG_RT_MUTEXES=N Fixes a bogus lockdep warning which causes lockdep to disable itself. Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:25:06 -07:00
Ingo Molnar	21778867b1	[PATCH] futex: PI state locking fix Testing of -rt by IBM uncovered a locking bug in wake_futex_pi(): the PI state needs to be locked before we access it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:25:06 -07:00
Andrew Johnson	b257bc051f	[PATCH] swsusp: fix suspend when console is in VT_AUTO+KD_GRAPHICS mode When the console is in VT_AUTO+KD_GRAPHICS mode, switching to the SUSPEND_CONSOLE fails, resulting in vt_waitactive() waiting indefinitely or until the task is interrupted. This patch tests if a console switch can occur in set_console() and returns early if a console switch is not possible. [akpm@linux-foundation.org: cleanup] Signed-off-by: Andrew Johnson <ajohnson@intrinsyc.com> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:25:05 -07:00
Thomas Gleixner	ad28d94abb	[PATCH] hrtimer: fix up unlocked access to wall_to_monotonic commit `f4304ab215` (HZ free NTP) moved the access to wall_to_monotonic in hrtimer_get_softirq_time() out of the xtime_lock protection. Move it back into the seq_lock section. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:25:05 -07:00
Thomas Gleixner	13788ccc41	[PATCH] hrtimer: prevent overrun DoS in hrtimer_forward() hrtimer_forward() does not check for the possible overflow of timer->expires. This can happen on 64 bit machines with large interval values and results currently in an endless loop in the softirq because the expiry value becomes negative and therefor the timer is expired all the time. Check for this condition and set the expiry value to the max. expiry time in the future. The fix should be applied to stable kernel series as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:25:05 -07:00
Rafael J. Wysocki	94985134b7	[PATCH] swsusp: disable nonboot CPUs before entering platform suspend Prevent the WARN_ON() in arch/x86_64/kernel/acpi/sleep.c:init_low_mapping() from triggering by disabling nonboot CPUs before we finally enter the platform suspend. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:25:03 -07:00
Rafael J. Wysocki	886c595295	[PATCH] swsusp: Fix resume error path in platform mode If swsusp is using the platform mode during the resume and the image cannot be read, the platform mode should be switched off before software_resume() returns. Make it happen. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-16 19:25:03 -07:00
Al Viro	c4823bce03	[PATCH] fix deadlock in audit_log_task_context() GFP_KERNEL allocations in non-blocking context; fixed by killing an idiotic use of security_getprocattr(). Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-14 15:27:48 -07:00
Greg Kroah-Hartman	161e232b88	Revert "driver core: refcounting fix" This reverts commit `63ce18cfe6`. It was the incorrect fix and causes a reference counting bug whenever any driver module is removed from the system. Mike Galbraith <efault@gmx.de> is looking for the real fix for his problem. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-03-09 15:25:04 -08:00
Linus Torvalds	d694c16bc3	Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: sh: Kill off I/O cruft for R7780RP. sh: Revert lazy dcache writeback changes. sh: Enable SM501 support for RTS7751R2D. sh: Use L1_CACHE_BYTES for .data.cacheline_aligned. sysctl: Support vdso_enabled sysctl on SH. sh: Fix kernel thread stack corruption with preempt. doc: Add SH to vdso and earlyprintk in kernel-parameters.txt sh: Fix sigmask trampling in signal delivery. sh: Clear UBC when not in use.	2007-03-07 10:08:33 -08:00
Rafael J. Wysocki	c7276fde27	[PATCH] kconfig: Update swsusp description Update the outdated and inaccurate description of the software suspend in Kconfig. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-06 09:30:26 -08:00
Josh Triplett	d6ad67112a	[PATCH] Publish rcutorture module parameters via sysfs, read-only rcutorture's module parameters currently use permissions of 0, so they don't show up in /sys/module/rcutorture/parameters. Change the permissions on all module parameters to world-readable (0444). rcutorture does all of its initialization and thread startup when loaded and relies on the parameters not changing during execution, so they should not permit writing. However, reading seems fine. Signed-off-by: Josh Triplett <josh@freedesktop.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-06 09:30:25 -08:00
Daniel Walker	90675a27fa	[PATCH] fix vsyscall settimeofday I've only seen this on x86_64. The vsyscall state only gets updated when a timer interrupts comes in. So if the time is set long before the next timer, there will be a period when a gettimeofday() won't reflect the correct time. I added an explicit update_vsyscall() during the settimeofday(), that way the vsyscall state doesn't get stale. Signed-off-by: Daniel Walker <dwalker@mvista.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-06 09:30:25 -08:00
Thomas Gleixner	f8953856eb	[PATCH] highres: do not run the TIMER_SOFTIRQ after switching to highres mode The TIMER_SOFTIRQ runs the hrtimers during bootup until a usable clocksource and clock event sources are registered. The switch to high resolution mode happens inside of the TIMER_SOFTIRQ, but runs the softirq afterwards. That way the tick emulation timer, which was set up in the switch to highres might be executed in the softirq context, which is a BUG. The rbtree has not to be touched by the softirq after the highres switch. This BUG was observed by Andres Salomon, who provided the information to debug it. Return early from the softirq, when the switch was sucessful. [dilinger@debian.org: add debug warning] [akpm@linux-foundation.org: make debug warning compile] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andres Salomon <dilinger@debian.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andres Salomon <dilinger@debian.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-06 09:30:25 -08:00
Thomas Gleixner	6321dd60c7	[PATCH] Save/restore periodic tick information over suspend/resume The programming of periodic tick devices needs to be saved/restored across suspend/resume - otherwise we might end up with a system coming up that relies on getting a PIT (or HPET) interrupt, while those devices default to 'no interrupts' after powerup. (To confuse things it worked to a certain degree on some systems because the lapic gets initialized as a side-effect of SMP bootup.) This suspend / resume thing was dropped unintentionally during the last-minute -mm code reshuffling. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-06 09:30:24 -08:00
Heiko Carstens	e81ce1f7ec	[PATCH] timer/hrtimer: take per cpu locks in sane order Doing something like this on a two cpu system # echo 0 > /sys/devices/system/cpu/cpu0/online # echo 1 > /sys/devices/system/cpu/cpu0/online # echo 0 > /sys/devices/system/cpu/cpu1/online will give me this: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.21-rc2-g562aa1d4-dirty #7 ------------------------------------------------------- bash/1282 is trying to acquire lock: (&cpu_base->lock_key){.+..}, at: [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240 but task is already holding lock: (&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240 which lock already depends on the new lock. This happens because we have the following code in kernel/hrtimer.c: migrate_hrtimers(int cpu) [...] old_base = &per_cpu(hrtimer_bases, cpu); new_base = &get_cpu_var(hrtimer_bases); [...] spin_lock(&new_base->lock); spin_lock(&old_base->lock); Which means the spinlocks are taken in an order which depends on which cpu gets shut down from which other cpu. Therefore lockdep complains that there might be an ABBA deadlock. Since migrate_hrtimers() gets only called on cpu hotplug it's safe to assume that it isn't executed concurrently on a The same problem exists in kernel/timer.c: migrate_timers(). As pointed out by Christian Borntraeger one possible solution to avoid the locking order complaints would be to make sure that the locks are always taken in the same order. E.g. by taking the lock of the cpu with the lower number first. To achieve this we introduce two new spinlock functions double_spin_lock and double_spin_unlock which lock or unlock two locks in a given order. Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Christian Borntraeger <cborntra@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-05 07:57:53 -08:00
john stultz	6bb74df481	[PATCH] clocksource init adjustments (fix bug #7426 ) This patch resolves the issue found here: http://bugme.osdl.org/show_bug.cgi?id=7426 The basic summary is: Currently we register most of i386/x86_64 clocksources at module_init time. Then we enable clocksource selection at late_initcall time. This causes some problems for drivers that use gettimeofday for init calibration routines (specifically the es1968 driver in this case), where durring module_init, the only clocksource available is the low-res jiffies clocksource. This may cause slight calibration errors, due to the small sampling time used. It should be noted that drivers that require fine grained time may not function on architectures that do not have better then jiffies resolution timekeeping (there are a few). However, this does not discount the reasonable need for such fine-grained timekeeping at init time. Thus the solution here is to register clocksources earlier (ideally when the hardware is being initialized), and then we enable clocksource selection at fs_initcall (before device_initcall). This patch should probably get some testing time in -mm, since clocksource selection is one of the most important issues for correct timekeeping, and I've only been able to test this on a few of my own boxes. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-05 07:57:53 -08:00
Con Kolivas	69f7c0a1be	[PATCH] sched: remove SMT nice Remove the SMT-nice feature which idles sibling cpus on SMT cpus to facilitiate nice working properly where cpu power is shared. The idling of cpus in the presence of runnable tasks is considered too fragile, easy to break with outside code, and the complexity of managing this system if an architecture comes along with many logical cores sharing cpu power will be unworkable. Remove the associated per_cpu_gain variable in sched_domains used only by this code. Also: The reason is that with dynticks enabled, this code breaks without yet further tweaks so dynticks brought on the rapid demise of this code. So either we tweak this code or kill it off entirely. It was Ingo's preference to kill it off. Either way this needs to happen for 2.6.21 since dynticks has gone in. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-05 07:57:51 -08:00
Paul Mundt	5c36e6578d	sysctl: Support vdso_enabled sysctl on SH. All of the logic for this was already in place, we just hadn't wired it up in the sysctl table. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-03-05 14:13:26 +09:00
Ingo Molnar	7355690ead	[PATCH] sched: fix SMT scheduler bug The SMT scheduler incorrectly skips kernel threads even if they are runnable (but they are preempted by a higher-prio user-space task which got SMT-delayed by an even higher-priority task running on a sibling CPU). Fix this for now by only doing the SMT-nice optimization if the to-be-delayed task is the only runnable task. (This should cover most of the real-life cases anyway.) This bug has been in the SMT scheduler since 2.6.17 or so, but has only been noticed now by the active check in the dynticks code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-01 14:53:38 -08:00
Sam Ravnborg	1499993cc7	[PATCH] fix section mismatch warning in lockdep lockdep_init() is marked __init but used in several places outside __init code. This causes following warnings: $ scripts/mod/modpost kernel/lockdep.o WARNING: kernel/built-in.o - Section mismatch: reference to .init.text:lockdep_init from .text.lockdep_init_map after 'lockdep_init_map' (at offset 0x105) WARNING: kernel/built-in.o - Section mismatch: reference to .init.text:lockdep_init from .text.lockdep_reset_lock after 'lockdep_reset_lock' (at offset 0x35) WARNING: kernel/built-in.o - Section mismatch: reference to .init.text:lockdep_init from .text.__lock_acquire after '__lock_acquire' (at offset 0xb2) The warnings are less obviously due to heavy inlining by gcc - this is not altered. Fix the section mismatch warnings by removing the __init marking, which seems obviously wrong. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-01 14:53:37 -08:00
Adrian Bunk	93a6fefe2f	[PATCH] fix the SYSCTL=n compilation /home/bunk/linux/kernel-2.6/linux-2.6.20-mm2/kernel/sysctl.c:1411: error: conflicting types for 'register_sysctl_table' /home/bunk/linux/kernel-2.6/linux-2.6.20-mm2/include/linux/sysctl.h:1042: error: previous declaration of 'register_sysctl_table' was here make[2]: *** [kernel/sysctl.o] Error 1 Caused by commit `0b4d414714`. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-01 14:53:37 -08:00
Thomas Gleixner	c1e16aa279	[PATCH] Fix posix-cpu-timer breakage caused by stale p->last_ran value Problem description at: http://bugzilla.kernel.org/show_bug.cgi?id=8048 Commit `b18ec80396` [PATCH] sched: improve migration accuracy optimized the scheduler time calculations, but broke posix-cpu-timers. The problem is that the p->last_ran value is not updated after a context switch. So a subsequent call to current_sched_time() calculates with a stale p->last_ran value, i.e. accounts the full time, which the task was scheduled away. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-01 14:53:37 -08:00
Randy Dunlap	05fb6bf0b2	[PATCH] kernel-doc fixes for 2.6.20-git15 (non-drivers) Fix kernel-doc warnings in 2.6.20-git15 (lib/, mm/, kernel/, include/). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-01 14:53:37 -08:00
Daniel Walker	9d63463114	[PATCH] update timekeeping_is_continuous comment Signed-off-by: Daniel Walker <dwalker@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-01 14:53:36 -08:00
Linus Torvalds	b0138a6cb7	Merge master.kernel.org:/pub/scm/linux/kernel/git/kyle/parisc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/kyle/parisc-2.6: (78 commits) [PARISC] Use symbolic last syscall in __NR_Linux_syscalls [PARISC] Add missing statfs64 and fstatfs64 syscalls Revert "[PARISC] Optimize TLB flush on SMP systems" [PARISC] Compat signal fixes for 64-bit parisc [PARISC] Reorder syscalls to match unistd.h Revert "[PATCH] make kernel/signal.c:kill_proc_info() static" [PARISC] fix sys_rt_sigqueueinfo [PARISC] fix section mismatch warnings in harmony sound driver [PARISC] do not export get_register/set_register [PARISC] add ENTRY()/ENDPROC() and simplify assembly of HP/UX emulation code [PARISC] convert to use CONFIG_64BIT instead of __LP64__ [PARISC] use CONFIG_64BIT instead of __LP64__ [PARISC] add ASM_EXCEPTIONTABLE_ENTRY() macro [PARISC] more ENTRY(), ENDPROC(), END() conversions [PARISC] fix ENTRY() and ENDPROC() for 64bit-parisc [PARISC] Fixes /proc/cpuinfo cache output on B160L [PARISC] implement standard ENTRY(), END() and ENDPROC() [PARISC] kill ENTRY_SYS_CPUS [PARISC] clean up debugging printks in smp.c [PARISC] factor syscall_restart code out of do_signal ... Fix conflict in include/linux/sched.h due to kill_proc_info() being made publicly available to PARISC again.	2007-02-26 12:48:06 -08:00
Linus Torvalds	5313a20bfc	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/tick-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/tick-2.6: [TICK] tick-common: Fix one-shot handling in tick_handle_periodic(). [TIME] tick-sched: Add missing asm/irq_regs.h include.	2007-02-26 11:42:10 -08:00
Linus Torvalds	a7538a7f87	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: Revert "Driver core: let request_module() send a /sys/modules/kmod/-uevent" Driver core: fix error by cleanup up symlinks properly make kernel/kmod.c:kmod_mk static power management: fix struct layout and docs power management: no valid states w/o pm_ops Driver core: more fallout from class_device changes for pcmcia sysfs: move struct sysfs_dirent to private header driver core: refcounting fix Driver core: remove class_device_rename	2007-02-26 11:41:30 -08:00
David S. Miller	3494c16676	[TICK] tick-common: Fix one-shot handling in tick_handle_periodic(). When clockevents_program_event() is given an expire time in the past, it does not update dev->next_event, so this looping code would loop forever once the first in-the-past expiration time was used. Keep advancing "next" locally to fix this bug. Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:14:15 -08:00
David S. Miller	9e203bcc10	[TIME] tick-sched: Add missing asm/irq_regs.h include. Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:13:49 -08:00
Eric W. Biederman	2a786b452e	[PATCH] genirq: Mask irqs when migrating them. move_native_irqs tries to do the right thing when migrating irqs by disabling them. However disabling them is a software logical thing, not a hardware thing. This has always been a little flaky and after Ingo's latest round of changes it is guaranteed to not mask the apic. So this patch fixes move_native_irq to directly call the mask and unmask chip methods to guarantee that we mask the irq when we are migrating it. We must do this as it is required by all code that call into the path. Since we don't know the masked status when IRQ_DISABLED is set so we will not be able to restore it. The patch makes the code just give up and trying again the next time this routing is called. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-26 10:34:08 -08:00
Greg Kroah-Hartman	dfff0a0671	Revert "Driver core: let request_module() send a /sys/modules/kmod/-uevent" This reverts commit `c353c3fb07`. It turns out that we end up with a loop trying to load the unix module and calling netfilter to do that. Will redo the patch later to not have this loop. Acked-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-23 14:54:57 -08:00
Adrian Bunk	4541ac94d0	make kernel/kmod.c:kmod_mk static This patch makes the needlessly global struct kmod_mk static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-23 14:52:09 -08:00
Johannes Berg	9c372d06ce	power management: no valid states w/o pm_ops Change /sys/power/state to not advertise any valid states (except for disk if SOFTWARE_SUSPEND is enabled) when no pm_ops have been set so userspace can easily discover what states should be available. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Macheck <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-23 14:52:09 -08:00
Mike Galbraith	63ce18cfe6	driver core: refcounting fix Fix a reference counting bug exposed by commit `725522b545`. If driver.mod_name exists, we take a reference in module_add_driver(), and never release it. Undo that reference in module_remove_driver(). Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-23 14:52:09 -08:00
Jarek Poplawski	60e114d113	[PATCH] lockdep: debug_locks check after check_chain_key In __lock_acquire check_chain_key can turn off debug_locks, so check is needed to assure proper return code. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:14 -08:00
Srinivasa Ds	346fd59bab	[PATCH] kprobes: list all active probes in the system This patch lists all active probes in the system by scanning through kprobe_table[]. It takes care of aggregate handlers and prints the type of the probe. Letter "k" for kprobes, "j" for jprobes, "r" for kretprobes. It also lists address of the instruction,its symbolic name(function name + offset) and the module name. One can access this file through /sys/kernel/debug/kprobes/list. Output looks like this ===================== llm40:~/a # cat /sys/kernel/debug/kprobes/list c0169ae3 r sys_read+0x0 c0169ae3 k sys_read+0x0 c01694c8 k vfs_write+0x0 c0167d20 r sys_open+0x0 f8e658a6 k reiserfs_delete_inode+0x0 reiserfs c0120f4a k do_fork+0x0 c0120f4a j do_fork+0x0 c0169b4a r sys_write+0x0 c0169b4a k sys_write+0x0 c0169622 r vfs_read+0x0 ================================= [akpm@linux-foundation.org: cleanup] [ananth@in.ibm.com: sparc build fix] Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:14 -08:00
Thomas Gleixner	bc5393a6c9	[PATCH] NOHZ: Produce debug output instead of a BUG() The BUG_ON() in tick_nohz_stop_sched_tick() triggers on some boxen. Remove the BUG_ON and print information about the pending softirq to allow better debugging of the problem. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-19 14:18:43 -08:00
Ingo Molnar	6ba9b346e1	[PATCH] NOHZ: Fix RCU handling When a CPU is needed for RCU the tick has to continue even when it was stopped before. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-19 14:18:43 -08:00
Linus Torvalds	cb4aaf46c0	Merge branch 'audit.b37' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b37' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] AUDIT_FD_PAIR [PATCH] audit config lockdown [PATCH] minor update to rule add/delete messages (ver 2)	2007-02-19 13:29:54 -08:00
Linus Torvalds	874ff01bd9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits) Documentation/kernel-docs.txt update. arch/cris: typo in KERN_INFO Storage class should be before const qualifier kernel/printk.c: comment fix update I/O sched Kconfig help texts - CFQ is now default, not AS. Remove duplicate listing of Cris arch from README kbuild: more doc. cleanups doc: make doc. for maxcpus= more visible drivers/net/eexpress.c: remove duplicate comment add a help text for BLK_DEV_GENERIC correct a dead URL in the IP_MULTICAST help text fix the BAYCOM_SER_HDX help text fix SCSI_SCAN_ASYNC help text trivial documentation patch for platform.txt Fix typos concerning hierarchy Fix comment typo "spin_lock_irqrestore". Fix misspellings of "agressive". drivers/scsi/a100u2w.c: trivial typo patch Correct trivial typo in log2.h. Remove useless FIND_FIRST_BIT() macro from cardbus.c. ...	2007-02-19 13:29:02 -08:00
Al Viro	db3495099d	[PATCH] AUDIT_FD_PAIR Provide an audit record of the descriptor pair returned by pipe() and socketpair(). Rewritten from the original posted to linux-audit by John D. Ramsdell <ramsdell@mitre.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-02-17 21:30:15 -05:00
Steve Grubb	6a01b07fae	[PATCH] audit config lockdown The following patch adds a new mode to the audit system. It uses the audit_enabled config option to introduce the idea of audit enabled, but configuration is immutable. Any attempt to change the configuration while in this mode is audited. To change the audit rules, you'd need to reboot the machine. To use this option, you'd need a modified version of auditctl and use "-e 2". This is intended to go at the end of the audit.rules file for people that want an immutable configuration. This patch also adds "res=" to a number of configuration commands that did not have it before. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-02-17 21:30:12 -05:00
Steve Grubb	a17b4ad778	[PATCH] minor update to rule add/delete messages (ver 2) I was looking at parsing some of these messages and found that I wanted what it was doing next to an op= for the parser to key on. Also missing was the list number and results. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-02-17 21:30:09 -05:00
Patrick Pletscher	0bbfb7c2e4	kernel/printk.c: comment fix Signed-off-by: Patrick Pletscher <pat@pletscher.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-02-17 20:10:16 +01:00
Matthew Wilcox	c3de4b3815	Revert "[PATCH] make kernel/signal.c:kill_proc_info() static" This reverts commit `d3228a887c`. DeBunk this code. We need it for compat_sys_rt_sigqueueinfo. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>	2007-02-17 01:20:07 -05:00
Randy Dunlap	ef665c1a06	sysfs: fix build errors: uevent with CONFIG_SYSFS=n Fix source files to build with CONFIG_SYSFS=n. module_subsys is not available. SYSFS=n, MODULES=y: T:y SYSFS=n, MODULES=n: T:y SYSFS=y, MODULES=y: T:y SYSFS=y, MODULES=n: T:y Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-16 15:19:18 -08:00
Mariusz Kozlowski	b92be9f1ec	Driver: remove redundant kobject_unregister checks Here is a patch that removes all redundant kobject_unregister argument checks. Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-16 15:19:17 -08:00
Kay Sievers	c353c3fb07	Driver core: let request_module() send a /sys/modules/kmod/-uevent On recent systems, calls to /sbin/modprobe are handled by udev depending on the kind of device the kernel has discovered. This patch creates an uevent for the kernels internal request_module(), to let udev take control over the request, instead of forking the binary directly by the kernel. The direct execution of /sbin/modprobe can be disabled by setting: /sys/module/kmod/mod_request_helper (/proc/sys/kernel/modprobe) to an empty string, the same way /proc/sys/kernel/hotplug is disabled on an udev system. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-16 15:19:15 -08:00
Jan Beulich	5575ddf75c	[PATCH] small irq management simplification Use mask_ack_irq() where possible. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
Randy Dunlap	472900b8b0	[PATCH] IRQ kernel-doc fixes Fix kernel-doc warnings in IRQ management. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
Ingo Molnar	76d2160147	[PATCH] genirq: do not mask interrupts by default Never mask interrupts immediately upon request. Disabling interrupts in high-performance codepaths is rare, and on the other hand this change could recover lost edges (or even other types of lost interrupts) by conservatively only masking interrupts after they happen. (NOTE: with this change the highlevel irq-disable code still soft-disables this IRQ line - and if such an interrupt happens then the IRQ flow handler keeps the IRQ masked.) Mark i8529A controllers as 'never loses an edge'. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
Paul E. McKenney	1f2ea0837d	[PATCH] posix timers: RCU optimization for clock_gettime() Use RCU to avoid the need to acquire tasklist_lock in the single-threaded case of clock_gettime(). It still acquires tasklist_lock when for a (potentially multithreaded) process. This change allows realtime applications to frequently monitor CPU consumption of individual tasks, as requested (and now deployed) by some off-list users. This has been in Ingo Molnar's -rt patchset since late 2005 with no problems reported, and tests successfully on 2.6.20-rc6, so I believe that it is long-since ready for mainline adoption. [paulmck@linux.vnet.ibm.com: fix exit()/posix_cpu_clock_get() race spotted by Oleg] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
john stultz	c37e7bb5d2	[PATCH] time: x86_64: split x86_64/kernel/time.c up In preparation for the x86_64 generic time conversion, this patch splits out TSC and HPET related code from arch/x86_64/kernel/time.c into respective hpet.c and tsc.c files. [akpm@osdl.org: fix printk timestamps] [akpm@osdl.org: cleanup] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@muc.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
john stultz	acc9a9dcdd	[PATCH] generic: vsyscall-gtod support for GENERIC_TIME Provides generic infrastructure for vsyscall-gtod. [akpm@osdl.org: cleanup] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@muc.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
Ingo Molnar	289f480af8	[PATCH] Add debugging feature /proc/timer_list add /proc/timer_list, which prints all currently pending (high-res) timers, all clock-event sources and their parameters in a human-readable form. Sample output: Timer List Version: v0.1 HRTIMER_MAX_CLOCK_BASES: 2 now at 4246046273872 nsecs cpu: 0 clock 0: .index: 0 .resolution: 1 nsecs .get_time: ktime_get_real .offset: 1273998312645738432 nsecs active timers: clock 1: .index: 1 .resolution: 1 nsecs .get_time: ktime_get .offset: 0 nsecs active timers: #0: <f5a90ec8>, hrtimer_sched_tick, hrtimer_stop_sched_tick, swapper/0 # expires at 4246432689566 nsecs [in 386415694 nsecs] #1: <f5a90ec8>, hrtimer_wakeup, do_nanosleep, pcscd/2050 # expires at 4247018194689 nsecs [in 971920817 nsecs] #2: <f5a90ec8>, hrtimer_wakeup, do_nanosleep, irqbalance/1909 # expires at 4247351358392 nsecs [in 1305084520 nsecs] #3: <f5a90ec8>, hrtimer_wakeup, do_nanosleep, crond/2157 # expires at 4249097614968 nsecs [in 3051341096 nsecs] #4: <f5a90ec8>, it_real_fn, do_setitimer, syslogd/1888 # expires at 4251329900926 nsecs [in 5283627054 nsecs] .expires_next : 4246432689566 nsecs .hres_active : 1 .check_clocks : 0 .nr_events : 31306 .idle_tick : 4246020791890 nsecs .tick_stopped : 1 .idle_jiffies : 986504 .idle_calls : 40700 .idle_sleeps : 36014 .idle_entrytime : 4246019418883 nsecs .idle_sleeptime : 4178181972709 nsecs cpu: 1 clock 0: .index: 0 .resolution: 1 nsecs .get_time: ktime_get_real .offset: 1273998312645738432 nsecs active timers: clock 1: .index: 1 .resolution: 1 nsecs .get_time: ktime_get .offset: 0 nsecs active timers: #0: <f5a90ec8>, hrtimer_sched_tick, hrtimer_restart_sched_tick, swapper/0 # expires at 4246050084568 nsecs [in 3810696 nsecs] #1: <f5a90ec8>, hrtimer_wakeup, do_nanosleep, atd/2227 # expires at 4261010635003 nsecs [in 14964361131 nsecs] #2: <f5a90ec8>, hrtimer_wakeup, do_nanosleep, smartd/2332 # expires at 5469485798970 nsecs [in 1223439525098 nsecs] .expires_next : 4246050084568 nsecs .hres_active : 1 .check_clocks : 0 .nr_events : 24043 .idle_tick : 4246046084568 nsecs .tick_stopped : 0 .idle_jiffies : 986510 .idle_calls : 26360 .idle_sleeps : 22551 .idle_entrytime : 4246043874339 nsecs .idle_sleeptime : 4170763761184 nsecs tick_broadcast_mask: 00000003 event_broadcast_mask: 00000001 CPU#0's local event device: Clock Event Device: lapic capabilities: 0000000e max_delta_ns: 807385544 min_delta_ns: 1443 mult: 44624025 shift: 32 set_next_event: lapic_next_event set_mode: lapic_timer_setup event_handler: hrtimer_interrupt .installed: 1 .expires: 4246432689566 nsecs CPU#1's local event device: Clock Event Device: lapic capabilities: 0000000e max_delta_ns: 807385544 min_delta_ns: 1443 mult: 44624025 shift: 32 set_next_event: lapic_next_event set_mode: lapic_timer_setup event_handler: hrtimer_interrupt .installed: 1 .expires: 4246050084568 nsecs Clock Event Device: hpet capabilities: 00000007 max_delta_ns: 2147483647 min_delta_ns: 3352 mult: 61496110 shift: 32 set_next_event: hpet_next_event set_mode: hpet_set_mode event_handler: handle_nextevt_broadcast Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Ingo Molnar	82f67cd9fc	[PATCH] Add debugging feature /proc/timer_stat Add /proc/timer_stats support: debugging feature to profile timer expiration. Both the starting site, process/PID and the expiration function is captured. This allows the quick identification of timer event sources in a system. Sample output: # echo 1 > /proc/timer_stats # cat /proc/timer_stats Timer Stats Version: v0.1 Sample period: 4.010 s 24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick) 11, 0 swapper sk_reset_timer (tcp_delack_timer) 6, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick) 2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) 17, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick) 2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) 4, 2050 pcscd do_nanosleep (hrtimer_wakeup) 5, 4179 sshd sk_reset_timer (tcp_write_timer) 4, 2248 yum-updatesd schedule_timeout (process_timeout) 18, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick) 3, 0 swapper sk_reset_timer (tcp_delack_timer) 1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer) 2, 1 swapper e1000_up (e1000_watchdog) 1, 1 init schedule_timeout (process_timeout) 100 total events, 25.24 events/sec [ cleanups and hrtimers support from Thomas Gleixner <tglx@linutronix.de> ] [bunk@stusta.de: nr_entries can become static] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	8bfd9a7a22	[PATCH] hrtimers: prevent possible itimer DoS Fix potential setitimer DoS with high-res timers by pushing itimer rearm processing to process context. [Fixes from: Ingo Molnar <mingo@elte.hu>] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	54cdfdb47f	[PATCH] hrtimers: add high resolution timer support Implement high resolution timers on top of the hrtimers infrastructure and the clockevents / tick-management framework. This provides accurate timers for all hrtimer subsystem users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	79bf2bb335	[PATCH] tick-management: dyntick / highres functionality With Ingo Molnar <mingo@elte.hu> Add functions to provide dynamic ticks and high resolution timers. The code which keeps track of jiffies and handles the long idle periods is shared between tick based and high resolution timer based dynticks. The dyntick functionality can be disabled on the kernel commandline. Provide also the infrastructure to support high resolution timers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	f8381cba04	[PATCH] tick-management: broadcast functionality With Ingo Molnar <mingo@elte.hu> Add broadcast functionality, so per cpu clock event devices can be registered as dummy devices or switched from/to broadcast on demand. The broadcast function distributes the events via the broadcast function of the clock event device. This is primarily designed to replace the switch apic timer to / from IPI in power states, where the apic stops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	906568c9c6	[PATCH] tick-management: core functionality With Ingo Molnar <mingo@elte.hu> The tick-management code is the first user of the clockevents layer. It takes clock event devices from the clock events core and uses them to provide the periodic tick. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	d316c57ff6	[PATCH] clockevents: add core functionality Architectures register their clock event devices, in the clock events core. Users of the clockevents core can get clock event devices for their use. The clockevents core code provides notification mechanisms for various clock related management events. This allows to control the clock event devices without the architectures having to worry about the details of function assignment. This is also a preliminary for high resolution timers and dynamic ticks to allow the core code to control the clock functionality without intrusive changes to the architecture code. [Fixes-by: Ingo Molnar <mingo@elte.hu>] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	5cfb6de7cd	[PATCH] hrtimers: clean up callback tracking Reintroduce ktimers feature "optimized away" by the ktimers review process: remove the curr_timer pointer from the cpu-base and use the hrtimer state. No functional changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	303e967ff9	[PATCH] hrtimers; add state tracking Reintroduce ktimers feature "optimized away" by the ktimers review process: multiple hrtimer states to enable the running of hrtimers without holding the cpu-base-lock. (The "optimized" rbtree hack carried only 2 states worth of information and we need 4 for high resolution timers and dynamic ticks.) No functional changes. Build-fixes-from: Andrew Morton <akpm@osdl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	3c8aa39d7c	[PATCH] hrtimers: cleanup locking Improve kernel/hrtimers.c locking: use a per-CPU base with a lock to control locking of all clocks belonging to a CPU. This simplifies code that needs to lock all clocks at once. This makes life easier for high-res timers and dyntick. No functional changes. [ optimization change from Andrew Morton <akpm@osdl.org> ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	c9cb2e3d7c	[PATCH] hrtimers: namespace and enum cleanup - hrtimers did not use the hrtimer_restart enum and relied on the implict int representation. Fix the prototypes and the functions using the enums. - Use seperate name spaces for the enumerations - Convert hrtimer_restart macro to inline function - Add comments No functional changes. [akpm@osdl.org: fix input driver] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	fd064b9b77	[PATCH] Extend next_timer_interrupt() to use a reference jiffie For CONFIG_NO_HZ we need to calculate the next timer wheel event based on a given jiffie value. Extend the existing code to allow the extra 'now' argument. Provide a compability function for the existing implementations to call the function with now == jiffies. (This also solves the racyness of the original code vs. jiffies changing during the iteration.) No functional changes to existing users of this infrastructure. [ remove WARN_ON() that triggered on s390, by Carsten Otte <cotte@de.ibm.com> ] [ made new helper static, Adrian Bunk <bunk@stusta.de> ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	1cfd68496e	[PATCH] Fix cascade lookup of next_timer_interrupt When searching for the next pending timer in the timer wheel we need to take the cascade into account. The current code has several problems: 1. it looks into the previous cascade 2. it ignores a pending cascade 3. it ignores multiple cascades Change the cascade lookup, so it calculates the array index from the point of the next cascade and always look at the cascade buckets, when the cascade is pending, i.e. gets executed in the next timer softirq. When multiple cascades are pending, then lookup the next buckets too. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Ingo Molnar	dde4b2b5f4	[PATCH] uninline irq_enter() Uninline irq_enter(). [dynticks adds more stuff to it] No functional changes. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	5d8b34fdcb	[PATCH] clocksource: Add verification (watchdog) helper The TSC needs to be verified against another clocksource. Instead of using hardwired assumptions of available hardware, provide a generic verification mechanism. The verification uses the best available clocksource and handles the usability for high resolution timers / dynticks of the clocksource which needs to be verified. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:57 -08:00
Thomas Gleixner	7e69f2b1ea	[PATCH] clocksource: Remove the update callback The clocksource code allows direct updates of the rating of a given clocksource now. Change TSC unstable tracking to use this interface and remove the update callback. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:57 -08:00
Thomas Gleixner	73b08d2aa4	[PATCH] clocksource: replace is_continuous by a flag field Using a flag filed allows to encode more than one information into a variable. Preparatory patch for the generic clocksource verification. [mingo@elte.hu: convert vmitime.c to the new clocksource flag] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:57 -08:00
Thomas Gleixner	92c7e00254	[PATCH] Simplify the registration of clocksources Enqueue clocksources in rating order to make selection of the clocksource easier. Also check the match with an user override at enqueue time. Preparatory patch for the generic clocksource verification. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:57 -08:00
John Stultz	411187fb05	[PATCH] GTOD: persistent clock support Persistent clock support: do proper timekeeping across suspend/resume. [bunk@stusta.de: cleanup] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:57 -08:00
Ingo Molnar	41cf54455d	[PATCH] Fix multiple conversion bugs in msecs_to_jiffies Fix multiple conversion bugs in msecs_to_jiffies(). The main problem is that this condition: if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET)) overflows if HZ is smaller than 1000! This change is user-visible: for HZ=250 SUS-compliant poll()-timeout value of -20 is mistakenly converted to 'immediate timeout'. (The new dyntick code also triggered this, that's how we noticed.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:56 -08:00
Ingo Molnar	8b9365d753	[PATCH] Uninline jiffies.h functions There are loads of fat functions hidden in jiffies.h. Uninline them. No code changes. [jeremy@goop.org: export fix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:56 -08:00
john stultz	f4304ab215	[PATCH] HZ free ntp Distangle the NTP update from HZ. This is necessary for dynamic tick enabled kernels. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:56 -08:00
Thomas Gleixner	771ee3b04e	[PATCH] Add a function to handle interrupt affinity setting Provide funtions to: - check, whether an interrupt can set the affinity - pin the interrupt to a given cpu Necessary for the ability to setup clocksources more flexible (e.g. use the different HPET channels per CPU) [akpm@osdl.org: alpha build fix] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:56 -08:00
Thomas Gleixner	950f4427c2	[PATCH] Add irq flag to disable balancing for an interrupt Add a flag so we can prevent the irq balancing of an interrupt. Move the bits, so we have room for more :) Necessary for the ability to setup clocksources more flexible (e.g. use the different HPET channels per CPU) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:56 -08:00
Linus Torvalds	414f827c46	Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits) [PATCH] x86-64: Remove mk_pte_phys() [PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386 [PATCH] i386: fix 32-bit ioctls on x64_32 [PATCH] x86: Unify pcspeaker platform device code between i386/x86-64 [PATCH] i386: Remove extern declaration from mm/discontig.c, put in header. [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c [PATCH] i386: Move mce_disabled to asm/mce.h [PATCH] i386: paravirt unhandled fallthrough [PATCH] x86_64: Wire up compat epoll_pwait [PATCH] x86: Don't require the vDSO for handling a.out signals [PATCH] i386: Fix Cyrix MediaGX detection [PATCH] i386: Fix warning in cpu initialization [PATCH] i386: Fix warning in microcode.c [PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs [PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo [PATCH] i386: Remove fastcall in paravirt.[ch] [PATCH] x86-64: Fix wrong gcc check in bitops.h [PATCH] x86-64: survive having no irq mapping for a vector [PATCH] i386: geode configuration fixes [PATCH] i386: add option to show more code in oops reports ...	2007-02-14 09:46:06 -08:00
Eric W. Biederman	d912b0cc1a	[PATCH] sysctl: add a parent entry to ctl_table and set the parent entry Add a parent entry into the ctl_table so you can walk the list of parents and find the entire path to a ctl_table entry. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:10:00 -08:00
Eric W. Biederman	77b14db502	[PATCH] sysctl: reimplement the sysctl proc support With this change the sysctl inodes can be cached and nothing needs to be done when removing a sysctl table. For a cost of 2K code we will save about 4K of static tables (when we remove de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or about half that on a 32bit arch. The speed feels about the same, even though we can now cache the sysctl dentries :( We get the core advantage that we don't need to have a 1 to 1 mapping between ctl table entries and proc files. Making it possible to have /proc/sys vary depending on the namespace you are in. The currently merged namespaces don't have an issue here but the network namespace under /proc/sys/net needs to have different directories depending on which network adapters are visible. By simply being a cache different directories being visible depending on who you are is trivial to implement. [akpm@osdl.org: fix uninitialised var] [akpm@osdl.org: fix ARM build] [bunk@stusta.de: make things static] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:10:00 -08:00
Eric W. Biederman	1ff007eb8e	[PATCH] sysctl: allow sysctl_perm to be called from outside of sysctl.c Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	805b5d5e06	[PATCH] sysctl: factor out sysctl_head_next from do_sysctl The current logic to walk through the list of sysctl table headers is slightly painful and implement in a way it cannot be used by code outside sysctl.c I am in the process of implementing a version of the sysctl proc support that instead of using the proc generic non-caching monster, just uses the existing sysctl data structure as backing store for building the dcache entries and for doing directory reads. To use the existing data structures however I need a way to get at them. [akpm@osdl.org: warning fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	0b4d414714	[PATCH] sysctl: remove insert_at_head from register_sysctl The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	ae83681026	[PATCH] sysctl: remove support for directory strategy routines parse_table has support for calling a strategy routine when descending into a directory. To date no one has used this functionality and the /proc/sys interface has no analog to it. So no one is using this functionality kill it and make the binary sysctl code easier to follow. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	6703ddfcce	[PATCH] sysctl: remove support for CTL_ANY There are currently no users in the kernel for CTL_ANY and it only has effect on the binary interface which is practically unused. So this complicates sysctl lookups for no good reason so just remove it. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	2abc26fc6b	[PATCH] sysctl: create sys/fs/binfmt_misc as an ordinary sysctl entry binfmt_misc has a mount point in the middle of the sysctl and that mount point is created as a proc_generic directory. Doing it that way gets in the way of cleaning up the sysctl proc support as it continues the existence of a horrible hack. So instead simply create the directory as an ordinary sysctl directory. At least that removes the magic special case. [akpm@osdl.org: warning fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	a5494dcd8b	[PATCH] sysctl: move SYSV IPC sysctls to their own file This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded with special cases, and by keeping all of the ipc logic to together it makes the code a little more readable. [gcoady.lk@gmail.com: build fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Grant Coady <gcoady.lk@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	39732acd96	[PATCH] sysctl: move utsname sysctls to their own file This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded with special cases, and by keeping all of the utsname logic to together it makes the code a little more readable. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:58 -08:00
Eric W. Biederman	b04c3afb2b	[PATCH] sysctl: move init_irq_proc into init/main where it belongs Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:58 -08:00
Thomas Gleixner	38515e908b	[PATCH] Scheduled removal of SA_xxx interrupt flags fixups The obsolete SA_xxx interrupt flags have been used despite the scheduled removal. Fixup the remaining users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Roland Dreier <rolandd@cisco.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Cc: Dave Airlie <airlied@linux.ie> Cc: James Simmons <jsimmons@infradead.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:54 -08:00
Tim Schmielau	cd354f1ae7	[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:54 -08:00
Andi Kleen	a98f0dd34d	[PATCH] x86-64: Allow to run a program when a machine check event is detected When a machine check event is detected (including a AMD RevF threshold overflow event) allow to run a "trigger" program. This allows user space to react to such events sooner. The trigger is configured using a new trigger entry in the machinecheck sysfs interface. It is currently shared between all CPUs. I also fixed the AMD threshold handler to run the machine check polling code immediately to actually log any events that might have caused the threshold interrupt. Also added some documentation for the mce sysfs interface. Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:23 +01:00
Zachary Amsden	9226d125d9	[PATCH] i386: paravirt CPU hypercall batching mode The VMI ROM has a mode where hypercalls can be queued and batched. This turns out to be a significant win during context switch, but must be done at a specific point before side effects to CPU state are visible to subsequent instructions. This is similar to the MMU batching hooks already provided. The same hooks could be used by the Xen backend to implement a context switch multicall. To explain a bit more about lazy modes in the paravirt patches, basically, the idea is that only one of lazy CPU or MMU mode can be active at any given time. Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of multiple PTE updates (say, inside a remap loop), but to avoid keeping some kind of state machine about when to flush cpu or mmu updates, we just allow one or the other to be active. Although there is no real reason a more comprehensive scheme could not be implemented, there is also no demonstrated need for this extra complexity. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:21 +01:00
Eric Dumazet	5809f9d442	[PATCH] x86-64: get rid of ARCH_HAVE_XTIME_LOCK ARCH_HAVE_XTIME_LOCK is used by x86_64 arch . This arch needs to place a read only copy of xtime_lock into vsyscall page. This read only copy is named __xtime_lock, and xtime_lock is defined in arch/x86_64/kernel/vmlinux.lds.S as an alias. So the declaration of xtime_lock in kernel/timer.c was guarded by ARCH_HAVE_XTIME_LOCK define, defined to true on x86_64. We can get same result with _attribute__((weak)) in the declaration. linker should do the job. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:21 +01:00
Arjan van de Ven	92e1d5be91	[PATCH] mark struct inode_operations const 2 Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Arjan van de Ven	9a32144e9d	[PATCH] mark struct file_operations const 7 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Nick Piggin	ff91691bcc	[PATCH] sched: avoid div in rebalance_tick Avoid expensive integer divide 3 times per CPU per tick. A userspace test of this loop went from 26ns, down to 19ns on a G5; and from 123ns down to 28ns on a P3. (Also avoid a variable bit shift, as suggested by Alan. The effect of this wasn't noticable on the CPUs I tested with). Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:37 -08:00
Eric W. Biederman	27b0b2f44a	[PATCH] pid: remove the now unused kill_pg kill_pg_info and __kill_pg_info Now that I have changed all of the in-tree users remove the old version of these functions. This should make it clear to any out of tree users that they should be using kill_pgrp kill_pgrp_info or __kill_pgrp_info instead. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	41487c65bf	[PATCH] pid: replace do/while_each_task_pid with do/while_each_pid_task There isn't any real advantage to this change except that it allows the old functions to be removed. Which is easier on maintenance and puts the code in a more uniform style. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	ab521dc0f8	[PATCH] tty: update the tty layer to work with struct pid Of kernel subsystems that work with pids the tty layer is probably the largest consumer. But it has the nice virtue that the assiation with a session only lasts until the session leader exits. Which means that no reference counting is required. So using struct pid winds up being a simple optimization to avoid hash table lookups. In the long term the use of pid_nr also ensures that when we have multiple pid spaces mixed everything will work correctly. Signed-off-by: Eric W. Biederman <eric@maxwell.lnxi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	3e7cd6c413	[PATCH] pid: replace is_orphaned_pgrp with is_current_pgrp_orphaned Every call to is_orphaned_pgrp passed in process_group(current) which is racy with respect to another thread changing our process group. It didn't bite us because we were dealing with integers and the worse we would get would be a stale answer. In switching the checks to use struct pid to be a little more efficient and prepare the way for pid namespaces this race became apparent. So I simplified the calls to the more specialized is_current_pgrp_orphaned so I didn't have to worry about making logic changes to avoid the race. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	0475ac0845	[PATCH] pid: use struct pid for talking about process groups in exitc Modify has_stopped_jobs and will_become_orphan_pgrp to use struct pid based process groups. This reduces the number of hash tables looks ups and paves the way for multiple pid spaces. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	04a2e6a5cb	[PATCH] pid: make session_of_pgrp use struct pid instead of pid_t To properly implement a pid namespace I need to deal exclusively in terms of struct pid, because pid_t values become ambiguous. To this end session_of_pgrp is transformed to take and return a struct pid pointer. To avoid the need to worry about reference counting I now require my caller to hold the appropriate locks. Leaving callers repsonsible for increasing the reference count if they need access to the result outside of the locks. Since session_of_pgrp currently only has one caller and that caller simply uses only test the result for equality with another process group, the locking change means I don't actually have to acquire the tasklist_lock at all. tiocspgrp is also modified to take and release the lock. The logic there is a little more complicated but nothing I won't need when I convert pgrp of a tty to a struct pid pointer. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:31 -08:00
Eric W. Biederman	8d42db189c	[PATCH] signal: rewrite kill_something_info so it uses newer helpers The goal is to remove users of the old signal helper functions so they can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:31 -08:00
Ingo Molnar	944be0b224	[PATCH] close_files(): add scheduling point close_files() can sometimes take long enough to trigger the soft lockup detector. Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:30 -08:00
Alan Cox	3f05044715	[PATCH] kernel: shut up the IRQ mismatch messages The problem is various drivers legally validly and sensibly try to claim IRQs but the kernel insists on vomiting forth a giant irrelevant debugging spew when the types clash. Edit kernel/irq/manage.c go down to mismatch: in setup_irq() and ifdef out the if clause that checks for mismatches. It'll then just do the right thing and work sanely. For the current -mm kernel this will do the trick (and moves it into shared irq debugging as in debug mode the info spew is useful). I've had a variant of this in my private tree for some time as I got fed up on the mess on boxes where old legacy IRQs get reused. Signed-off-by: Alan Cox <alan@redhat.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:28 -08:00
David Woodhouse	a304e1b828	[PATCH] Debug shared irqs Drivers registering IRQ handlers with SA_SHIRQ really ought to be able to handle an interrupt happening before request_irq() returns. They also ought to be able to handle an interrupt happening during the start of their call to free_irq(). Let's test that hypothesis.... [bunk@stusta.de: Kconfig fixes] Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:28 -08:00
Al Viro	5ea8176994	[PATCH] sort the devres mess out * Split the implementation-agnostic stuff in separate files. * Make sure that targets using non-default request_irq() pull kernel/irq/devres.o * Introduce new symbols (HAS_IOPORT and HAS_IOMEM) defaulting to positive; allow architectures to turn them off (we needed these symbols anyway for dependencies of quite a few drivers). * protect the ioport-related parts of lib/devres.o with CONFIG_HAS_IOPORT. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:07 -08:00
Alexey Dobriyan	4b98d11b40	[PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct They are fat: 4x8 bytes in task_struct. They are uncoditionally updated in every fork, read, write and sendfile. They are used only if you have some "extended acct fields feature". And please, please, please, read(2) knows about bytes, not characters, why it is called "rchar"? Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:07 -08:00
Oleg Nesterov	8d06087714	[PATCH] _proc_do_string(): fix short reads If you try to read things like /proc/sys/kernel/osrelease with single-byte reads, you get just one byte and then EOF. This is because _proc_do_string() assumes that the caller is read()ing into a buffer which is large enough to fit the whole string in a single hit. Fix. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:07 -08:00
Robert P. J. Day	501b9ebf43	[PATCH] Fix apparent typo CONFIG_LOCKDEP_DEBUG Replace the apparent typo CONFIG_LOCKDEP_DEBUG with the correct CONFIG_DEBUG_LOCKDEP. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:06 -08:00
Mathieu Desnoyers	1efc5da3cf	[PATCH] order of lockdep off/on in vprintk() should be changed The order of locking between lockdep_off/on() and local_irq_save/restore() in vprintk() should be changed. * In kernel/printk.c : vprintk() does : preempt_disable() local_irq_save() lockdep_off() spin_lock(&logbuf_lock) spin_unlock(&logbuf_lock) if(!down_trylock(&console_sem)) up(&console_sem) lockdep_on() local_irq_restore() preempt_enable() The goals here is to make sure we do not call printk() recursively from kernel/lockdep.c:__lock_acquire() (called from spin_* and down/up) nor from kernel/lockdep.c:trace_hardirqs_on/off() (called from local_irq_restore/save). It can then potentially call printk() through mark_held_locks/mark_lock. It correctly protects against the spin_lock call and the up/down call, but it does not protect against local_irq_restore. It could cause infinite recursive printk/trace_hardirqs_on() calls when printk() is called from the mark_lock() error handing path. We should change the locking so it becomes correct : preempt_disable() lockdep_off() local_irq_save() spin_lock(&logbuf_lock) spin_unlock(&logbuf_lock) if(!down_trylock(&console_sem)) up(&console_sem) local_irq_restore() lockdep_on() preempt_enable() Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:06 -08:00
Kirill Korotaev	e3e8a75d2a	[PATCH] Extract and use wake_up_klogd() Remove hack with printing space to wake up klogd. Use explicit wake_up_klogd(). See earlier discussion http://groups.google.com/group/fa.linux.kernel/browse_frm/thread/75f496668409f58d/1a8f28983a51e1ff?lnk=st&q=wake_up_klogd+group%3Afa.linux.kernel&rnum=2#1a8f28983a51e1ff Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Ingo Molnar	11f57cedcf	[PATCH] audit: fix audit_filter_user_rules() initialization bug gcc emits this warning: kernel/auditfilter.c: In function 'audit_filter_user': kernel/auditfilter.c:1611: warning: 'state' is used uninitialized in this function I tend to agree with gcc - there are a couple of plausible exit paths from audit_filter_user_rules() where it does not set 'state', keeping the variable uninitialized. For example if a filter rule has an AUDIT_POSSIBLE action. Initialize to 'wont audit'. Fix whitespace damage too. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Kyle McMartin	d4d23add3a	[PATCH] Common compat_sys_sysinfo I noticed that almost all architectures implemented exactly the same sys32_sysinfo... except parisc, where a bug was to be found in handling of the uptime. So let's remove a whole whack of code for fun and profit. Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it would be the best tested. This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but instead extracting out the common code from sys_sysinfo. Cc: Christoph Hellwig <hch@infradead.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:32 -08:00
Robert P. J. Day	72fd4a35a8	[PATCH] Numerous fixes to kernel-doc info in source files. A variety of (mostly) innocuous fixes to the embedded kernel-doc content in source files, including: * make multi-line initial descriptions single line * denote some function names, constants and structs as such * change erroneous opening '/' to '/' in a few places reword some text for clarity Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:32 -08:00
Alexey Dobriyan	b653d081c1	[PATCH] proc: remove useless (and buggy) ->nlink settings Bug: pnx8550 code creates directory but resets ->nlink to 1. create_proc_entry() et al will correctly set ->nlink for you. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Corey Minyard <minyard@acm.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Greg KH <greg@kroah.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:32 -08:00
Andrew Morton	cb799b8988	[PATCH] sysctl warning fix kernel/sysctl.c:2816: warning: 'sysctl_ipc_data' defined but not used Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Helge Deller	3db5db4fcd	[PATCH] use cycle_t instead of u64 in struct time_interpolator The 32bit and 64bit PARISC Linux kernels suffers from the problem, that the gettimeofday() call sometimes returns non-monotonic times. The easiest way to fix this, is to drop the PARISC-specific implementation and switch over to the generic TIME_INTERPOLATION framework. But in order to make it even compile on 32bit PARISC, the patch below which touches the generic Linux code, is mandatory. More information and the full patch with the parisc-specific changes is included in this thread: http://lists.parisc-linux.org/pipermail/parisc-linux/2006-December/031003.html As far as I could see, this patch does not change anything for the existing architectures which use this framework (IA64 and SPARC64), since "cycles_t" is defined there as unsigned 64bit-integer anyway (which then makes this patch a no-change for them). Signed-off-by: Helge Deller <deller@gmx.de> Cc: <linux-arch@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Theodore Ts'o	34f5a39899	[PATCH] Add TAINT_USER and ability to set taint flags from userspace Allow taint flags to be set from userspace by writing to /proc/sys/kernel/tainted, and add a new taint flag, TAINT_USER, to be used when userspace has potentially done something dangerous that might compromise the kernel. This will allow support personnel to ask further questions about what may have caused the user taint flag to have been set. For example, they might examine the logs of the realtime JVM to see if the Java program has used the really silly, stupid, dangerous, and completely-non-portable direct access to physical memory feature which MUST be implemented according to the Real-Time Specification for Java (RTSJ). Sigh. What were those silly people at Sun thinking? [akpm@osdl.org: build fix] [bunk@stusta.de: cleanup] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:29 -08:00
Alexey Dobriyan	b035b6de24	[PATCH] Consolidate default sched_clock() Use attribute(weak). Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:28 -08:00
Mathieu Desnoyers	23c887522e	[PATCH] Relay: add CPU hotplug support Mathieu originally needed to add this for tracing Xen, but it's something that's needed for any application that can be tracing while cpus are added. unplug isn't supported by this patch. The thought was that at minumum a new buffer needs to be added when a cpu comes up, but it wasn't worth the effort to remove buffers on cpu down since they'd be freed soon anyway when the channel was closed. [zanussi@us.ibm.com: avoid lock_cpu_hotplug deadlock] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:28 -08:00
Robert P. J. Day	c376222960	[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:27 -08:00
Jason Baron	068135e635	[PATCH] lockdep: add graph depth information to /proc/lockdep Generate locking graph information into /proc/lockdep, for lock hierarchy documentation and visualization purposes. sample output: c089fd5c OPS: 138 FD: 14 BD: 1 --..: &tty->termios_mutex -> [c07a3430] tty_ldisc_lock -> [c07a37f0] &port_lock_key -> [c07afdc0] &rq->rq_lock_key#2 The lock classes listed are all the first-hop lock dependencies that lockdep has seen so far. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:26 -08:00
Jarek Poplawski	381a229209	[PATCH] lockdep: more unlock-on-error fixes - returns after DEBUG_LOCKS_WARN_ON added in 3 places - debug_locks checking after lookup_chain_cache() added in __lock_acquire() - locking for testing and changing global variable max_lockdep_depth added in __lock_acquire() From: Ingo Molnar <mingo@elte.hu> My __acquire_lock() cleanup introduced a locking bug: on SMP systems we'd release a non-owned graph lock. Fix this by moving the graph unlock back, and by leaving the max_lockdep_depth variable update possibly racy. (we dont care, it's just statistics) Also add some minimal debugging code to graph_unlock()/graph_lock(), which caught this locking bug. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:26 -08:00
Oleg Nesterov	0c12b51712	[PATCH] kill_pid_info: kill acquired_tasklist_lock Kill acquired_tasklist_lock, sig_needs_tasklist() is very cheap nowadays. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:26 -08:00
Alexey Dobriyan	3ee75ac3c0	[PATCH] sysctl_{,ms_}jiffies: fix oldlen semantics currently it's 1) if oldlenp == 0, don't writeback anything 2) if oldlenp >= table->maxlen, don't writeback more than table->maxlen bytes and rewrite oldlenp don't look at underlying type granularity 3) if 0 < oldlenp < table->maxlen, cough string sysctls don't writeback more than oldlenp bytes. OK, that's because sizeof(char) == 1 int sysctls writeback anything in (0, table->maxlen] range Though accept integers divisible by sizeof(int) for writing. sysctl_jiffies and sysctl_ms_jiffies don't writeback anything but sizeof(int), which violates 1) and 2). So, make sysctl_jiffies and sysctl_ms_jiffies accept a) oldlenp == 0, not doing writeback b) oldlenp >= sizeof(int), writing one integer. -EINVAL still returned for oldlenp == 1, 2, 3. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:24 -08:00
Mathieu Desnoyers	dc29a3657b	[PATCH] kernel/time/clocksource.c needs struct task_struct on m68k kernel/time/clocksource.c needs struct task_struct on m68k. Because it uses spin_unlock_irq(), which, on m68k, uses hardirq_count(), which uses preempt_count(), which needs to dereference struct task_struct, we have to include sched.h. Because it would cause a loop inclusion, we cannot include sched.h in any other of asm-m68k/system.h, linux/thread_info.h, linux/hardirq.h, which leaves this ugly include in a C file as the only simple solution. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:20 -08:00
Rafael J. Wysocki	2b5b09b3b5	[PATCH] swsusp: Change pm_ops handling by userland interface Make the userland interface of swsusp call pm_ops->finish() after enable_nonboot_cpus() and before resume_device(), as indicated by the recent discussion on Linux-PM (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). This patch changes the SNAPSHOT_PMOPS ioctl so that its first function, PMOPS_PREPARE, only sets a switch turning the platform suspend mode on, and its last function, PMOPS_FINISH, only checks if the platform mode is enabled. This should allow the older userland tools to work with new kernels without any modifications. The changes here only affect the userland interface of swsusp. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:20 -08:00
Andrew Morton	d12c610e08	[PATCH] swsusp-change-code-ordering-in-userc-sanity The compiler will do that. And if it doesn't, we don't want to either ;) Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Rafael J. Wysocki	259130526c	[PATCH] swsusp: Change code ordering in user.c Change the ordering of code in kernel/power/user.c so that device_suspend() is called before disable_nonboot_cpus() and device_resume() is called after enable_nonboot_cpus(). This is needed to make the userland suspend call pm_ops->finish() after enable_nonboot_cpus() and before device_resume(), as indicated by the recent discussion on Linux-PM (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). The changes here only affect the userland interface of swsusp. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Rafael J. Wysocki	ed746e3b18	[PATCH] swsusp: Change code ordering in disk.c Change the ordering of code in kernel/power/disk.c so that device_suspend() is called before disable_nonboot_cpus() and platform_finish() is called after enable_nonboot_cpus() and before device_resume(), as indicated by the recent discussion on Linux-PM (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). The changes here only affect the built-in swsusp. [alexey.y.starikovskiy@linux.intel.com: fix LED blinking during image load] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Cc: Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Rafael J. Wysocki	e3c7db621b	[PATCH] PM: Change code ordering in main.c As indicated in a recent thread on Linux-PM, it's necessary to call pm_ops->finish() before devce_resume(), but enable_nonboot_cpus() has to be called before pm_ops->finish() (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). For consistency, it seems reasonable to call disable_nonboot_cpus() after device_suspend(). This way the suspend code will remain symmetrical with respect to the resume code and it may allow us to speed up things in the future by suspending and resuming devices and/or saving the suspend image in many threads. The following series of patches reorders the suspend and resume code so that nonboot CPUs are disabled after devices have been suspended and enabled before the devices are resumed. It also causes pm_ops->finish() to be called after enable_nonboot_cpus() wherever necessary. This patch: Change the ordering of code in kernel/power/main.c so that device_suspend() is called before disable_nonboot_cpus() and pm_ops->finish() is called after enable_nonboot_cpus() and before device_resume(), as indicated by recent discussion on Linux-PM (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Eric Paris	6ff1b4426e	[PATCH] make reading /proc/sys/kernel/cap-bould not require CAP_SYS_MODULE Reading /proc/sys/kernel/cap-bound requires CAP_SYS_MODULE. (see proc_dointvec_bset in kernel/sysctl.c) sysctl appears to drive all over proc reading everything it can get it's hands on and is complaining when it is being denied access to read cap-bound. Clearly writing to cap-bound should be a sensitive operation but requiring CAP_SYS_MODULE to read cap-bound seems a bit to strong. I believe the information could with reasonable certainty be obtained by looking at a bunch of the output of /proc/pid/status which has very low security protection, so at best we are just getting a little obfuscation of information. Currently SELinux policy has to 'dontaudit' capability checks for CAP_SYS_MODULE for things like sysctl which just want to read cap-bound. In doing so we also as a byproduct have to hide warnings of potential exploits such as if at some time that sysctl actually tried to load a module. I wondered if anyone would have a problem opening cap-bound up to read from anyone? Acked-by: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Christoph Lameter	9617729941	[PATCH] Drop free_pages() nr_free_pages is now a simple access to a global variable. Make it a macro instead of a function. The nr_free_pages now requires vmstat.h to be included. There is one occurrence in power management where we need to add the include. Directly refrer to global_page_state() there to clarify why the #include was added. [akpm@osdl.org: arm build fix] [akpm@osdl.org: sparc64 build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:18 -08:00
Christoph Lameter	d23ad42324	[PATCH] Use ZVC for free_pages This is again simplifies some of the VM counter calculations through the use of the ZVC consolidated counters. [michal.k.k.piotrowski@gmail.com: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:17 -08:00
Tejun Heo	9ac7849e35	devres: device resource management Implement device resource management, in short, devres. A device driver can allocate arbirary size of devres data which is associated with a release function. On driver detach, release function is invoked on the devres data, then, devres data is freed. devreses are typed by associated release functions. Some devreses are better represented by single instance of the type while others need multiple instances sharing the same release function. Both usages are supported. devreses can be grouped using devres group such that a device driver can easily release acquired resources halfway through initialization or selectively release resources (e.g. resources for port 1 out of 4 ports). This patch adds devres core including documentation and the following managed interfaces. * alloc/free : devm_kzalloc(), devm_kzfree() * IO region : devm_request_region(), devm_release_region() * IRQ : devm_request_irq(), devm_free_irq() * DMA : dmam_alloc_coherent(), dmam_free_coherent(), dmam_declare_coherent_memory(), dmam_pool_create(), dmam_pool_destroy() * PCI : pcim_enable_device(), pcim_pin_device(), pci_is_managed() * iomap : devm_ioport_map(), devm_ioport_unmap(), devm_ioremap(), devm_ioremap_nocache(), devm_iounmap(), pcim_iomap_table(), pcim_iomap(), pcim_iounmap() Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-02-09 17:39:36 -05:00
Ralf Baechle	7726942fb1	[APM] Add shared version of APM emulation Currently ARM and MIPS both have nearly identical copies of the APM emulation code in their arch code. Add yet another copy of it to drivers char and make it selectable through SYS_SUPPORTS_APM_EMULATION. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2007-02-09 17:08:57 +00:00
Linus Torvalds	78149df6d5	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (41 commits) Revert "PCI: remove duplicate device id from ata_piix" msi: Make MSI useable more architectures msi: Kill the msi_desc array. msi: Remove attach_msi_entry. msi: Fix msi_remove_pci_irq_vectors. msi: Remove msi_lock. msi: Kill msi_lookup_irq MSI: Combine pci_(save\|restore)_msi/msix_state MSI: Remove pci_scan_msi_device() MSI: Replace pci_msi_quirk with calls to pci_no_msi() PCI: remove duplicate device id from ipr PCI: remove duplicate device id from ata_piix PCI: power management: remove noise on non-manageable hw PCI: cleanup MSI code PCI: make isa_bridge Alpha-only PCI: remove quirk_sis_96x_compatible() PCI: Speed up the Intel SMBus unhiding quirk PCI Quirk: 1k I/O space IOBL_ADR fix on P64H2 shpchp: delete trailing whitespace shpchp: remove DBG_XXX_ROUTINE ...	2007-02-07 19:23:44 -08:00
Eric W. Biederman	5b912c108c	msi: Kill the msi_desc array. We need to be able to get from an irq number to a struct msi_desc. The msi_desc array in msi.c had several short comings the big one was that it could not be used outside of msi.c. Using irq_data in struct irq_desc almost worked except on some architectures irq_data needs to be used for something else. So this patch adds a msi_desc pointer to irq_desc, adds the appropriate wrappers and changes all of the msi code to use them. The dynamic_irq_init/cleanup code was tweaked to ensure the new field is left in a well defined state. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-07 15:50:08 -08:00
Kay Sievers	270a6c4cad	/sys/modules/*/holders /sys/module/usbcore/ \|-- drivers \| \|-- usb:hub -> ../../../subsystem/usb/drivers/hub \| \|-- usb:usb -> ../../../subsystem/usb/drivers/usb \| `-- usb:usbfs -> ../../../subsystem/usb/drivers/usbfs \|-- holders \| \|-- ehci_hcd -> ../../../module/ehci_hcd \| \|-- uhci_hcd -> ../../../module/uhci_hcd \| \|-- usb_storage -> ../../../module/usb_storage \| `-- usbhid -> ../../../module/usbhid \|-- initstate Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-07 10:37:12 -08:00
Greg Kroah-Hartman	fe480a2675	Modules: only add drivers/ direcory if needed This changes the module core to only create the drivers/ directory if we are going to put something in it. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-07 10:37:12 -08:00
Kay Sievers	f30c53a873	MODULES: add the module name for built in kernel drivers Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-07 10:37:12 -08:00
Al Viro	9abcf40b1d	[PATCH] fork_idle() should be __cpuinit, not __devinit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-01 16:17:06 -08:00
Masami Hiramatsu	ab40c5c6b6	[PATCH] kprobes: replace magic numbers with enum Replace the magic numbers with an enum, and gets rid of a warning on the specific architectures (ex. powerpc) on which the compiler considers 'char' as 'unsigned char'. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-30 16:01:35 -08:00
Serge E. Hallyn	0f2452855d	[PATCH] namespaces: fix task exit disaster This is based on a patch by Eric W. Biederman, who pointed out that pid namespaces are still fake, and we only have one ever active. So for the time being, we can modify any code which could access tsk->nsproxy->pid_ns during task exit to just use &init_pid_ns instead, and move the exit_task_namespaces call in do_exit() back above exit_notify(), so that an exiting nfs server has a valid tsk->sighand to work with. Long term, pulling pid_ns out of nsproxy might be the cleanest solution. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> [ Eric's patch fixed to take care of free_pid() too ] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-30 13:40:36 -08:00
Linus Torvalds	444f378b23	Revert "[PATCH] namespaces: fix exit race by splitting exit" This reverts commit `7a238fcba0` in preparation for a better and simpler fix proposed by Eric Biederman (and fixed up by Serge Hallyn) Acked-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-30 13:35:18 -08:00
Serge E. Hallyn	7a238fcba0	[PATCH] namespaces: fix exit race by splitting exit Fix exit race by splitting the nsproxy putting into two pieces. First piece reduces the nsproxy refcount. If we dropped the last reference, then it puts the mnt_ns, and returns the nsproxy as a hint to the caller. Else it returns NULL. The second piece of exiting task namespaces sets tsk->nsproxy to NULL, and drops the references to other namespaces and frees the nsproxy only if an nsproxy was passed in. A little awkward and should probably be reworked, but hopefully it fixes the NFS oops. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Daniel Hokka Zakrisson <daniel@hozac.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-30 08:26:44 -08:00
Linus Torvalds	8528b0f1de	Clear spurious irq stat information when adding irq handler Any newly added irq handler may obviously make any old spurious irq status invalid, since the new handler may well be the thing that is supposed to handle any interrupts that came in. So just clear the statistics when adding handlers. Pointed-out-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 14:16:31 -08:00
Ingo Molnar	1b5180b651	[PATCH] notifiers: fix blocking_notifier_call_chain() scalability while lock-profiling the -rt kernel i noticed weird contention during mmap-intense workloads, and the tracer showed the following gem, in one of our MM hotpaths: threaded-2771 1.... 65us : sys_munmap (sysenter_do_call) threaded-2771 1.... 66us : profile_munmap (sys_munmap) threaded-2771 1.... 66us : blocking_notifier_call_chain (profile_munmap) threaded-2771 1.... 66us : rt_down_read (blocking_notifier_call_chain) ouch! a global rw-semaphore taken in one of the most performance- sensitive codepaths of the kernel. And i dont even have oprofile enabled! All distro kernels have CONFIG_PROFILING enabled, so this scalability problem affects the majority of Linux users. The fix is to enhance blocking_notifier_call_chain() to only take the lock if there appears to be work on the call-chain. With this patch applied i get nicely saturated system, and much higher munmap performance, on SMP systems. And as a bonus this also fixes a similar scalability bottleneck in the thread-exit codepath: profile_task_exit() ... Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 11:08:03 -08:00
Andrew Morton	bbe1a59b3a	[PATCH] fix "kvm: add vm exit profiling" export profile_hits() on !SMP too. Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 07:52:05 -08:00
Ingo Molnar	07031e14c1	[PATCH] KVM: add VM-exit profiling This adds the profile=kvm boot option, which enables KVM to profile VM exits. Use: "readprofile -m ./System.map \| sort -n" to see the resulting output: [...] 18246 serial_out 148.3415 18945 native_flush_tlb 378.9000 23618 serial_in 212.7748 29279 __spin_unlock_irq 622.9574 43447 native_apic_write 2068.9048 52702 enable_8259A_irq 742.2817 54250 vgacon_scroll 89.3740 67394 ide_inb 6126.7273 79514 copy_page_range 98.1654 84868 do_wp_page 86.6000 140266 pit_read 783.6089 151436 ide_outb 25239.3333 152668 native_io_delay 21809.7143 174783 mask_and_ack_8259A 783.7803 362404 native_set_pte_at 36240.4000 1688747 total 0.5009 Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-11 18:18:21 -08:00
Gautham R Shenoy	b282b6f8a8	[PATCH] Change cpu_up and co from __devinit to __cpuinit Compiling the kernel with CONFIG_HOTPLUG = y and CONFIG_HOTPLUG_CPU = n with CONFIG_RELOCATABLE = y generates the following modpost warnings WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141b7d) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141b9c) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.text:__cpu_up from .text between '_cpu_up' (at offset 0xc0141bd8) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141c05) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141c26) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141c37) and 'cpu_up' This is because cpu_up, _cpu_up and __cpu_up (in some architectures) are defined as __devinit AND __cpu_up calls some __cpuinit functions. Since __cpuinit would map to __init with this kind of a configuration, we get a .text refering .init.data warning. This patch solves the problem by converting all of __cpu_up, _cpu_up and cpu_up from __devinit to __cpuinit. The approach is justified since the callers of cpu_up are either dependent on CONFIG_HOTPLUG_CPU or are of __init type. Thus when CONFIG_HOTPLUG_CPU=y, all these cpu up functions would land up in .text section, and when CONFIG_HOTPLUG_CPU=n, all these functions would land up in .init section. Tested on a i386 SMP machine running linux-2.6.20-rc3-mm1. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-11 18:18:20 -08:00
Nathan Lynch	e5e5673f82	[PATCH] sched: tasks cannot run on cpus onlined after boot Commit `5c1e176781` ("sched: force /sbin/init off isolated cpus") sets init's cpus_allowed to a subset of cpu_online_map at boot time, which means that tasks won't be scheduled on cpus that are added to the system later. Make init's cpus_allowed a subset of cpu_possible_map instead. This should still preserve the behavior that Nick's change intended. Thanks to Giuliano Pochini for reporting this and testing the fix: http://ozlabs.org/pipermail/linuxppc-dev/2006-December/029397.html Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-11 18:18:20 -08:00
Vivek Goyal	343cde51b3	[PATCH] x86-64: Make noirqdebug_setup function non init to fix modpost warning o noirqdebug_setup() is __init but it is being called by quirk_intel_irqbalance() which if of type __devinit. If CONFIG_HOTPLUG=y, quirk_intel_irqbalance() is put into text section and it is wrong to call a function in __init section. o MODPOST flags this on i386 if CONFIG_RELOCATABLE=y WARNING: vmlinux - Section mismatch: reference to .init.text:noirqdebug_setup from .text between 'quirk_intel_irqbalance' (at offset 0xc010969e) and 'i8237A_suspend' o Make noirqdebug_setup() non-init. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-01-11 01:52:44 +01:00
Linus Torvalds	d0abc451a6	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: [PATCH] Driver core: Fix prefix driver links in /sys/module by bus-name	2007-01-06 00:10:55 -08:00
Ingo Molnar	a75acf850c	[PATCH] profiling: fix sched profiling typo Fix sched profiling typo, introduced by the sleep profiling patch. This bug caused profile=sched to not work. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:22 -08:00
Rafael J. Wysocki	7bf2368742	[PATCH] swsusp: Do not fail if resume device is not set In the kernels later than 2.6.19 there is a regression that makes swsusp fail if the resume device is not explicitly specified. It can be fixed by adding an additional parameter to mm/swapfile.c:swap_type_of() allowing us to pass the (struct block_device *) corresponding to the first available swap back to the caller. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:22 -08:00
Ard van Breemen	a416aba637	[PATCH] kernelparams: detect if and which parameter parsing enabled irq's The parsing of some kernel parameters seem to enable irq's at a stage that irq's are not supposed to be enabled (Particularly the ide kernel parameters). Having irq's enabled before the irq controller is initialized might lead to a kernel panic. This patch only detects this behaviour and warns about wich parameter caused it. [akpm@osdl.org: cleanups] Signed-off-by: Ard van Breemen <ard@telegraafnet.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:21 -08:00
Kay Sievers	7f422e2e84	[PATCH] Driver core: Fix prefix driver links in /sys/module by bus-name Modules may have drivers with the same name on different buses. This patch fixes this problem. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-01-05 12:33:38 -08:00
Oleg Nesterov	241ceee0b4	[PATCH] restore ->pdeath_signal behaviour Commit `b2b2cbc4b2` introduced a user- visible change: ->pdeath_signal is sent only when the entire thread group exits. While this change is imho good, it may break things. So restore the old behaviour for now. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> To: Albert Cahalan <acahalan@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Linus Torvalds <torvalds@osdl.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Qi Yong <qiyong@fc-cn.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-31 14:41:18 -08:00
Andrew Morton	755cd90029	[PATCH] lockdep: printk warning fix kernel/lockdep.c: In function `lookup_chain_cache': kernel/lockdep.c:1339: warning: long long unsigned int format, u64 arg (arg 2) kernel/lockdep.c:1344: warning: long long unsigned int format, u64 arg (arg 2) Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:43 -08:00
Andrew Morton	089e34b600	[PATCH] cpuset procfs warning fix fs/proc/base.c:1869: warning: initialization discards qualifiers from pointer target type fs/proc/base.c:2150: warning: initialization discards qualifiers from pointer target type Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:43 -08:00
Akinobu Mita	a3f99f8ba8	[PATCH] module: fix mod_sysfs_setup() return value mod_sysfs_setup() doesn't return error when kobject_add_dir() failed. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:41 -08:00
Ingo Molnar	9414232fa0	[PATCH] sched: fix cond_resched_softirq() offset Remove the __resched_legal() check: it is conceptually broken. The biggest problem it had is that it can mask buggy cond_resched() calls. A cond_resched() call is only legal if we are not in an atomic context, with two narrow exceptions: - if the system is booting - a reacquire_kernel_lock() down() done while PREEMPT_ACTIVE is set But __resched_legal() hid this and just silently returned whenever these primitives were called from invalid contexts. (Same goes for cond_resched_locked() and cond_resched_softirq()). Furthermore, the __legal_resched(0) call was buggy in that it caused unnecessarily long softirq latencies via cond_resched_softirq(). (which is only called from softirq-off sections, hence the code did nothing.) The fix is to resurrect the efficiency of the might_sleep checks and to only allow the narrow exceptions. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:41 -08:00
Ingo Molnar	e4e6bdbb42	[PATCH] rcu: rcutorture suspend fix Fix suspend hang: rcutorture threads need to be nofreeze. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:55:55 -08:00
Ingo Molnar	e1d9fd2e3d	[PATCH] suspend: fix suspend on single-CPU systems Clark Williams reported that suspend doesnt work on his laptop on 2.6.20-rc1-rt kernels. The bug was introduced by the following cleanup commit: commit `112cecb2cc` Author: Siddha, Suresh B <suresh.b.siddha@intel.com> Date: Wed Dec 6 20:34:31 2006 -0800 [PATCH] suspend: don't change cpus_allowed for task initiating the suspend because with this change 'error' is not initialized to 0 anymore, if there are no other online CPUs. (i.e. if the system is single-CPU). the fix is the initialize it to 0. The really weird thing is that my version of gcc does not warn about this non-initialized variable situation ... (also fix the kernel printk in the error branch, it was missing a newline) Reported-by: Clark Williams <williams@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-23 13:59:33 -08:00
Linus Torvalds	18ed1c0513	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (68 commits) ACPI: replace kmalloc+memset with kzalloc ACPI: Add support for acpi_load_table/acpi_unload_table_id fbdev: update after backlight argument change ACPI: video: Add dev argument for backlight_device_register ACPI: Implement acpi_video_get_next_level() ACPI: Kconfig - depend on PM rather than selecting it ACPI: fix NULL check in drivers/acpi/osl.c ACPI: make drivers/acpi/ec.c:ec_ecdt static ACPI: prevent processor module from loading on failures ACPI: fix single linked list manipulation ACPI: ibm_acpi: allow clean removal ACPI: fix git automerge failure ACPI: ibm_acpi: respond to workqueue update ACPI: dock: add uevent to indicate change in device status ACPI: ec: Lindent once again ACPI: ec: Change #define to enums there possible. ACPI: ec: Style changes. ACPI: ec: Acquire Global Lock under EC mutex. ACPI: ec: Drop udelay() from poll mode. Loop by reading status field instead. ACPI: ec: Rename gpe_bit to gpe ...	2006-12-22 18:46:56 -08:00
Eric W. Biederman	b2b2cbc4b2	[PATCH] Fix reparenting to the same thread group. (take 2) This patch fixes the case when we reparent to a different thread in the same thread group. This modifies the code so that we do not send signals and do not change the signal to send to SIGCHLD unless we have change the thread group of our parents. It also suppresses sending pdeath_sig in this cas as well since the result of geppid doesn't change. Thanks to Oleg for spotting my bug of only fixing this for non-ptraced tasks. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Albert Cahalan <acahalan@gmail.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Coywolf Qi Hunt <qiyong@fc-cn.com> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 09:03:41 -08:00
Andrew Morton	192636ad90	[PATCH] relay: remove inlining text data bss dec hex filename before: 4036 44 0 4080 ff0 kernel/relay.o after: 3727 44 0 3771 ebb kernel/relay.o Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:51 -08:00
Vadim Lobanov	01b2d93ca4	[PATCH] fdtable: Provide free_fdtable() wrapper Christoph Hellwig has expressed concerns that the recent fdtable changes expose the details of the RCU methodology used to release no-longer-used fdtable structures to the rest of the kernel. The trivial patch below addresses these concerns by introducing the appropriate free_fdtable() calls, which simply wrap the release RCU usage. Since free_fdtable() is a one-liner, it makes sense to promote it to an inline helper. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:50 -08:00
Andrew Morton	5b149bcc23	[PATCH] schedule_timeout(): improve warning message Kyle is hitting this warning, and we don't have a clue what it's caused by. Add the obligatory dump_stack(). Cc: kyle <kylewong@southa.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:49 -08:00
Akinobu Mita	3e1fbd12c9	[PATCH] audit: fix kstrdup() error check kstrdup() returns NULL on error. Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:49 -08:00
Thomas Gleixner	9d7ac8be4b	[PATCH] genirq: fix irq flow handler uninstall The sanity check for no_irq_chip in __set_irq_hander() is unconditional on both install and uninstall of an handler. This triggers false warnings and replaces no_irq_chip by dummy_irq_chip in the uninstall case. Check only, when a real handler is installed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:48 -08:00
Tim Chen	67af63a6ab	[PATCH] sched: remove __cpuinitdata anotation to cpu_isolated_map The structure cpu_isolated_map is used not only during initialization. Multi-core scheduler configuration changes and exclusive cpusets use this during run time. During setting of sched_mc_power_savings policy, this structure is accessed to update sched_domains. Signed-off-by: Tim Chen <tim.c.chen@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:47 -08:00
Adrian Bunk	99eea6a105	[PATCH] make kernel/printk.c:ignore_loglevel_setup() static Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:47 -08:00

... 5 6 7 8 9 ...

2395 Commits (cc1ff43b7002b40686f03269b5011d59965737a4)