Commit Graph

4674 Commits (d210baf53b699fc61aa891c177b71d7082d3b957)

Author SHA1 Message Date
John Kacur 9d35935747 pm_qos_requirement might sleep
Make PM_QOS and CPU_IDLE play nicer when run with the RT-Preempt kernel.

The purpose of the patch is to remove the spin_lock around the read in the
function pm_qos_requirement - since spinlocks can sleep in -rt and this
function is called from idle.

CPU_IDLE polls the target_value's of some of the pm_qos parameters from
the idle loop causing sleeping locking warnings.  Changing the
target_value to an atomic avoids this issue.

Remove the spinlock in pm_qos_requirement by making target_value an atomic
type.

Signed-off-by: mark gross <mgross@linux.intel.com>
Signed-off-by: John Kacur <jkacur@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 19:21:40 -07:00
Oleg Nesterov 950bbabb5a pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits
We don't change pid_ns->child_reaper when the main thread of the
subnamespace init exits.  As Robert Rex <robert.rex@exasol.com> pointed
out this is wrong.

Yes, the re-parenting itself works correctly, but if the reparented task
exits it needs ->parent->nsproxy->pid_ns in do_notify_parent(), and if the
main thread is zombie its ->nsproxy was already cleared by
exit_task_namespaces().

Introduce the new function, find_new_reaper(), which finds the new
->parent for the re-parenting and changes ->child_reaper if needed.  Kill
the now unneeded exit_child_reaper().

Also move the changing of ->child_reaper from zap_pid_ns_processes() to
find_new_reaper(), this consolidates the games with ->child_reaper and
makes it stable under tasklist_lock.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=11391

Reported-by: Robert Rex <robert.rex@exasol.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 19:21:38 -07:00
Oleg Nesterov add0d4dfd6 pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing
zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.

Yes, we have already killed all tasks in this namespace, and sys_wait4()
doesn't see any child.  But this doesn't mean ->children list is empty, we
may have EXIT_DEAD tasks which are not visible to do_wait().  In that case
the subsequent forget_original_parent() will crash the kernel because it
will try to re-parent these tasks to the NULL reaper.

Even if there are no childs, it is not good that forget_original_parent()
uses reaper == NULL.

Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
We could use pid_ns->parent->child_reaper as well, I think this does not
really matter.  These EXIT_DEAD tasks are not visible to the new ->parent
after re-parenting, they will silently do release_task() eventually.

Note that we must change ->child_reaper, otherwise
forget_original_parent() will use reaper == father, and in that case we
will hit the (correct) BUG_ON(!list_empty(&father->children)).

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 19:21:38 -07:00
Linus Torvalds 99039e1352 Merge branch 'audit.b57' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b57' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  [PATCH] audit: Moved variable declaration to beginning of function
2008-09-02 11:04:47 -07:00
Oleg Nesterov cbaed698f3 softlockup: minor cleanup, don't check task->state twice
The recent commit 16d9679f33caf7e683471647d1472bfe133d858 changed
check_hung_task() to filter out the TASK_KILLABLE tasks. We can
move this check to the caller which has to test t->state anyway.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 10:49:51 -07:00
Randy Dunlap 6781f4ae30 kernel/resource.c: fix new kernel-doc warning
Fix kernel-doc warning for new function:

Warning(linux-2.6.27-rc5-git2//kernel/resource.c:448): No description found for parameter 'root'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 10:47:30 -07:00
Cordelia c4bacefb7a [PATCH] audit: Moved variable declaration to beginning of function
got rid of compilation warning:
ISO C90 forbids mixed declarations and code

Signed-off-by: Cordelia Sam <cordesam@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-09-01 23:06:45 -04:00
Linus Torvalds bef69ea0dc Resource handling: add 'insert_resource_expand_to_fit()' function
Not used anywhere yet, but this complements the existing plain
'insert_resource()' functionality with a version that can expand the
resource we are adding in order to fix up any conflicts it has with
existing resources.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-29 20:25:20 -07:00
Andi Kleen 316d9679f3 Don't trigger softlockup detector on network fs blocked tasks
Pulling the ethernet cable on a 2.6.27-rc system with NFS mounts
currently leads to an ongoing flood of soft lockup detector backtraces
for all tasks blocked on the NFS mounts when the hickup takes
longer than 120s.

I don't think NFS problems should be all that noisy.

Luckily there's a reasonably easy way to distingush this case.

Don't report task softlockup warnings for tasks in TASK_KILLABLE
state, which is used by the network file systems.

I believe this patch is a 2.6.27 candidate.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-29 14:46:29 -07:00
Linus Torvalds 66833d5f39 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  exit signals: use of uninitialized field notify_count
  lockdep: fix invalid list_del_rcu in zap_class
  lockstat: repair erronous contention statistics
  lockstat: fix numerical output rounding error
2008-08-28 12:31:49 -07:00
Linus Torvalds 0234bf1d98 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: rt-bandwidth accounting fix
  sched: fix sched_rt_rq_enqueue() resched idle
2008-08-28 12:31:12 -07:00
Linus Torvalds e52c8857e0 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: update defconfigs
  x86: msr: fix bogus return values from rdmsr_safe/wrmsr_safe
  x86: cpuid: correct return value on partial operations
  x86: msr: correct return value on partial operations
  x86: cpuid: propagate error from smp_call_function_single()
  x86: msr: propagate errors from smp_call_function_single()
  smp: have smp_call_function_single() detect invalid CPUs
2008-08-28 12:30:59 -07:00
Rafael J. Wysocki 41108eb101 ftrace: disable tracing for hibernation
In accordance with commit f42ac38c59
("ftrace: disable tracing for suspend to ram"), disable tracing
around the suspend code in hibernation code paths.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-28 12:27:39 -07:00
Peter Zijlstra cc2991cf15 sched: rt-bandwidth accounting fix
It fixes an accounting bug where we would continue accumulating runtime
even though the bandwidth control is disabled. This would lead to very long
throttle periods once bandwidth control gets turned on again.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-28 13:42:38 +02:00
John Blackwood f3ade83780 sched: fix sched_rt_rq_enqueue() resched idle
When sysctl_sched_rt_runtime is set to something other than -1 and the
CONFIG_RT_GROUP_SCHED kernel parameter is NOT enabled, we get into a state
where we see one or more CPUs idling forvever even though there are
real-time
tasks in their rt runqueue that are able to run (no longer throttled).

The sequence is:

- A real-time task is running when the timer sets the rt runqueue
    to throttled, and the rt task is resched_task()ed and switched
    out, and idle is switched in since there are no non-rt tasks to
    run on that cpu.

- Eventually the do_sched_rt_period_timer() runs and un-throttles
    the rt runqueue, but we just exit the timer interrupt and go back
    to executing the idle task in the idle loop forever.

If we change the sched_rt_rq_enqueue() routine to use some of the code
from the CONFIG_RT_GROUP_SCHED enabled version of this same routine and
resched_task() the currently executing task (idle in our case) if it is
a lower priority task than the higher rt task in the now un-throttled
runqueue, the problem is no longer observed.

Signed-off-by: John Blackwood <john.blackwood@ccur.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-28 11:13:24 +02:00
Steven Rostedt f42ac38c59 ftrace: disable tracing for suspend to ram
I've been painstakingly debugging the issue with suspend to ram and
ftraced. The 2.6.28 code does not have this issue, but since the mcount
recording is not going to be in 27, this must be solved for the ftrace
daemon version.

The resume from suspend to ram would reboot because it was triple
faulting. Debugging further, I found that calling the mcount function
itself was not an issue, but it would fault when it incremented
preempt_count. preempt_count is on the tasks info structure that is on the
low memory address of the task's stack.  For some reason, it could not
write to it. Resuming out of suspend to ram does quite a lot of funny
tricks to get to work, so it is not surprising at all that simply doing a
preempt_disable() would cause a fault.

Thanks to Rafael for suggesting to add a "while (1);" to find the place in
resuming that is causing the fault. I would place the loop somewhere in
the code, compile and reboot and see if it would either reboot (hit the
fault) or simply hang (hit the loop).  Doing this over and over again, I
narrowed it down that it was happening in enable_nonboot_cpus.

At this point, I found that it is easier to simply disable tracing around
the suspend code, instead of searching for the particular function that
can not handle doing a preempt_disable.

This patch disables the tracer as it suspends and reenables it on resume.

I tested this patch on my Laptop, and it can resume fine with the patch.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-27 13:54:20 -07:00
Steve VanDeBogart 2633f0e57b exit signals: use of uninitialized field notify_count
task->signal->notify_count is only initialized if
task->signal->group_exit_task is not NULL.  Reorder a conditional so
that uninitialised memory is not used.  Found by Valgrind.

Signed-off-by: Steve VanDeBogart <vandebo-lkml@nerdbox.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-27 09:10:09 +02:00
Zhu Yi 7487017282 lockdep: fix invalid list_del_rcu in zap_class
The problem is found during iwlagn driver testing on
v2.6.27-rc4-176-gb8e6c91 kernel, but it turns out to be a lockdep bug.
In our testing, we frequently load and unload the iwlagn driver
(>50 times). Then the MAX_STACK_TRACE_ENTRIES is reached (expected
behaviour?). The error message with the call trace is as below.

BUG: MAX_STACK_TRACE_ENTRIES too low!
turning off the locking correctness validator.
Pid: 4895, comm: iwlagn Not tainted 2.6.27-rc4 #13

Call Trace:
 [<ffffffff81014aa1>] save_stack_trace+0x22/0x3e
 [<ffffffff8105390a>] save_trace+0x8b/0x91
 [<ffffffff81054e60>] mark_lock+0x1b0/0x8fa
 [<ffffffff81056f71>] __lock_acquire+0x5b9/0x716
 [<ffffffffa00d818a>] ieee80211_sta_work+0x0/0x6ea [mac80211]
 [<ffffffff81057120>] lock_acquire+0x52/0x6b
 [<ffffffff81045f0e>] run_workqueue+0x97/0x1ed
 [<ffffffff81045f5e>] run_workqueue+0xe7/0x1ed
 [<ffffffff81045f0e>] run_workqueue+0x97/0x1ed
 [<ffffffff81046ae4>] worker_thread+0xd8/0xe3
 [<ffffffff81049503>] autoremove_wake_function+0x0/0x2e
 [<ffffffff81046a0c>] worker_thread+0x0/0xe3
 [<ffffffff810493ec>] kthread+0x47/0x73
 [<ffffffff8128e3ab>] trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100cea9>] child_rip+0xa/0x11
 [<ffffffff8100c4df>] restore_args+0x0/0x30
 [<ffffffff810316e1>] finish_task_switch+0x0/0xcc
 [<ffffffff810493a5>] kthread+0x0/0x73
 [<ffffffff8100ce9f>] child_rip+0x0/0x11

Although the above is harmless, when the ilwagn module is removed
later lockdep will trigger a kernel oops as below.

BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
IP: [<ffffffff810531e1>] zap_class+0x24/0x82
PGD 73128067 PUD 7448c067 PMD 0
Oops: 0002 [1] SMP
CPU 0
Modules linked in: rfcomm l2cap bluetooth autofs4 sunrpc
nf_conntrack_ipv6 xt_state nf_conntrack xt_tcpudp ip6t_ipv6header
ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand
acpi_cpufreq dm_mirror dm_log dm_multipath dm_mod snd_hda_intel sr_mod
snd_seq_dummy snd_seq_oss snd_seq_midi_event battery snd_seq
snd_seq_device cdrom button snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd_page_alloc e1000e snd_hwdep sg iTCO_wdt
iTCO_vendor_support ac pcspkr i2c_i801 i2c_core snd soundcore video
output ata_piix ata_generic libata sd_mod scsi_mod ext3 jbd mbcache
uhci_hcd ohci_hcd ehci_hcd [last unloaded: mac80211]
Pid: 4941, comm: modprobe Not tainted 2.6.27-rc4 #10
RIP: 0010:[<ffffffff810531e1>]  [<ffffffff810531e1>]
zap_class+0x24/0x82
RSP: 0000:ffff88007bcb3eb0  EFLAGS: 00010046
RAX: 0000000000068ee8 RBX: ffffffff8192a0a0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000001dfb RDI: ffffffff816e70b0
RBP: ffffffffa00cd000 R08: ffffffff816818f8 R09: ffff88007c923558
R10: ffffe20002ad2408 R11: ffffffff811028ec R12: ffffffff8192a0a0
R13: 000000000002bd90 R14: 0000000000000000 R15: 0000000000000296
FS:  00007f9d1cee56f0(0000) GS:ffffffff814a58c0(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000073047000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 4941, threadinfo ffff88007bcb2000, task
ffff8800758d1fc0)
Stack:  ffffffff81057376 0000000000000000 ffffffffa00f7b00
0000000000000000
 0000000000000080 0000000000618278 00007fff24f16720 0000000000000000
 ffffffff8105d37a ffffffffa00f7b00 ffffffff8105d591 313132303863616d
Call Trace:
 [<ffffffff81057376>] ? lockdep_free_key_range+0x61/0xf5
 [<ffffffff8105d37a>] ? free_module+0xd4/0xe4
 [<ffffffff8105d591>] ? sys_delete_module+0x1de/0x1f9
 [<ffffffff8106dbfa>] ? audit_syscall_entry+0x12d/0x160
 [<ffffffff8100be2b>] ? system_call_fastpath+0x16/0x1b

Code: b2 00 01 00 00 00 c3 31 f6 49 c7 c0 10 8a 61 81 eb 32 49 39 38
75 26 48 98 48 6b c0 38 48 8b 90 08 8a 61 81 48 8b 88 00 8a 61 81 <48>
89 51 08 48 89 0a 48 c7 80 08 8a 61 81 00 02 20 00 48 ff c6
RIP  [<ffffffff810531e1>] zap_class+0x24/0x82
 RSP <ffff88007bcb3eb0>
CR2: 0000000000000008
---[ end trace a1297e0c4abb0f2e ]---

The root cause for this oops is in add_lock_to_list() when
save_trace() fails due to MAX_STACK_TRACE_ENTRIES is reached,
entry->class is assigned but entry is never added into any lock list.
This makes the list_del_rcu() in zap_class() oops later when the
module is unloaded. This patch fixes the problem by assigning
entry->class after save_trace() returns success.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-27 08:40:36 +02:00
Joe Korty 04148b73b8 lockstat: repair erronous contention statistics
Fix bad contention counting in /proc/lock_stat.

/proc/lockstat tries to gather per-ip contention
statistics per-lock.  This was failing due to
a garbage per-ip index selector being used.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-26 10:37:47 +02:00
Joe Korty 2189459d25 lockstat: fix numerical output rounding error
Fix rounding error in /proc/lock_stat numerical output.

On occasion the two digit fractional part contains the three
digit value '100'.  This is due to a bug in the rounding algorithm
which pushes values in the range '95..99' to '100' rather than
to '00' + an increment to the integer part.  For example,

	- 123456.100      old display
	+ 123457.00	  new display

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-26 10:37:46 +02:00
H. Peter Anvin f73be6dedf smp: have smp_call_function_single() detect invalid CPUs
Have smp_call_function_single() return invalid CPU indicies and return
-ENXIO.  This function is already executed inside a
get_cpu()..put_cpu() which locks out CPU removal, so rather than
having the higher layers doing another layer of locking to guard
against unplugged CPUs do the test here.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-25 17:45:48 -07:00
Linus Torvalds cc556c5c92 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched_clock: fix cpu_clock()
2008-08-25 11:26:02 -07:00
Linus Torvalds ffb4ba76a2 [module] Don't let gcc inline load_module()
'load_module()' is a complex function that contains all the ELF section
logic, and inlining it is utterly insane.  But gcc will do it, simply
because there is only one call-site.  As a result, all the stack space
that is allocated for all the work to load the module will still be
active when we actually call the module init sequence, and the deep call
chain makes stack overflows happen.

And stack overflows are really hard to debug, because they not only
corrupt random pages below the stack, but also corrupt the thread_info
structure that is allocated under the stack.

In this case, Alan Brunelle reported some crazy oopses at bootup, after
loading the processor module that ends up doing complex ACPI stuff and
has quite a deep callchain.  This should fix it, and is the sane thing
to do regardless.

Cc: Alan D. Brunelle <Alan.Brunelle@hp.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-25 11:10:26 -07:00
Peter Zijlstra 354879bb97 sched_clock: fix cpu_clock()
This patch fixes 3 issues:

a) it removes the dependency on jiffies, because jiffies are incremented
   by a single CPU, and the tick is not synchronized between CPUs. Therefore
   relying on it to calculate a window to clip whacky TSC values doesn't work
   as it can drift around.

   So instead use [GTOD, GTOD+TICK_NSEC) as the window.

b) __update_sched_clock() did (roughly speaking):

   delta = sched_clock() - scd->tick_raw;
   clock += delta;

   Which gives exponential growth, instead of linear.

c) allows the sched_clock_cpu() value to warp the u64 without breaking.

the results are more reliable sched_clock() deltas:

           before       after   sched_clock

cpu_clock: 15750        51312   51488
cpu_clock: 59719        51052   50947
cpu_clock: 15879        51249   51061
cpu_clock: 1            50933   51198
cpu_clock: 1            50931   51039
cpu_clock: 1            51093   50981
cpu_clock: 1            51043   51040
cpu_clock: 1            50959   50938
cpu_clock: 1            50981   51011
cpu_clock: 1            51364   51212
cpu_clock: 1            51219   51273
cpu_clock: 1            51389   51048
cpu_clock: 1            51285   51611
cpu_clock: 1            50964   51137
cpu_clock: 1            50973   50968
cpu_clock: 1            50967   50972
cpu_clock: 1            58910   58485
cpu_clock: 1            51082   51025
cpu_clock: 1            50957   50958
cpu_clock: 1            50958   50957
cpu_clock: 1006128      51128   50971
cpu_clock: 1            51107   51155
cpu_clock: 1            51371   51081
cpu_clock: 1            51104   51365
cpu_clock: 1            51363   51309
cpu_clock: 1            51107   51160
cpu_clock: 1            51139   51100
cpu_clock: 1            51216   51136
cpu_clock: 1            51207   51215
cpu_clock: 1            51087   51263
cpu_clock: 1            51249   51177
cpu_clock: 1            51519   51412
cpu_clock: 1            51416   51255
cpu_clock: 1            51591   51594
cpu_clock: 1            50966   51374
cpu_clock: 1            50966   50966
cpu_clock: 1            51291   50948
cpu_clock: 1            50973   50867
cpu_clock: 1            50970   50970
cpu_clock: 998306       50970   50971
cpu_clock: 1            50971   50970
cpu_clock: 1            50970   50970
cpu_clock: 1            50971   50971
cpu_clock: 1            50970   50970
cpu_clock: 1            51351   50970
cpu_clock: 1            50970   51352
cpu_clock: 1            50971   50970
cpu_clock: 1            50970   50970
cpu_clock: 1            51321   50971
cpu_clock: 1            50974   51324

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-25 17:39:57 +02:00
Adrian Bunk 7a8fc9b248 removed unused #include <linux/version.h>'s
This patch lets the files using linux/version.h match the files that
#include it.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-23 12:14:12 -07:00
Linus Torvalds 43cc071db8 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: enable LB_BIAS by default
2008-08-22 08:36:55 -07:00
Linus Torvalds 05f57f50e0 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: fix synchronize_rcu() so that kernel-doc works
2008-08-22 08:36:42 -07:00
Miao Xie 3c4fbe5e01 nohz: fix wrong event handler after online an offlined cpu
On the tickless system(CONFIG_NO_HZ=y and CONFIG_HIGH_RES_TIMERS=n), after
I made an offlined cpu online, I found this cpu's event handler was
tick_handle_periodic, not tick_nohz_handler.

After debuging, I found this bug was caused by the wrong tick mode.  the
tick mode is not changed to NOHZ_MODE_INACTIVE when the cpu is offline.

This patch fixes this bug.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 09:54:06 +02:00
Randy Dunlap 01dcb0443e rcu: fix synchronize_rcu() so that kernel-doc works
Fix RCU's synchronize_rcu() so that it looks like a C function, enabling
it to be recognized as a function with kernel-doc annotation.

Warning(linux-2.6.26-git11//kernel/rcupdate.c:81): No description found for parameter 'synchronize_rcu'
Warning(linux-2.6.26-git11//kernel/rcupdate.c:81): No description found for parameter 'call_rcu'

[akpm@linux-foundation.org: fix comment]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 09:31:44 +02:00
Peter Zijlstra efc2dead2c sched: enable LB_BIAS by default
Yanmin reported a significant regression on his 16-core machine due to:

  commit 93b75217df
  Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
  Date:   Fri Jun 27 13:41:33 2008 +0200

Flip back to the old behaviour.

Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-21 08:18:02 +02:00
Ken Chen 2d70b68d42 fix setpriority(PRIO_PGRP) thread iterator breakage
When user calls sys_setpriority(PRIO_PGRP ...) on a NPTL style multi-LWP
process, only the task leader of the process is affected, all other
sibling LWP threads didn't receive the setting.  The problem was that the
iterator used in sys_setpriority() only iteartes over one task for each
process, ignoring all other sibling thread.

Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
each thread of a process.  Convert 4 call sites in {set/get}priority and
ioprio_{set/get}.

Signed-off-by: Ken Chen <kenchen@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-20 15:40:32 -07:00
Roland McGrath 1b04624f93 tracehook: fix SA_NOCLDWAIT
I outwitted myself again in commit 2b2a1ff64a,
and broke the SA_NOCLDWAIT behavior so it leaks zombies.  This fixes it.

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Roland McGrath <roland@redhat.com>
2008-08-19 20:37:07 -07:00
Dmitry Baryshkov 6951b12a0f lockdep: fix spurious 'inconsistent lock state' warning
Since f82b217e35 lockdep can output spurious
warnings related to hwirqs due to hardirq_off shrinkage from int to bit-sized
flag. Guard it with double negation to fix the warning.

Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-18 09:42:31 +02:00
Linus Torvalds 406703f8de Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: fix build if CONFIG_PROVE_LOCKING not defined
  lockdep: use WARN() in kernel/lockdep.c
  lockdep: spin_lock_nest_lock(), checkpatch fixes
  lockdep: build fix
2008-08-16 17:16:07 -07:00
Linus Torvalds c100548d46 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: scale sysctl_sched_shares_ratelimit with nr_cpus
  sched: fix rt-bandwidth hotplug race
  sched: fix the race between walk_tg_tree and sched_create_group
2008-08-16 17:15:32 -07:00
Linus Torvalds 71ef2a46fc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  security: Fix setting of PF_SUPERPRIV by __capable()
2008-08-15 15:32:13 -07:00
Stephen Hemminger df60a84418 lockdep: fix build if CONFIG_PROVE_LOCKING not defined
If CONFIG_PROVE_LOCKING not defined, then no dependency information
is available.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15 19:22:04 +02:00
Peter Zijlstra 55cd53404c sched: scale sysctl_sched_shares_ratelimit with nr_cpus
David reported that his Niagra spend a little too much time in
tg_shares_up(), which considering he has a large cpu count makes sense.

So scale the ratelimit value with the number of cpus like we do for
other controls as well.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15 18:25:07 +02:00
Dave Chinner be4de35263 completions: uninline try_wait_for_completion and completion_done
m68k fails to build with these functions inlined in completion.h.  Move
them out of line into sched.c and export them to avoid this problem.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:44 -07:00
Andrew Morton 8c5a1cf0ad kexec: use a mutex for locking rather than xchg()
Functionally the same, but more conventional.

Cc: Huang Ying <ying.huang@intel.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:43 -07:00
Huang Ying 3122c33119 kexec jump: fix for ftrace
Ftrace depends on some processor state that we destroyed during kexec and
restored by restore_processor_state().  So save_processor_state() and
restore_processor_state() are moved into machine_kexec() and ftrace is
restored after restore_processor_state().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:43 -07:00
Huang Ying 73bd9c72a2 kexec jump: in sync with hibernation implementation
Add device_pm_lock() and device_pm_unlock() in kernel_kexec() in sync with
current hibernation implementation.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:42 -07:00
Huang Ying ca195b7f6d kexec jump: remove duplication of kexec_restart_prepare()
Call kernel_restart_prepare() in kernel_kexec() instead of duplicating the
code.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:42 -07:00
Huang Ying 163f6876f5 kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control
page is used for not only code on some platform.  For example in kexec
jump, it is used for data and stack too.

[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:42 -07:00
Huang Ying 7ade3fcc1f kexec jump: clean up #ifdef and comments
Move if (kexec_image->preserve_context) { ...  } into #ifdef
CONFIG_KEXEC_JUMP to make code looks cleaner.

Fix no longer correct comments of kernel_kexec().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:42 -07:00
Huang Ying 4cd69b986e kexec: fix compilation warning on xchg(&kexec_lock, 0) in kernel_kexec()
kernel/kexec.c: In function 'kernel_kexec':
kernel/kexec.c:1506: warning: value computed is not used

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:42 -07:00
Peter Zijlstra f1679d0848 sched: fix rt-bandwidth hotplug race
When we hot-unplug a cpu and rebuild the sched-domain, all cpus will be
detatched. Alex observed the case where a runqueue was stealing bandwidth
from an already disabled runqueue to satisfy its own needs.

Stop this by skipping over already disabled runqueues.

Reported-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-14 15:50:58 +02:00
David Howells 5cd9c58fbe security: Fix setting of PF_SUPERPRIV by __capable()
Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags
the target process if that is not the current process and it is trying to
change its own flags in a different way at the same time.

__capable() is using neither atomic ops nor locking to protect t->flags.  This
patch removes __capable() and introduces has_capability() that doesn't set
PF_SUPERPRIV on the process being queried.

This patch further splits security_ptrace() in two:

 (1) security_ptrace_may_access().  This passes judgement on whether one
     process may access another only (PTRACE_MODE_ATTACH for ptrace() and
     PTRACE_MODE_READ for /proc), and takes a pointer to the child process.
     current is the parent.

 (2) security_ptrace_traceme().  This passes judgement on PTRACE_TRACEME only,
     and takes only a pointer to the parent process.  current is the child.

     In Smack and commoncap, this uses has_capability() to determine whether
     the parent will be permitted to use PTRACE_ATTACH if normal checks fail.
     This does not set PF_SUPERPRIV.

Two of the instances of __capable() actually only act on current, and so have
been changed to calls to capable().

Of the places that were using __capable():

 (1) The OOM killer calls __capable() thrice when weighing the killability of a
     process.  All of these now use has_capability().

 (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see
     whether the parent was allowed to trace any process.  As mentioned above,
     these have been split.  For PTRACE_ATTACH and /proc, capable() is now
     used, and for PTRACE_TRACEME, has_capability() is used.

 (3) cap_safe_nice() only ever saw current, so now uses capable().

 (4) smack_setprocattr() rejected accesses to tasks other than current just
     after calling __capable(), so the order of these two tests have been
     switched and capable() is used instead.

 (5) In smack_file_send_sigiotask(), we need to allow privileged processes to
     receive SIGIO on files they're manipulating.

 (6) In smack_task_wait(), we let a process wait for a privileged process,
     whether or not the process doing the waiting is privileged.

I've tested this with the LTP SELinux and syscalls testscripts.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Morris <jmorris@namei.org>
2008-08-14 22:59:43 +10:00
Zhang, Yanmin 09f2724a78 sched: fix the race between walk_tg_tree and sched_create_group
With 2.6.27-rc3, I hit a kernel panic when running volanoMark on my
new x86_64 machine. I also hit it with other 2.6.27-rc kernels.
See below log.

Basically, function walk_tg_tree and sched_create_group have a race
between accessing and initiating tg->children. Below patch fixes it
by moving tg->children initiation to the front of linking tg->siblings
to parent->children.

{----------------panic log------------}

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffff802292ab>] walk_tg_tree+0x45/0x7f
PGD 1be1c4067 PUD 1bdd8d067 PMD 0
Oops: 0000 [1] SMP
CPU 11
Modules linked in: igb
Pid: 22979, comm: java Not tainted 2.6.27-rc3 #1
RIP: 0010:[<ffffffff802292ab>]  [<ffffffff802292ab>] walk_tg_tree+0x45/0x7f
RSP: 0018:ffff8801bfbbbd18  EFLAGS: 00010083
RAX: 0000000000000000 RBX: ffff8800be0dce40 RCX: ffffffffffffffc0
RDX: ffff880102c43740 RSI: 0000000000000000 RDI: ffff8800be0dce40
RBP: ffff8801bfbbbd48 R08: ffff8800ba437bc8 R09: 0000000000001f40
R10: ffff8801be812100 R11: ffffffff805fdf44 R12: ffff880102c43740
R13: 0000000000000000 R14: ffffffff8022cf0f R15: ffffffff8022749f
FS:  00000000568ac950(0063) GS:ffff8801bfa26d00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001bd848000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process java (pid: 22979, threadinfo ffff8801b145a000, task ffff8801bf18e450)
Stack:  0000000000000001 ffff8800ba5c8d60 0000000000000001 0000000000000001
 ffff8800bad1ccb8 0000000000000000 ffff8801bfbbbd98 ffffffff8022ed37
 0000000000000001 0000000000000286 ffff8801bd5ee180 ffff8800ba437bc8
Call Trace:
 <IRQ>  [<ffffffff8022ed37>] try_to_wake_up+0x71/0x24c
 [<ffffffff80247177>] autoremove_wake_function+0x9/0x2e
 [<ffffffff80228039>] ? __wake_up_common+0x46/0x76
 [<ffffffff802296d5>] __wake_up+0x38/0x4f
 [<ffffffff806169cc>] tcp_v4_rcv+0x380/0x62e

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-14 10:58:48 +02:00
Arjan van de Ven 2df8b1d656 lockdep: use WARN() in kernel/lockdep.c
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-08-13 19:06:46 +02:00