linux

Commit Graph

Author	SHA1	Message	Date
Steven Rostedt	4c739ff043	tracing: show proper address for trace-printk format Since the trace_printk may use pointers to the format fields in the buffer, they are exported via debugfs/tracing/printk_formats. This is used by utilities that read the ring buffer in binary format. It helps the utilities map the address of the format in the binary buffer to what the printf format looks like. Unfortunately, the way the output code works, it exports the address of the pointer to the format address, and not the format address itself. This makes the file totally useless in trying to figure out what format string a binary address belongs to. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-07-23 10:07:17 -04:00
Li Zefan	636eacee3b	tracing/stat: Fix seqfile memory leak Every time we cat a trace_stat file, we leak memory allocated by seq_open(). Also fix memory leak in a failure path in tracing_stat_open(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A67D92B.4060704@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-07-23 09:53:55 -04:00
Li Zefan	87827111a5	function-graph: Fix seqfile memory leak Every time we cat set_graph_function, we leak memory allocated by seq_open(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A67D907.2010500@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-07-23 09:53:23 -04:00
Li Zefan	d8cc1ab793	trace_stack: Fix seqfile memory leak Every time we cat stack_trace, we leak memory allocated by seq_open(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A67D8E8.3020500@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-07-23 09:52:09 -04:00
Bruno Premont	61f3826133	genirq: Fix UP compile failure caused by irq_thread_check_affinity Since genirq: Delegate irq affinity setting to the irq thread (`591d2fb02e`) compilation with CONFIG_SMP=n fails with following error: /usr/src/linux-2.6/kernel/irq/manage.c: In function 'irq_thread_check_affinity': /usr/src/linux-2.6/kernel/irq/manage.c:475: error: 'struct irq_desc' has no member named 'affinity' make[4]: *** [kernel/irq/manage.o] Error 1 That commit adds a new function irq_thread_check_affinity() which uses struct irq_desc.affinity which is only available for CONFIG_SMP=y. Move that function under #ifdef CONFIG_SMP. [ tglx@brownpaperbag: compile and boot tested on UP and SMP ] Signed-off-by: Bruno Premont <bonbons@linux-vserver.org> LKML-Reference: <20090722222232.2eb3e1c4@neptune.home> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-22 23:18:46 +02:00
Linus Torvalds	3c3301083e	Merge branch 'perf-counters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf * 'perf-counters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf: (31 commits) perf_counter tools: Give perf top inherit option perf_counter tools: Fix vmlinux symbol generation breakage perf_counter: Detect debugfs location perf_counter: Add tracepoint support to perf list, perf stat perf symbol: C++ demangling perf: avoid structure size confusion by using a fixed size perf_counter: Fix throttle/unthrottle event logging perf_counter: Improve perf stat and perf record option parsing perf_counter: PERF_SAMPLE_ID and inherited counters perf_counter: Plug more stack leaks perf: Fix stack data leak perf_counter: Remove unused variables perf_counter: Make call graph option consistent perf_counter: Add perf record option to log addresses perf_counter: Log vfork as a fork event perf_counter: Synthesize VDSO mmap event perf_counter: Make sure we dont leak kernel memory to userspace perf_counter tools: Fix index boundary check perf_counter: Fix the tracepoint channel to perfcounters perf_counter, x86: Extend perf_counter Pentium M support ...	2009-07-22 11:41:56 -07:00
Linus Torvalds	612e900c28	Merge branch 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: softirq: introduce tasklet_hrtimer infrastructure	2009-07-22 10:12:18 -07:00
Linus Torvalds	c57c374378	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource: Prevent NULL pointer dereference timer: Avoid reading uninitialized data	2009-07-22 10:11:47 -07:00
Linus Torvalds	5b26776bd9	Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Delegate irq affinity setting to the irq thread	2009-07-22 10:11:24 -07:00
Linus Torvalds	356d1b52eb	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: fix nr_uninterruptible accounting of frozen tasks really sched: fix load average accounting vs. cpu hotplug sched: Account for vruntime wrapping	2009-07-22 10:10:36 -07:00
Arjan van de Ven	0dc3d523e8	perf: fix stack data leak the "reserved" field was not initialized to zero, resulting in 4 bytes of stack data leaking to userspace.... Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-22 09:29:52 -07:00
Anton Blanchard	966ee4d6b8	perf_counter: Fix throttle/unthrottle event logging Right now we only print PERF_EVENT_THROTTLE + 1 (ie PERF_EVENT_UNTHROTTLE). Fix this to print both a throttle and unthrottle event. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090722130546.GE9029@kryten>	2009-07-22 18:05:56 +02:00
Peter Zijlstra	7f453c24b9	perf_counter: PERF_SAMPLE_ID and inherited counters Anton noted that for inherited counters the counter-id as provided by PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID because each inherited counter gets its own id. His suggestion was to always return the parent counter id, since that is the primary counter id as exposed. However, these inherited counters have a unique identifier so that events like PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which counter gets modified, which is important when trying to normalize the sample streams. This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD, which is more useful anyway, since changing periods became a lot more common than initially thought -- rendering PERF_EVENT_PERIOD the less useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate value, since it reports the value used to trigger the overflow, whereas PERF_EVENT_PERIOD simply reports the requested period changed, which might only take effect on the next cycle). This still leaves us PERF_EVENT_THROTTLE to consider, but since that _should_ be a rare occurrence, and linking it to a primary id is the most useful bit to diagnose the problem, we introduce a PERF_SAMPLE_STREAM_ID, for those few cases where the full reconstruction is important. [Does change the ABI a little, but I see no other way out] Suggested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248095846.15751.8781.camel@twins>	2009-07-22 18:05:56 +02:00
Peter Zijlstra	573402db02	perf_counter: Plug more stack leaks Per example of Arjan's patch, I went through and found a few more. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2009-07-22 18:05:55 +02:00
Arjan van de Ven	c9f73a3dd2	perf: Fix stack data leak the "reserved" field was not initialized to zero, resulting in 4 bytes of stack data leaking to userspace.... Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2009-07-22 18:05:55 +02:00
Peter Zijlstra	1d2f37945d	Merge commit 'tip/perfcounters/core' into perf-counters-for-linus	2009-07-22 18:05:48 +02:00
Peter Zijlstra	9ba5f005c9	softirq: introduce tasklet_hrtimer infrastructure commit `ca109491f` (hrtimer: removing all ur callback modes) moved all hrtimer callbacks into hard interrupt context when high resolution timers are active. That breaks code which relied on the assumption that the callback happens in softirq context. Provide a generic infrastructure which combines tasklets and hrtimers together to provide an in-softirq hrtimer experience. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: torvalds@linux-foundation.org Cc: kaber@trash.net Cc: David Miller <davem@davemloft.net> LKML-Reference: <1248265724.27058.1366.camel@twins> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-22 17:01:17 +02:00
Thomas Gleixner	591d2fb02e	genirq: Delegate irq affinity setting to the irq thread irq_set_thread_affinity() calls set_cpus_allowed_ptr() which might sleep, but irq_set_thread_affinity() is called with desc->lock held and can be called from hard interrupt context as well. The code has another bug as it does not hold a ref on the task struct as required by set_cpus_allowed_ptr(). Just set the IRQTF_AFFINITY bit in action->thread_flags. The next time the thread runs it migrates itself. Solves all of the above problems nicely. Add kerneldoc to irq_set_thread_affinity() while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission>	2009-07-21 14:35:07 +02:00
Li Zefan	1f9963cbb0	tracing/filters: improve subsystem filter Currently a subsystem filter should be applicable to all events under the subsystem, and if it failed, all the event filters will be cleared. Those behaviors make subsys filter much less useful: # echo 'vec == 1' > irq/softirq_entry/filter # echo 'irq == 5' > irq/filter bash: echo: write error: Invalid argument # cat irq/softirq_entry/filter none I'd expect it set the filter for irq_handler_entry/exit, and not touch softirq_entry/exit. The basic idea is, try to see if the filter can be applied to which events, and then just apply to the those events: # echo 'vec == 1' > softirq_entry/filter # echo 'irq == 5' > filter # cat irq_handler_entry/filter irq == 5 # cat softirq_entry/filter vec == 1 Changelog for v2: - do some cleanups to address Frederic's comments. Inspired-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A63D485.7030703@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-07-20 13:29:19 -04:00
Xiao Guangrong	ff4e9da233	tracing: cleanup for tracing_trace_options_read() '\n' is already appended, and what we need is just an extra space for the '\0'. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> LKML-Reference: <4A3EED63.3090908@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-07-20 12:02:09 -04:00
Li Zefan	7d536cb3fb	tracing/events: record the size of dynamic arrays When a dynamic array is defined, we add __data_loc_foo in trace_entry to record the offset of the array, but the size of the array is not recorded, which causes 2 problems: - the event filter just compares the first 2 chars of the strings. - parsers can't parse dynamic arrays. So we encode the size of each dynamic array in the higher 16 bits of __data_loc_foo, while the offset is in lower 16 bits. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A5E964A.9000403@cn.fujitsu.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-07-20 11:38:44 -04:00
Thomas Gleixner	79ef2bb014	clocksource: Prevent NULL pointer dereference Writing a zero length string to sys/.../current_clocksource will cause a NULL pointer dereference if the clock events system is in one shot (highres or nohz) mode. Pointed-out-by: Dan Carpenter <error27@gmail.com> LKML-Reference: <alpine.DEB.2.00.0907191545580.12306@bicker> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-19 17:15:54 +02:00
Pavel Roskin	4841158b26	timer: Avoid reading uninitialized data timer->expires may be uninitialized, so check timer_pending() before touching timer->expires to pacify kmemcheck. Signed-off-by: Pavel Roskin <proski@gnu.org> LKML-Reference: <20090718204602.5191.360.stgit@mj.roinet.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-18 23:11:43 +02:00
Thomas Gleixner	6301cb95c1	sched: fix nr_uninterruptible accounting of frozen tasks really commit `e3c8ca8336` (sched: do not count frozen tasks toward load) broke the nr_uninterruptible accounting on freeze/thaw. On freeze the task is excluded from accounting with a check for (task->flags & PF_FROZEN), but that flag is cleared before the task is thawed. So while we prevent that the task with state TASK_UNINTERRUPTIBLE is accounted to nr_uninterruptible on freeze we decrement nr_uninterruptible on thaw. Use a separate flag which is handled by the freezing task itself. Set it before calling the scheduler with TASK_UNINTERRUPTIBLE state and clear it after we return from frozen state. Cc: <stable@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-18 14:19:53 +02:00
Thomas Gleixner	a468d38934	sched: fix load average accounting vs. cpu hotplug The new load average code clears rq->calc_load_active on CPU_ONLINE. That's wrong as the new onlined CPU might have got a scheduler tick already and accounted the delta to the stale value of the time we offlined the CPU. Clear the value when we cleanup the dead CPU instead. Also move the update of the calc_load_update time for the newly online CPU to CPU_UP_PREPARE to avoid that the CPU plays catch up with the stale update time value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-18 14:19:52 +02:00
Mel Gorman	e5d490b252	profile: Suppress warning about large allocations when profile=1 is specified When profile= is used, a large buffer is allocated early at boot. This can be larger than what the page allocator can provide so it prints a warning. However, the caller is able to handle the situation so this patch suppresses the warning. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Linux Memory Management List <linux-mm@kvack.org> Cc: Heinz Diehl <htd@fancy-poultry.org> Cc: David Miller <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1247656992-19846-3-git-send-email-mel@csn.ul.ie> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-18 12:55:28 +02:00
jolsa@redhat.com	566b0aaf79	tracing: Remove unused fields/variables Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: rostedt@goodmis.org LKML-Reference: <1247773468-11594-2-git-send-email-jolsa@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-18 12:21:16 +02:00
Ingo Molnar	45bceffc30	Merge branch 'linus' into tracing/core Merge reason: tracing/core was on an older, pre-rc1 base. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-18 12:20:01 +02:00
Anton Blanchard	ed900c054b	perf_counter: Log vfork as a fork event Right now we don't output vfork events. Even though we should always see an exec after a vfork, we may get perfcounter samples between the vfork and exec. These samples can lead to some confusion when parsing perfcounter data. To keep things consistent we should always log a fork event. It will result in a little more log data, but is less confusing to trace parsing tools. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.589309391@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-18 11:21:31 +02:00
Anton Blanchard	413ee3b48a	perf_counter: Make sure we dont leak kernel memory to userspace There are a few places we are leaking tiny amounts of kernel memory to userspace. This happens when writing out strings because we always align the end to 64 bits. To avoid this we should always use an appropriately sized temporary buffer and ensure it is zeroed. Since d_path assembles the string from the end of the buffer backwards, we need to add 64 bits after the buffer to allow for alignment. We also need to copy arch_vma_name to the temporary buffer, because if we use it directly we may end up copying to userspace a number of bytes after the end of the string constant. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.273972048@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-18 11:21:29 +02:00
Fabio Checconi	54fdc58166	sched: Account for vruntime wrapping I spotted two sites that didn't take vruntime wrap-around into account. Fix these by creating a comparison helper that does do so. Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-18 11:17:08 +02:00
Xiao Guangrong	6f2f3cf00e	tracing/function: Cleanup for function tracer We can directly use %pf input format instead of kallsyms_lookup() and %s input format Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-07-17 01:45:51 -04:00
Xiao Guangrong	79173bf556	tracing/trace_stack: Cleanup for trace_lookup_stack() We can directly use %pF input format instead of sprint_symbol() and %s input format. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-07-17 01:44:10 -04:00
Xiao Guangrong	64fbcd1628	tracing/function: Simplify __ftrace_replace_code() Rewrite the __ftrace_replace_code() function, simplify it, but don't change the code's logic. First, we get the state we want to set, if the record has the same state, then do nothing, otherwise enable/disable it. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-07-17 00:37:53 -04:00
Xiao Guangrong	04aef32d39	tracing/function: Fix the return value of ftrace_trace_onoff_callback() ftrace_trace_onoff_callback() will return an error even if we do the right operation, for example: # echo _spin_:traceon:10 > set_ftrace_filter -bash: echo: write error: Invalid argument # cat set_ftrace_filter #### all functions enabled #### _spin_trylock_bh:traceon:count=10 _spin_unlock_irq:traceon:count=10 _spin_unlock_bh:traceon:count=10 _spin_lock_irq:traceon:count=10 _spin_unlock:traceon:count=10 _spin_trylock:traceon:count=10 _spin_unlock_irqrestore:traceon:count=10 _spin_lock_irqsave:traceon:count=10 _spin_lock_bh:traceon:count=10 _spin_lock:traceon:count=10 We want to set _spin_:traceon:10 to set_ftrace_filter, it complains with "Invalid argument", but the operation is successful. This is because ftrace_process_regex() returns the number of functions that matched the pattern. If the number is not 0, this value is returned by ftrace_regex_write() whereas we want to return the number of bytes virtually written. Also the file offset pointer is not updated in this case. If the number of matched functions is lower than the number of bytes written by the user, this results to a reprocessing of the string given by the user with a lower size, leading to a malformed ftrace regex and then a -EINVAL returned. So, this patch fixes it by returning 0 if no error occured. The fix also applies on 2.6.30 Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: stable@kernel.org Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-07-16 23:34:32 -04:00
Lai Jiangshan	da706d8bc8	ring_buffer: Fix warning while ignoring cmpxchg return value kernel/trace/ring_buffer.c: In function 'rb_tail_page_update': kernel/trace/ring_buffer.c:849: warning: value computed is not used kernel/trace/ring_buffer.c:850: warning: value computed is not used Add "(void)"s to fix this warning, because we don't need here to handle the fail case of cmpxchg, it's fine if an interrupt already did the job. Changed from V1: Add a comment(which is written by Steven) for it. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-07-16 18:46:47 -04:00
Linus Torvalds	4b0a84043e	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-sched * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-sched: sched: Fix bug in SCHED_IDLE interaction with group scheduling sched: Fix rt_rq->pushable_tasks initialization in init_rt_rq() sched: Reset sched stats on fork() sched_rt: Fix overload bug on rt group scheduling sched: Documentation/sched-rt-group: Fix style issues & bump version	2009-07-16 10:18:29 -07:00
Linus Torvalds	62f49052ac	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: Fix migration expiry check hrtimer: migration: do not check expiry time on current CPU	2009-07-14 18:35:24 -07:00
Linus Torvalds	989fa94096	Merge branch 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futexes: Fix infinite loop in get_futex_key() on huge page	2009-07-14 18:35:00 -07:00
Steven Rostedt	6ab5d668b1	tracing/function-profiler: do not free per cpu variable stat The per cpu variable stat is freeded if we fail to allocate a name on start up. This was due to stat at first being allocated in the initial design. But since then, it has become a static per cpu variable but the free on error was not removed. Also added __init annotation to the function that this is in. [ Impact: prevent possible memory corruption on low mem at boot up ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-13 11:01:10 +02:00
Chris Wilson	d4d7d0b954	perf_counter: Fix the tracepoint channel to perfcounters Fix a missed rename in EVENT_PROFILE support so that it gets built and allows tracepoint tracing from the 'perf' tool. Fix a typo in the (never before built & enabled) portion in perf_counter.c as well, and update that code to the attr.config changes as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Gamari <bgamari.foss@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1246869094-21237-1-git-send-email-chris@chris-wilson.co.uk> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-13 09:23:10 +02:00
Linus Torvalds	7638d5322b	Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6 * 'kmemleak' of git://linux-arm.org/linux-2.6: kmemleak: Remove alloc_bootmem annotations introduced in the past kmemleak: Add callbacks to the bootmem allocator kmemleak: Allow partial freeing of memory blocks kmemleak: Trace the kmalloc_large* functions in slub kmemleak: Scan objects allocated during a scanning episode kmemleak: Do not acquire scan_mutex in kmemleak_open() kmemleak: Remove the reported leaks number limitation kmemleak: Add more cond_resched() calls in the scanning thread kmemleak: Renice the scanning thread to +10	2009-07-12 12:24:35 -07:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
Sonny Rao	ce2ae53b75	futexes: Fix infinite loop in get_futex_key() on huge page get_futex_key() can infinitely loop if it is called on a virtual address that is within a huge page but not aligned to the beginning of that page. The call to get_user_pages_fast will return the struct page for a sub-page within the huge page and the check for page->mapping will always fail. The fix is to call compound_head on the page before checking that it's mapped. Signed-off-by: Sonny Rao <sonnyrao@us.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org Cc: anton@samba.org Cc: rajamony@us.ibm.com Cc: speight@us.ibm.com Cc: mstephen@us.ibm.com Cc: grimm@us.ibm.com Cc: mikey@ozlabs.au.ibm.com LKML-Reference: <20090710231313.GA23572@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-11 12:40:44 +02:00
Paul Turner	d07387b490	sched: Fix bug in SCHED_IDLE interaction with group scheduling One of the isolation modifications for SCHED_IDLE is the unitization of sleeper credit. However the check for this assumes that the sched_entity we're placing always belongs to a task. This is potentially not true with group scheduling and leaves us rummaging randomly when we try to pull the policy. Signed-off-by: Paul Turner <pjt@google.com> Cc: peterz@infradead.org LKML-Reference: <alpine.DEB.1.00.0907101649570.29914@kitami.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-11 10:00:09 +02:00
Linus Torvalds	ac3f482236	Merge branch 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: dma-debug: Fix the overlap() function to be correct and readable oprofile: reset bt_lost_no_mapping with other stats x86/oprofile: rename kernel parameter for architectural perfmon to arch_perfmon signals: declare sys_rt_tgsigqueueinfo in syscalls.h rcu: Mark Hierarchical RCU no longer experimental dma-debug: Put all hash-chain locks into the same lock class dma-debug: fix off-by-one error in overlap function	2009-07-10 14:25:59 -07:00
Peter Zijlstra	d86ee4809d	sched: optimize cond_resched() Optimize cond_resched() by removing one conditional. Currently cond_resched() checks system_state == SYSTEM_RUNNING in order to avoid scheduling before the scheduler is running. We can however, as per suggestion of Matt, use PREEMPT_ACTIVE to accomplish that very same. Suggested-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-10 14:24:05 -07:00
Linus Torvalds	2a6f86bc5e	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Fix trace_print_seq() kprobes: No need to unlock kprobe_insn_mutex tracing/fastboot: Document the need of initcall_debug trace_export: Repair missed fields tracing: Fix stack tracer sysctl handling	2009-07-10 11:41:41 -07:00
Thomas Gleixner	6ff7041dbf	hrtimer: Fix migration expiry check The timer migration expiry check should prevent the migration of a timer to another CPU when the timer expires before the next event is scheduled on the other CPU. Migrating the timer might delay it because we can not reprogram the clock event device on the other CPU. But the code implementing that check has two flaws: - for !HIGHRES the check compares the expiry value with the clock events device expiry value which is wrong for CLOCK_REALTIME based timers. - the check is racy. It holds the hrtimer base lock of the target CPU, but the clock event device expiry value can be modified nevertheless, e.g. by an timer interrupt firing. The !HIGHRES case is easy to fix as we can enqueue the timer on the cpu which was selected by the load balancer. It runs the idle balancing code once per jiffy anyway. So the maximum delay for the timer is the same as when we keep the tick on the current cpu going. In the HIGHRES case we can get the next expiry value from the hrtimer cpu_base of the target CPU and serialize the update with the cpu_base lock. This moves the lock section in hrtimer_interrupt() so we can set next_event to KTIME_MAX while we are handling the expired timers and set it to the next expiry value after we handled the timers under the base lock. While the expired timers are processed timer migration is blocked because the expiry time of the timer is always <= KTIME_MAX. Also remove the now useless clockevents_get_next_event() function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-10 17:32:55 +02:00
Thomas Gleixner	7e0c5086c1	hrtimer: migration: do not check expiry time on current CPU The timer migration code needs to check whether the expiry time of the timer is before the programmed clock event expiry time when the timer is enqueued on another CPU because we can not reprogram the timer device on the other CPU. The current logic checks the expiry time even if we enqueue on the current CPU when nohz_get_load_balancer() returns current CPU. This might lead to an endless loop in the expiry check code when the expiry time of the timer is before the current programmed next event. Check whether nohz_get_load_balancer() returns current CPU and skip the expiry check if this is the case. The bug was triggered from the networking code. The patch fixes the regression http://bugzilla.kernel.org/show_bug.cgi?id=13738 (Soft-Lockup/Race in networking in 2.6.31-rc1+195) Cc: Arun Bharadwaj <arun@linux.vnet.ibm.com Tested-by: Joao Correia <joaomiguelcorreia@gmail.com> Tested-by: Andres Freund <andres@anarazel.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-10 17:22:20 +02:00
Ingo Molnar	e202687927	Merge branch 'tip/tracing/ring-buffer-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/core	2009-07-10 13:30:06 +02:00
Lai Jiangshan	a35780005e	tracing/workqueues: Add refcnt to struct cpu_workqueue_stats The stat entries can be freed when the stat file is being read. The worse is, the ptr can be freed immediately after it's returned from workqueue_stat_start/next(). Add a refcnt to struct cpu_workqueue_stats to avoid use-after-free. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4A51B16F.6010608@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 12:14:07 +02:00
Lai Jiangshan	d8ea37d5de	tracing/stat: Add stat_release() callback Add stat_release() callback to struct tracer_stat, so a stat tracer can release it's entries after the stat file has been read out. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A51B16A.6020708@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 12:14:05 +02:00
Li Zefan	80098c200e	kmemtrace: Rename some functions So we have: - kmemtrace_print_alloc/free() for kmemtrace default output - kmemtrace_print_alloc/free_user() for binary output used by kmemtrace-user. Suggested-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A51B288.70505@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 12:09:04 +02:00
Frederic Weisbecker	6a167c6558	tracing/kmemtrace: Use the %pf format Remove the obsolete seq_print_ip_sym() usage and replace it by the %pf format in order to print function symbols. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> LKML-Reference: <1247107590-6428-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 12:07:20 +02:00
Frederic Weisbecker	68baafcfc4	tracing/function-graph-tracer: Use the %pf format Remove the obsolete seq_print_ip_sym() usage and replace it by the %pf format in order to print function symbols. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <1247107590-6428-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 12:07:19 +02:00
Xiao Guangrong	dc82ec98a4	tracing/filter: Remove empty subsystem and its directory Remove empty subsystem and its directory when module unload. Before patch: # rmmod trace-events-sample.ko # ls sample enable filter After patch: # rmmod trace-events-sample.ko # ls sample ls: cannot access sample: No such file or directory Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Tom Zanussi <tzanussi@gmail.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4A55A8BE.9010707@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 11:55:28 +02:00
Xiao Guangrong	c5cb183608	tracing/filter: Remove preds from struct event_subsystem No need to save preds to event_subsystem, because it's not used. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Tom Zanussi <tzanussi@gmail.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4A55A83C.1030005@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 11:55:27 +02:00
Fabio Checconi	c20b08e398	sched: Fix rt_rq->pushable_tasks initialization in init_rt_rq() init_rt_rq() initializes only rq->rt.pushable_tasks, and not the pushable_tasks field of the passed rt_rq. The plist is not used uninitialized since the only pushable_tasks plists used are the ones of root rt_rqs; anyway reinitializing the list on every group creation corrupts the root plist, losing its previous contents. Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090615185638.GK21741@gandalf.sssup.it> CC: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 10:43:30 +02:00
Lucas De Marchi	7793527b90	sched: Reset sched stats on fork() The sched_stat fields are currently not reset upon fork. Ingo's recent commit `6c594c21fc` did reset nr_migrations, but it didn't reset any of the others. This patch resets all sched_stat fields on fork. Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <193b0f820907090457s7a3662f4gcdecdc22fcae857b@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 10:43:29 +02:00
Peter Zijlstra	a1ba4d8ba9	sched_rt: Fix overload bug on rt group scheduling Fixes an easily triggerable BUG() when setting process affinities. Make sure to count the number of migratable tasks in the same place: the root rt_rq. Otherwise the number doesn't make sense and we'll hit the BUG in set_cpus_allowed_rt(). Also, make sure we only count tasks, not groups (this is probably already taken care of by the fact that rt_se->nr_cpus_allowed will be 0 for groups, but be more explicit) Tested-by: Thomas Gleixner <tglx@linutronix.de> CC: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Gregory Haskins <ghaskins@novell.com> LKML-Reference: <1247067476.9777.57.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 10:43:29 +02:00
Peter Zijlstra	71a851b4d2	perf_counter: Stop open coding unclone_ctx Instead of open coding the unclone context thingy, put it in a common function. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 10:28:40 +02:00
Catalin Marinas	264ef8a904	kmemleak: Remove alloc_bootmem annotations introduced in the past kmemleak_alloc() calls were added in some places where alloc_bootmem was called. Since now kmemleak tracks bootmem allocations, these explicit calls should be run. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-07-09 17:07:02 +01:00
Joe Perches	ad361c9884	Remove multiple KERN_ prefixes from printk formats Commit `5fd29d6ccb` ("printk: clean up handling of log-levels and newlines") changed printk semantics. printk lines with multiple KERN_<level> prefixes are no longer emitted as before the patch. <level> is now included in the output on each additional use. Remove all uses of multiple KERN_<level>s in formats. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 10:30:03 -07:00
Alexey Dobriyan	b43f3cbd21	headers: mnt_namespace.h redux Fix various silly problems wrt mnt_namespace.h: - exit_mnt_ns() isn't used, remove it - done that, sched.h and nsproxy.h inclusions aren't needed - mount.h inclusion was need for vfsmount_lock, but no longer - remove mnt_namespace.h inclusion from files which don't use anything from mnt_namespace.h Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 09:31:56 -07:00
Steven Rostedt	77ae365eca	ring-buffer: make lockless This patch converts the ring buffers into a completely lockless buffer recording system. The read side still takes locks since we still serialize readers. But the writers are the ones that must be lockless (those can happen in NMIs). The main change is to the "head_page" pointer. We write to the tail, and read from the head. The "head_page" pointer in the cpu buffer is now just a reference to where to look. The real head page is now kept in the head_page->list->prev->next pointer. That is, in the list head of the previous page we set flags. The list pages are allocated to be aligned such that the lowest significant bits are always zero pointing to the list. This gives us play to put in flags to their pointers. bit 0: set when the page is a head page bit 1: set when the writer is moving the page (for overwrite mode) cmpxchg is used to update the pointer. When the writer wraps the buffer and the tail meets the head, in overwrite mode, the writer must move the head page forward. It first uses cmpxchg to change the pointer flag from 1 to 2. Once this is done, the reader on another CPU will not take the page from the buffer. The writers need to protect against interrupts (we don't bother with disabling interrupts because NMIs are allowed to write too). After the writer sets the pointer flag to 2, it takes care to manage interrupts coming in. This is discribed in detail within the comments of the code. Changes in version 2: - Let reader reset entries value of header page. - Fix tail page passing commit page on reader page test. - Always increment entries and write counter in rb_tail_page_update - Add safety check in rb_set_commit_to_write to break out of infinite loop - add mask in rb_is_reader_page [ Impact: lock free writing to the ring buffer ] Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-07-07 18:36:12 -04:00
Steven Rostedt	3adc54fa82	ring-buffer: make the buffer a true circular link list This patch changes the ring buffer data pages from using a link list head pointer, to making each buffer page point to another buffer page and never back to a "head". This makes the handling of the ring buffer less complex, since the traversing of the ring buffer pages no longer needs to account for the head pointer. This change also is needed to make the ring buffer lockless. [ Changes in version 2: - Added change that Lai Jiangshan mentioned. From: Lai Jiangshan <laijs@cn.fujitsu.com> Date: Thu, 11 Jun 2009 11:25:48 +0800 LKML-Reference: <4A30793C.6090208@cn.fujitsu.com> I'm not sure whether these 4 lines: bpage = list_entry(pages.next, struct buffer_page, list); list_del_init(&bpage->list); cpu_buffer->pages = &bpage->list; list_splice(&pages, cpu_buffer->pages); equal to these 2 lines: cpu_buffer->pages = pages.next; list_del(&pages); If there are equivalent, I think the second one are simpler. It may be not a really necessarily cleanup. What I asked is: if there are equivalent, could you use these two line: cpu_buffer->pages = pages.next; list_del(&pages); ] [ Impact: simplify the ring buffer to help make it lockless ] Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-07-07 18:36:10 -04:00
Oleg Nesterov	793285fcaf	cred_guard_mutex: do not return -EINTR to user-space do_execve() and ptrace_attach() return -EINTR if mutex_lock_interruptible(->cred_guard_mutex) fails. This is not right, change the code to return ERESTARTNOINTR. Perhaps we should also change proc_pid_attr_write(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-06 13:57:04 -07:00
Kevin Cernekee	5bfd756097	Fix virt_to_phys() warnings These warnings were observed on MIPS32 using 2.6.31-rc1 and gcc-4.2.0: mm/page_alloc.c: In function 'alloc_pages_exact': mm/page_alloc.c:1986: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast drivers/usb/mon/mon_bin.c: In function 'mon_alloc_buff': drivers/usb/mon/mon_bin.c:1264: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [akpm@linux-foundation.org: fix kernel/perf_counter.c too] Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-06 13:57:03 -07:00
Li Zefan	ddc1637af2	kmemtrace: Print binary output only if 'bin' option is set Currently by default the output of kmemtrace is binary format instead of human-readable output. This patch makes the following changes: - We'll see human-readable output by default - We'll see binary output if 'bin' option is set Note: you may probably need to explicitly disable context-info binary output: # echo 0 > options/context-info # echo 1 > options/bin # cat trace_pipe v2: - use %pF to print call_site Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A4DD0A0.5060500@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-03 11:39:46 +02:00
Xiao Guangrong	e1af3aec3e	tracing: Fix trace_print_seq() We will lose something if trace_seq->buffer[0] is 0, because the copy length is calculated by strlen() in seq_puts(), so using seq_write() instead of seq_puts(). There have a example: after reboot: # echo kmemtrace > current_tracer # echo 0 > options/kmem_minimalistic # cat trace # tracer: kmemtrace # # Nothing is exported, because the first byte of trace_seq->buffer[ ] is KMEMTRACE_USER_ALLOC. ( the value of KMEMTRACE_USER_ALLOC is zero, seeing kmemtrace_print_alloc_user() in kernel/trace/kmemtrace.c) Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4A4B2351.5010300@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-02 08:51:13 +02:00
Li Zefan	020e5f85cb	tracing/events: Add trace_event boot option We already have ftrace= boot option, and this adds a similar boot option for trace events, so allow trace events to be enabled at boot, for boot debugging purpose. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A4ACE29.3010407@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-01 15:44:24 +02:00
Masami Hiramatsu	c5cb5a2d8d	kprobes: Clean up insn_pages by using list instead of hlist Use struct list instead of struct hlist for managing insn_pages, because insn_pages doesn't use hash table. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> LKML-Reference: <20090630210814.17851.64651.stgit@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-01 10:43:41 +02:00
Masami Hiramatsu	4a2bb6fcc8	kprobes: No need to unlock kprobe_insn_mutex Remove needless kprobe_insn_mutex unlocking during safety check in garbage collection, because if someone releases a dirty slot during safety check (which ensures other cpus doesn't execute all dirty slots), the safety check must be fail. So, we need to hold the mutex while checking safety. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> LKML-Reference: <20090630210809.17851.28781.stgit@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-01 10:43:07 +02:00
Linus Torvalds	e83c2b0ff3	Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6 * 'kmemleak' of git://linux-arm.org/linux-2.6: kmemleak: Inform kmemleak about pid_hash kmemleak: Do not warn if an unknown object is freed kmemleak: Do not report new leaked objects if the scanning was stopped kmemleak: Slightly change the policy on newly allocated objects kmemleak: Do not trigger a scan when reading the debug/kmemleak file kmemleak: Simplify the reports logged by the scanning thread kmemleak: Enable task stacks scanning by default kmemleak: Allow the early log buffer to be configurable.	2009-06-30 19:04:53 -07:00
Linus Torvalds	55bcab4695	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits) perf report: Add --symbols parameter perf report: Add --comms parameter perf report: Add --dsos parameter perf_counter tools: Adjust only prelinked symbol's addresses perf_counter: Provide a way to enable counters on exec perf_counter tools: Reduce perf stat measurement overhead/skew perf stat: Use percentages for scaling output perf_counter, x86: Update x86_pmu after WARN() perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run perf stat: Improve output perf stat: Fix multi-run stats perf stat: Add -n/--null option to run without counters perf_counter tools: Remove dead code perf_counter: Complete counter swap perf report: Print sorted callchains per histogram entries perf_counter tools: Prepare a small callchain framework perf record: Fix unhandled io return value perf_counter tools: Add alias for 'l1d' and 'l1i' perf-report: Add bare minimum PERF_EVENT_READ parsing perf-report: Add modes for inherited stats and no-samples ...	2009-06-30 19:02:59 -07:00
Renaud Lottiaux	df279ca896	bsdacct: fix access to invalid filp in acct_on() The file opened in acct_on and freshly stored in the ns->bacct struct can be closed in acct_file_reopen by a concurrent call after we release acct_lock and before we call mntput(file->f_path.mnt). Record file->f_path.mnt in a local variable and use this variable only. Signed-off-by: Renaud Lottiaux <renaud.lottiaux@kerlabs.com> Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 18:56:00 -07:00
Zhang Rui	8bc1ad7dd3	kernel/resource.c: fix sign extension in reserve_setup() When the 32-bit signed quantities get assigned to the u64 resource_size_t, they are incorrectly sign-extended. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13253 Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9905 Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reported-by: Leann Ogasawara <leann@ubuntu.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Reported-by: <pablomme@googlemail.com> Tested-by: <pablomme@googlemail.com> Cc: <stable@kernel.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 18:56:00 -07:00
Paul Mackerras	57e7986ed1	perf_counter: Provide a way to enable counters on exec This provides a way to mark a counter to be enabled on the next exec. This is useful for measuring the total activity of a program without including overhead from the process that launches it. This also changes the perf stat command to use this new facility. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <19017.43927.838745.689203@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-30 12:00:16 +02:00
Catalin Marinas	12de38b186	kmemleak: Inform kmemleak about pid_hash Kmemleak does not track alloc_bootmem calls but the pid_hash allocated in pidhash_init() would need to be scanned as it contains pointers to struct pid objects. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2009-06-29 17:14:14 +01:00
Li Zefan	238a24f626	tracing/fastboot: Document the need of initcall_debug To use boot tracer, one should pass initcall_debug as well as ftrace=initcall to the command line. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4A48735E.9050002@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-29 10:22:10 +02:00
Linus Torvalds	8326e284f8	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, delay: tsc based udelay should have rdtsc_barrier x86, setup: correct include file in <asm/boot.h> x86, setup: Fix typo "CONFIG_x86_64" in <asm/boot.h> x86, mce: percpu mcheck_timer should be pinned x86: Add sysctl to allow panic on IOCK NMI error x86: Fix uv bau sending buffer initialization x86, mce: Fix mce resume on 32bit x86: Move init_gbpages() to setup_arch() x86: ensure percpu lpage doesn't consume too much vmalloc space x86: implement percpu_alloc kernel parameter x86: fix pageattr handling for lpage percpu allocator and re-enable it x86: reorganize cpa_process_alias() x86: prepare setup_pcpu_lpage() for pageattr fix x86: rename remap percpu first chunk allocator to lpage x86: fix duplicate free in setup_pcpu_remap() failure path percpu: fix too lazy vunmap cache flushing x86: Set cpu_llc_id on AMD CPUs	2009-06-28 11:05:28 -07:00
Linus Torvalds	187dd317f0	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timer stats: Optimize by adding quick check to avoid function calls timers: Fix timer_migration interface which accepts any number as input	2009-06-28 11:05:16 -07:00
Linus Torvalds	9b71272b6a	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ftrace: Fix the output of profile ring-buffer: Make it generally available ftrace: Remove duplicate newline tracing: Fix trace_buf_size boot option ftrace: Fix t_hash_start() ftrace: Don't manipulate @pos in t_start() ftrace: Don't increment @pos in g_start() tracing: Reset iterator in t_start() trace_stat: Don't increment @pos in seq start() tracing_bprintk: Don't increment @pos in t_start() tracing/events: Don't increment @pos in s_start()	2009-06-28 11:05:04 -07:00
Lai Jiangshan	82d5308127	trace_export: Repair missed fields Some fields for struct ftrace_graph_ret are missed when they are exported to user. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4A448FB6.5000302@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-26 20:48:40 +02:00
Li Zefan	a32c7765e2	tracing: Fix stack tracer sysctl handling This made my machine completely frozen: # echo 1 > /proc/sys/kernel/stack_tracer_enabled # echo 2 > /proc/sys/kernel/stack_tracer_enabled The cause is register_ftrace_function() was called twice. Also fix ftrace_enabled sysctl, though seems nothing bad happened as I tested it. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A448D17.9010305@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-26 20:48:39 +02:00
Peter Zijlstra	19d2e75543	perf_counter: Complete counter swap Complete the counter swap by indeed switching the times too and updating the userpage after modifying the counter values. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1246014623.31755.195.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-26 17:48:54 +02:00
Li Zefan	0296e4254f	ftrace: Fix the output of profile The first entry of the ftrace profile was always skipped when reading trace_stat/functionX. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A443D59.4080307@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-26 09:25:42 +02:00
Kurt Garloff	5211a242d0	x86: Add sysctl to allow panic on IOCK NMI error This patch introduces a new sysctl: /proc/sys/kernel/panic_on_io_nmi which defaults to 0 (off). When enabled, the kernel panics when the kernel receives an NMI caused by an IO error. The IO error triggered NMI indicates a serious system condition, which could result in IO data corruption. Rather than contiuing, panicing and dumping might be a better choice, so one can figure out what's causing the IO error. This could be especially important to companies running IO intensive applications where corruption must be avoided, e.g. a bank's databases. [ SuSE has been shipping it for a while, it was done at the request of a large database vendor, for their users. ] Signed-off-by: Kurt Garloff <garloff@suse.de> Signed-off-by: Roberto Angelino <robertangelino@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <20090624213211.GA11291@kroah.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 22:06:11 +02:00
Peter Zijlstra	e6e18ec79b	perf_counter: Rework the sample ABI The PERF_EVENT_READ implementation made me realize we don't actually need the sample_type int the output sample, since we already have that in the perf_counter_attr information. Therefore, remove the PERF_EVENT_MISC_OVERFLOW bit and the event->type overloading, and imply put counter overflow samples in a PERF_EVENT_SAMPLE type. This also fixes the issue that event->type was only 32-bit and sample_type had 64 usable bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 21:39:08 +02:00
Peter Zijlstra	bfbd3381e6	perf_counter: Implement more accurate per task statistics With the introduction of PERF_EVENT_READ we have the possibility to provide accurate counter values for individual tasks in a task hierarchy. However, due to the lazy context switching used for similar counter contexts our current per task counts are way off. In order to maintain some of the lazy switch benefits we don't disable it out-right, but simply iterate the active counters and flip the values between the contexts. This only reads the counters but does not need to reprogram the full PMU. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 21:39:07 +02:00
Peter Zijlstra	38b200d676	perf_counter: Add PERF_EVENT_READ Provide a read() like event which can be used to log the counter value at specific sites such as child->parent folding on exit. In order to be useful, we log the counter parent ID, not the actual counter ID, since userspace can only relate parent IDs to perf_counter_attr constructs. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 21:39:07 +02:00
Peter Zijlstra	194002b274	perf_counter, x86: Add mmap counter read support Update the mmap control page with the needed information to use the userspace RDPMC instruction for self monitoring. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 21:39:06 +02:00
Peter Zijlstra	7f8b4e4e09	perf_counter: Add scale information to the mmap control page Add the needed time scale to the self-profile mmap information. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 21:39:05 +02:00
Thomas Gleixner	aa715284b4	futex: request only one page from get_user_pages() Yanmin noticed that fault_in_user_writeable() requests 4 pages instead of one. That's the result of blindly trusting Linus' proposal :) I even looked up the prototype to verify the correctness: the argument in question is confusingly enough named "len" while in reality it means number of pages. Pointed-out-by: Yanmin Zhang <yanmin_zhang@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-06-25 14:33:46 +02:00
Paul Mundt	1155de47cd	ring-buffer: Make it generally available In hunting down the cause for the hwlat_detector ring buffer spew in my failed -next builds it became obvious that folks are now treating ring_buffer as something that is generic independent of tracing and thus, suitable for public driver consumption. Given that there are only a few minor areas in ring_buffer that have any reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for those and make it generally available. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Cc: Jon Masters <jcm@jonmasters.org> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20090625053012.GB19944@linux-sh.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 10:31:30 +02:00
Li Zefan	00e54d087a	ftrace: Remove duplicate newline Before: # echo 'sys_open:traceon:' > set_ftrace_filter # echo 'sys_close:traceoff:5' > set_ftrace_filter # cat set_ftrace_filter #### all functions enabled #### sys_open:traceon:unlimited sys_close:traceoff:count=0 After: # cat set_ftrace_filter #### all functions enabled #### sys_open:traceon:unlimited sys_close:traceoff:count=0 Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A4313A7.7030105@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 10:28:36 +02:00
Linus Torvalds	c622304825	Merge branches 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/{vfs-2.6,audit-current} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: another race fix in jfs_check_acl() Get "no acls for this inode" right, fix shmem breakage inline functions left without protection of ifdef (acl) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL	2009-06-24 14:17:14 -07:00
Eric Paris	3a6a6c16be	audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL Even though one cannot make use of the audit watch code without CONFIG_AUDIT_SYSCALL the spaghetti nature of the audit code means that the audit rule filtering requires that it at least be compiled. Thus build the audit_watch code when we build auditfilter like it was before `cfcad62c74` Clearly this is a point of potential future cleanup.. Reported-by: Frans Pop <elendil@planet.nl> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 16:42:05 -04:00
Thomas Gleixner	d0725992c8	futex: Fix the write access fault problem for real commit `64d1304a64` (futex: setup writeable mapping for futex ops which modify user space data) did address only half of the problem of write access faults. The patch was made on two wrong assumptions: 1) access_ok(VERIFY_WRITE,...) would actually check write access. On x86 it does _NOT_. It's a pure address range check. 2) a RW mapped region can not go away under us. That's wrong as well. Nobody can prevent another thread to call mprotect(PROT_READ) on that region where the futex resides. If that call hits between the get_user_pages_fast() verification and the actual write access in the atomic region we are toast again. The solution is to not rely on access_ok and get_user() for any write access related fault on private and shared futexes. Instead we need to fault it in with verification of write access. There is no generic non destructive write mechanism which would fault the user page in trough a #PF, but as we already know that we will fault we can as well call get_user_pages() directly and avoid the #PF overhead. If get_user_pages() returns -EFAULT we know that we can not fix it anymore and need to bail out to user space. Remove a bunch of confusing comments on this issue as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org	2009-06-24 21:27:35 +02:00
Paul E. McKenney	f6faac71d5	rcu: Mark Hierarchical RCU no longer experimental Removes the warnings about Hierarchical RCU being experimental, given that it has gone through almost six months of being the default RCU in mainline for the x86 with very little trouble. This makes hierarchical-RCU bootup look less scary. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: akpm@linux-foundation.org Cc: niv@us.ibm.com Cc: dvhltc@us.ibm.com Cc: dipankar@in.ibm.com Cc: dhowells@redhat.com Cc: lethal@linux-sh.org Cc: kernel@wantstofly.org Cc: cl@linux-foundation.org Cc: schamp@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 15:02:48 +02:00
Li Zefan	9d612beff5	tracing: Fix trace_buf_size boot option We should be able to specify [KMG] when setting trace_buf_size boot option, as documented in kernel-parameters.txt Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A41F2DB.4020102@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 11:41:12 +02:00
Heiko Carstens	507e123151	timer stats: Optimize by adding quick check to avoid function calls When the kernel is configured with CONFIG_TIMER_STATS but timer stats are runtime disabled we still get calls to __timer_stats_timer_set_start_info which initializes some fields in the corresponding struct timer_list. So add some quick checks in the the timer stats setup functions to avoid function calls to __timer_stats_timer_set_start_info when timer stats are disabled. In an artificial workload that does nothing but playing ping pong with a single tcp packet via loopback this decreases cpu consumption by 1 - 1.5%. This is part of a modified function trace output on SLES11: perl-2497 [00] 28630647177732388 [+ 125]: sk_reset_timer <-tcp_v4_rcv perl-2497 [00] 28630647177732513 [+ 125]: mod_timer <-sk_reset_timer perl-2497 [00] 28630647177732638 [+ 125]: __timer_stats_timer_set_start_info <-mod_timer perl-2497 [00] 28630647177732763 [+ 125]: __mod_timer <-mod_timer perl-2497 [00] 28630647177732888 [+ 125]: __timer_stats_timer_set_start_info <-__mod_timer perl-2497 [00] 28630647177733013 [+ 93]: lock_timer_base <-__mod_timer Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mustafa Mesanovic <mustafa.mesanovic@de.ibm.com> Cc: Arjan van de Ven <arjan@infradead.org> LKML-Reference: <20090623153811.GA4641@osiris.boeblingen.de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 11:15:09 +02:00
Li Zefan	d82d62444f	ftrace: Fix t_hash_start() When the output of set_ftrace_filter is larger than PAGE_SIZE, t_hash_start() will be called the 2nd time, and then we start from the head of a hlist, which is wrong and causes some entries to be outputed twice. The worse is, if the hlist is large enough, reading set_ftrace_filter won't stop but in a dead loop. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A41876E.2060407@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 11:02:53 +02:00
Li Zefan	694ce0a544	ftrace: Don't manipulate @pos in t_start() It's rather confusing that in t_start(), in some cases @pos is incremented, and in some cases it's decremented and then incremented. This patch rewrites t_start() in a much more general way. Thus we fix a bug that if ftrace_filtered == 1, functions have tracer hooks won't be printed, because the branch is always unreachable: static void *t_start(...) { ... if (!p) return t_hash_start(m, pos); return p; } Before: # echo 'sys_open' > /mnt/tracing/set_ftrace_filter # echo 'sys_write:traceon:4' >> /mnt/tracing/set_ftrace_filter sys_open After: # echo 'sys_open' > /mnt/tracing/set_ftrace_filter # echo 'sys_write:traceon:4' >> /mnt/tracing/set_ftrace_filter sys_open sys_write:traceon:count=4 Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A41874B.4090507@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 11:02:53 +02:00
Li Zefan	85951842a1	ftrace: Don't increment @pos in g_start() It's wrong to increment @pos in g_start(). It causes some entries lost when reading set_graph_function, if the output of the file is larger than PAGE_SIZE. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A418738.7090401@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 11:02:52 +02:00
Li Zefan	f129e965be	tracing: Reset iterator in t_start() The iterator is m->private, but it's not reset to trace_types in t_start(). If the output is larger than PAGE_SIZE and t_start() is called the 2nd time, things will go wrong. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A418728.5020506@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 11:02:51 +02:00
Li Zefan	2961bf345f	trace_stat: Don't increment @pos in seq start() It's wrong to increment @pos in stat_seq_start(). It causes some stat entries lost when reading stat file, if the output of the file is larger than PAGE_SIZE. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A418716.90209@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 11:02:51 +02:00
Li Zefan	c8961ec6da	tracing_bprintk: Don't increment @pos in t_start() It's wrong to increment @pos in t_start(), otherwise we'll lose some entries when reading printk_formats, if the output is larger than PAGE_SIZE. Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A4186FA.1020106@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 11:02:50 +02:00
Li Zefan	e1c7e2a6e6	tracing/events: Don't increment @pos in s_start() While testing syscall tracepoints posted by Jason, I found 3 entries were missing when reading available_events. The output size of available_events is < 4 pages, which means we lost 1 entry per page. The cause is, it's wrong to increment @pos in s_start(). Actually there's another bug here -- reading avaiable_events/set_events can race with module unload: # cat available_events \| s_start() \| s_stop() \| \| # rmmod foo.ko s_start() \| call = list_entry(m->private) \| @call might be freed and accessing it will lead to crash. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A4186DD.6090405@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 11:02:49 +02:00
Al Viro	916d75761c	Fix rule eviction order for AUDIT_DIR If syscall removes the root of subtree being watched, we definitely do not want the rules refering that subtree to be destroyed without the syscall in question having a chance to match them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 00:02:38 -04:00
Eric Paris	9d96098510	Audit: clean up all op= output to include string quoting A number of places in the audit system we send an op= followed by a string that includes spaces. Somehow this works but it's just wrong. This patch moves all of those that I could find to be quoted. Example: Change From: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op=remove rule key="number2" list=4 res=0 Change To: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op="remove rule" key="number2" list=4 res=0 Signed-off-by: Eric Paris <eparis@redhat.com>	2009-06-24 00:00:52 -04:00
Eric Paris	35fe4d0b1b	Audit: move audit_get_nd completely into audit_watch audit_get_nd() is only used by audit_watch and could be more cleanly implemented by having the audit watch functions call it when needed rather than making the generic audit rule parsing code deal with those objects. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-06-23 23:51:05 -04:00
Eric Paris	cfcad62c74	audit: seperate audit inode watches into a subfile In preparation for converting audit to use fsnotify instead of inotify we seperate the inode watching code into it's own file. This is similar to how the audit tree watching code is already seperated into audit_tree.c Signed-off-by: Eric Paris <eparis@redhat.com>	2009-06-23 23:50:59 -04:00
Eric Paris	ea7ae60bfe	Audit: clean up audit_receive_skb audit_receive_skb is hard to clearly parse what it is doing to the netlink message. Clean the function up so it is easy and clear to see what is going on. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-06-23 23:50:40 -04:00
Eric Paris	ee080e6ce9	Audit: cleanup netlink mesg handling The audit handling of netlink messages is all over the place. Clean things up, use predetermined macros, generally make it more readable. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-06-23 23:50:39 -04:00
Eric Paris	038cbcf65f	Audit: unify the printk of an skb when auditd not around Remove code duplication of skb printk when auditd is not around in userspace to deal with this message. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-06-23 23:50:37 -04:00
Eric Paris	e85188f424	Audit: dereferencing krule as if it were an audit_watch audit_update_watch() runs all of the rules for a given watch and duplicates them, attaches a new watch to them, and then when it finishes that process and has called free on all of the old rules (ok maybe still inside the rcu grace period) it proceeds to use the last element from list_for_each_entry_safe() as if it were a krule rather than being the audit_watch which was anchoring the list to output a message about audit rules changing. This patch unfies the audit message from two different places into a helper function and calls it from the correct location in audit_update_rules(). We will now get an audit message about the config changing for each rule (with each rules filterkey) rather than the previous garbage. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-06-23 23:50:36 -04:00
Eric Paris	b87ce6e418	Audit: better estimation of execve record length The audit execve record splitting code estimates the length of the message generated. But it forgot to include the "" that wrap each string in its estimation. This means that execve messages with lots of tiny (1-2 byte) arguments could still cause records greater than 8k to be emitted. Simply fix the estimate. Signed-off-by: Eric Paris <eparis@redhat.com>	2009-06-23 23:50:34 -04:00
Eric Paris	35aa901c0b	Audit: fix audit watch use after free When an audit watch is added to a parent the temporary watch inside the original krule from userspace is freed. Yet the original watch is used after the real watch was created in audit_add_rules() Signed-off-by: Eric Paris <eparis@redhat.com>	2009-06-23 23:50:33 -04:00
Peter Zijlstra	f344011ccb	perf_counter: Optimize perf_counter_alloc()'s inherit case We don't need to add usage counts for swcounter and attr usage models for inherited counters since the parent counter will always have one, which suffices to generate the needed output. This avoids up to 3 global atomic increments per inherited counter. LKML-Reference: <new-submission> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-23 11:42:46 +02:00
Peter Zijlstra	b84fbc9fb1	perf_counter: Push inherit into perf_counter_alloc() Teach perf_counter_alloc() about inheritance so that we can optimize the inherit path in the next patch. Remove the child_counter->atrr.inherit = 1 line because the only way to get there is if parent_counter->attr.inherit == 1 and we copy the attrs. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-23 11:42:45 +02:00
Peter Zijlstra	f29ac756a4	perf_counter: Optimize perf_swcounter_event() Similar to tracepoints, use an enable variable to reduce overhead when unused. Only look for a counter of a particular event type when we know there is at least one in the system. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-23 11:42:44 +02:00
Arun R Bharadwaj	bfdb4d9f0f	timers: Fix timer_migration interface which accepts any number as input Poornima Nayek reported: \| Timer migration interface /proc/sys/kernel/timer_migration in \| 2.6.30-git9 accepts any numerical value as input. \| \| Steps to reproduce: \| 1. echo -6666666 > /proc/sys/kernel/timer_migration \| 2. cat /proc/sys/kernel/timer_migration \| -6666666 \| \| 1. echo 44444444444444444444444444444444444444444444444444444444444 > /proc/sys/kernel/timer_migration \| 2. cat /proc/sys/kernel/timer_migration \| -1357789412 \| \| Expected behavior: Should 'echo: write error: Invalid argument' while \| setting any value other then 0 & 1 Restrict valid values to 0 and 1. Reported-by: Poornima Nayak <mpnayak@linux.vnet.ibm.com> Tested-by: Poornima Nayak <mpnayak@linux.vnet.ibm.com> Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Cc: poornima nayak <mpnayak@linux.vnet.ibm.com> Cc: Arun Bharadwaj <arun@linux.vnet.ibm.com> LKML-Reference: <20090623043058.GA3249@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-23 10:49:33 +02:00
Linus Torvalds	31950eb66f	mm/init: cpu_hotplug_init() must be initialized before SLAB SLAB uses get/put_online_cpus() which use a mutex which is itself only initialized when cpu_hotplug_init() is called. Currently we hang suring boot in SLAB due to doing that too late. Reported by James Bottomley and Sachin Sant (and possibly others). Debugged by Benjamin Herrenschmidt. This just removes the dynamic initialization of the data structures, and replaces it with a static one, avoiding this dependency entirely, and removing one unnecessary special initcall. Tested-by: Sachin Sant <sachinp@in.ibm.com> Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com> Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:18:12 -07:00
Linus Torvalds	2453d6ff6f	Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq, irq.h: Fix kernel-doc warnings genirq: fix comment to say IRQ_WAKE_THREAD	2009-06-20 11:30:01 -07:00
Linus Torvalds	12e24f34cb	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) perfcounter: Handle some IO return values perf_counter: Push perf_sample_data through the swcounter code perf_counter tools: Define and use our own u64, s64 etc. definitions perf_counter: Close race in perf_lock_task_context() perf_counter, x86: Improve interactions with fast-gup perf_counter: Simplify and fix task migration counting perf_counter tools: Add a data file header perf_counter: Update userspace callchain sampling uses perf_counter: Make callchain samples extensible perf report: Filter to parent set by default perf_counter tools: Handle lost events perf_counter: Add event overlow handling fs: Provide empty .set_page_dirty() aop for anon inodes perf_counter: tools: Makefile tweaks for 64-bit powerpc perf_counter: powerpc: Add processor back-end for MPC7450 family perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels perf_counter: powerpc: Change how processor-specific back-ends get selected perf_counter: powerpc: Use unsigned long for register and constraint values perf_counter: powerpc: Enable use of software counters on 32-bit powerpc perf_counter tools: Add and use isprint() ...	2009-06-20 11:29:32 -07:00
Linus Torvalds	1eb51c33b2	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix out of scope variable access in sched_slice() sched: Hide runqueues from direct refer at source code level sched: Remove unneeded __ref tag sched, x86: Fix cpufreq + sched_clock() TSC scaling	2009-06-20 10:57:40 -07:00
Linus Torvalds	b0b7065b64	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) tracing/urgent: warn in case of ftrace_start_up inbalance tracing/urgent: fix unbalanced ftrace_start_up function-graph: add stack frame test function-graph: disable when both x86_32 and optimize for size are configured ring-buffer: have benchmark test print to trace buffer ring-buffer: do not grab locks in nmi ring-buffer: add locks around rb_per_cpu_empty ring-buffer: check for less than two in size allocation ring-buffer: remove useless compile check for buffer_page size ring-buffer: remove useless warn on check ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index tracing: update sample event documentation tracing/filters: fix race between filter setting and module unload tracing/filters: free filter_string in destroy_preds() ring-buffer: use commit counters for commit pointer accounting ring-buffer: remove unused variable ring-buffer: have benchmark test handle discarded events ring-buffer: prevent adding write in discarded area tracing/filters: strloc should be unsigned short tracing/filters: operand can be negative ... Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually	2009-06-20 10:56:46 -07:00
Linus Torvalds	38df92b8ce	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: NOHZ: Properly feed cpufreq ondemand governor	2009-06-20 10:51:44 -07:00
Ingo Molnar	d4c4038343	Merge branch 'tip/tracing/urgent-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent	2009-06-20 18:26:48 +02:00
Ingo Molnar	3daeb4da9a	Merge branch 'tip/tracing/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent	2009-06-20 17:25:49 +02:00
Peter Zijlstra	92bf309a9c	perf_counter: Push perf_sample_data through the swcounter code Push the perf_sample_data further outwards to the swcounter interface, to abstract it away some more. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-20 12:30:30 +02:00
Frederic Weisbecker	9ea1a153a4	tracing/urgent: warn in case of ftrace_start_up inbalance Prevent from further ftrace_start_up inbalances so that we avoid future nop patching omissions with dynamic ftrace. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2009-06-20 06:52:21 +02:00
Frederic Weisbecker	c85a17e226	tracing/urgent: fix unbalanced ftrace_start_up Perfcounter reports the following stats for a wide system profiling: # # (2364 samples) # # Overhead Symbol # ........ ...... # 15.40% [k] mwait_idle_with_hints 8.29% [k] read_hpet 5.75% [k] ftrace_caller 3.60% [k] ftrace_call [...] This snapshot has been taken while neither the function tracer nor the function graph tracer was running. With dynamic ftrace, such results show a wrong ftrace behaviour because all calls to ftrace_caller or ftrace_graph_caller (the patched calls to mcount) are supposed to be patched into nop if none of those tracers are running. The problem occurs after the first run of the function tracer. Once we launch it a second time, the callsites will never be nopped back, unless you set custom filters. For example it happens during the self tests at boot time. The function tracer selftest runs, and then the dynamic tracing is tested too. After that, the callsites are left un-nopped. This is because the reset callback of the function tracer tries to unregister two ftrace callbacks in once: the common function tracer and the function tracer with stack backtrace, regardless of which one is currently in use. It then creates an unbalance on ftrace_start_up value which is expected to be zero when the last ftrace callback is unregistered. When it reaches zero, the FTRACE_DISABLE_CALLS is set on the next ftrace command, triggering the patching into nop. But since it becomes unbalanced, ie becomes lower than zero, if the kernel functions are patched again (as in every further function tracer runs), they won't ever be nopped back. Note that ftrace_call and ftrace_graph_call are still patched back to ftrace_stub in the off case, but not the callers of ftrace_call and ftrace_graph_caller. It means that the tracing is well deactivated but we waste a useless call into every kernel function. This patch just unregisters the right ftrace_ops for the function tracer on its reset callback and ignores the other one which is not registered, fixing the unbalance. The problem also happens is .30 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@kernel.org	2009-06-20 06:28:46 +02:00
Oleg Nesterov	befca96779	ptrace: wait_task_zombie: do not account traced sub-threads The bug is ancient. If we trace the sub-thread of our natural child and this sub-thread exits, we update parent->signal->cxxx fields. But we should not do this until the whole thread-group exits, otherwise we account this thread (and all other live threads) twice. Add the task_detached() check. No need to check thread_group_empty(), wait_consider_task()->delay_group_leader() already did this. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Roland McGrath <roland@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-19 16:46:06 -07:00
Peter Zijlstra	b49a9e7e72	perf_counter: Close race in perf_lock_task_context() perf_lock_task_context() is buggy because it can return a dead context. the RCU read lock in perf_lock_task_context() only guarantees the memory won't get freed, it doesn't guarantee the object is valid (in our case refcount > 0). Therefore we can return a locked object that can get freed the moment we release the rcu read lock. perf_pin_task_context() then increases the refcount and does an unlock on freed memory. That increased refcount will cause a double free, in case it started out with 0. Ammend this by including the get_ctx() functionality in perf_lock_task_context() (all users already did this later anyway), and return a NULL context when the found one is already dead. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-19 17:57:36 +02:00
Peter Zijlstra	e5289d4a18	perf_counter: Simplify and fix task migration counting The task migrations counter was causing rare and hard to decypher memory corruptions under load. After a day of debugging and bisection we found that the problem was introduced with: 3f731ca: perf_counter: Fix cpu migration counter Turning them off fixes the crashes. Incidentally, the whole perf_counter_task_migration() logic can be done simpler as well, by injecting a proper sw-counter event. This cleanup also fixed the crashes. The precise failure mode is not completely clear yet, but we are clearly not unhappy about having a fix ;-) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-19 13:43:12 +02:00
Steven Rostedt	71e308a239	function-graph: add stack frame test In case gcc does something funny with the stack frames, or the return from function code, we would like to detect that. An arch may implement passing of a variable that is unique to the function and can be saved on entering a function and can be tested when exiting the function. Usually the frame pointer can be used for this purpose. This patch also implements this for x86. Where it passes in the stack frame of the parent function, and will test that frame on exit. There was a case in x86_32 with optimize for size (-Os) where, for a few functions, gcc would align the stack frame and place a copy of the return address into it. The function graph tracer modified the copy and not the actual return address. On return from the funtion, it did not go to the tracer hook, but returned to the parent. This broke the function graph tracer, because the return of the parent (where gcc did not do this funky manipulation) returned to the location that the child function was suppose to. This caused strange kernel crashes. This test detected the problem and pointed out where the issue was. This modifies the parameters of one of the functions that the arch specific code calls, so it includes changes to arch code to accommodate the new prototype. Note, I notice that the parsic arch implements its own push_return_trace. This is now a generic function and the ftrace_push_return_trace should be used instead. This patch does not touch that code. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-18 18:40:18 -04:00
Steven Rostedt	eb4a03780d	function-graph: disable when both x86_32 and optimize for size are configured On x86_32, when optimize for size is set, gcc may align the frame pointer and make a copy of the the return address inside the stack frame. The return address that is located in the stack frame may not be the one used to return to the calling function. This will break the function graph tracer. The function graph tracer replaces the return address with a jump to a hook function that can trace the exit of the function. If it only replaces a copy, then the hook will not be called when the function returns. Worse yet, when the parent function returns, the function graph tracer will return back to the location of the child function which will easily crash the kernel with weird results. To see the problem, when i386 is compiled with -Os we get: c106be03: 57 push %edi c106be04: 8d 7c 24 08 lea 0x8(%esp),%edi c106be08: 83 e4 e0 and $0xffffffe0,%esp c106be0b: ff 77 fc pushl 0xfffffffc(%edi) c106be0e: 55 push %ebp c106be0f: 89 e5 mov %esp,%ebp c106be11: 57 push %edi c106be12: 56 push %esi c106be13: 53 push %ebx c106be14: 81 ec 8c 00 00 00 sub $0x8c,%esp c106be1a: e8 f5 57 fb ff call c1021614 <mcount> When it is compiled with -O2 instead we get: c10896f0: 55 push %ebp c10896f1: 89 e5 mov %esp,%ebp c10896f3: 83 ec 28 sub $0x28,%esp c10896f6: 89 5d f4 mov %ebx,0xfffffff4(%ebp) c10896f9: 89 75 f8 mov %esi,0xfffffff8(%ebp) c10896fc: 89 7d fc mov %edi,0xfffffffc(%ebp) c10896ff: e8 d0 08 fa ff call c1029fd4 <mcount> The compile with -Os will align the stack pointer then set up the frame pointer (%ebp), and it copies the return address back into the stack frame. The change to the return address in mcount is done to the copy and not the real place holder of the return address. Then compile with -O2 sets up the frame pointer first, this makes the change to the return address by mcount affect where the function will jump on exit. Reported-by: Jake Edge <jake@lwn.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-18 18:39:30 -04:00
Peter Oberparleiter	7bf99fb673	gcov: enable GCOV_PROFILE_ALL for x86_64 Enable gcov profiling of the entire kernel on x86_64. Required changes include disabling profiling for: * arch/kernel/acpi/realmode and arch/kernel/boot/compressed: not linked to main kernel * arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet: profiling causes segfaults during boot (incompatible context) Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:58 -07:00
Peter Oberparleiter	2521f2c228	gcov: add gcov profiling infrastructure Enable the use of GCC's coverage testing tool gcov [1] with the Linux kernel. gcov may be useful for: * debugging (has this code been reached at all?) * test improvement (how do I change my test to cover these lines?) * minimizing kernel configurations (do I need this option if the associated code is never run?) The profiling patch incorporates the following changes: * change kbuild to include profiling flags * provide functions needed by profiling code * present profiling data as files in debugfs Note that on some architectures, enabling gcc's profiling option "-fprofile-arcs" for the entire kernel may trigger compile/link/ run-time problems, some of which are caused by toolchain bugs and others which require adjustment of architecture code. For this reason profiling the entire kernel is initially restricted to those architectures for which it is known to work without changes. This restriction can be lifted once an architecture has been tested and found compatible with gcc's profiling. Profiling of single files or directories is still available on all platforms (see config help text). [1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:57 -07:00
Peter Oberparleiter	b99b87f70c	kernel: constructor support Call constructors (gcc-generated initcall-like functions) during kernel start and module load. Constructors are e.g. used for gcov data initialization. Disable constructor support for usermode Linux to prevent conflicts with host glibc. Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:57 -07:00
Alexey Dobriyan	90af90d7d3	nsproxy: extract create_nsproxy() clone_nsproxy() does useless copying of old nsproxy -- every pointer will be rewritten to new ns or to old ns. Remove copying, rename clone_nsproxy(), create_nsproxy() will be used by C/R code to create fresh nsproxy on restart. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:56 -07:00
Alexey Dobriyan	4c2a7e72d5	utsns: extract creeate_uts_ns() create_uts_ns() will be used by C/R to create fresh uts_ns. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:55 -07:00
Alexey Dobriyan	dca4a97960	pidns: rewrite copy_pid_ns() copy_pid_ns() is a perfect example of a case where unwinding leads to more code and makes it less clear. Watch the diffstat. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:55 -07:00
Alexey Dobriyan	ed469a63c3	pidns: make create_pid_namespace() accept parent pidns create_pid_namespace() creates everything, but caller has to assign parent pidns by hand, which is unnatural. At the moment of call new ->level has to be taken from somewhere and parent pidns is already available. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:55 -07:00
Christoph Hellwig	17f98dcf60	pids: clean up find_task_by_pid variants find_task_by_pid_type_ns is only used to implement find_task_by_vpid and find_task_by_pid_ns, but both of them pass PIDTYPE_PID as first argument. So just fold find_task_by_pid_type_ns into find_task_by_pid_ns and use find_task_by_pid_ns to implement find_task_by_vpid. While we're at it also remove the exports for find_task_by_pid_ns and find_task_by_vpid - we don't have any modular callers left as the only modular caller of he old pre pid namespace find_task_by_pid (gfs2) was switched to pid_task which operates on a struct pid pointer instead of a pid_t. Given the confusion about pid_t values vs namespace that's generally the better option anyway and I think we're better of restricting modules to do it that way. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:55 -07:00
Sukanto Ghosh	7338f29984	sysctl.c: remove unused variable Remoce the unused variable 'val' from __do_proc_dointvec() The integer has been declared and used as 'val = -val' and there is no reference to it anywhere. Signed-off-by: Sukanto Ghosh <sukanto.cse.iitb@gmail.com> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Sukanto Ghosh <sukanto.cse.iitb@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:54 -07:00
Oleg Nesterov	371cbb387e	kthreads: simplify migration_thread() exit path Now that kthread_stop() can be used even if the task has already exited, we can kill the "wait_to_die:" loop in migration_thread(). But we must pin rq->migration_thread after creation. Actually, I don't think CPU_UP_CANCELED or CPU_DEAD should wait for ->migration_thread exit. Perhaps we can simplify this code a bit more. migration_call() can set ->should_stop and forget about this thread. But we need a new helper in kthred.c for that. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:54 -07:00

1 2 3 4 5 ...

7765 Commits (4263565d491145b57621a761714f2ca6f1293a45)