Commit graph

582 commits

Author SHA1 Message Date
Ingo Molnar
3fd4bc015e tracing/kmemtrace: export kmemtrace_mark_alloc_node() / kmemtrace_mark_free()
Impact: build fix

Also fix up Kconfig dependencies and include files.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-30 16:06:00 +01:00
Frederic Weisbecker
36994e58a4 tracing/kmemtrace: normalize the raw tracer event to the unified tracing API
Impact: new tracer plugin

This patch adapts kmemtrace raw events tracing to the unified tracing API.

To enable and use this tracer, just do the following:

 echo kmemtrace > /debugfs/tracing/current_tracer
 cat /debugfs/tracing/trace

You will have the following output:

 # tracer: kmemtrace
 #
 #
 # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
 # FREE   |      |     |       |              |   |            |        |
 # |

type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584

That was to stay backward compatible with the format output produced in
inux/tracepoint.h.

This is the default ouput, but note that I tried something else.

If you change an option:

echo kmem_minimalistic > /debugfs/trace_options

and then cat /debugfs/trace, you will have the following output:

 # tracer: kmemtrace
 #
 #
 # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
 # FREE   |      |     |       |              |   |            |        |
 # |

   -      C                            0xffff88007c088780          file_free_rcu
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   +      K    240    240   000000d0   0xffff8800790dc780     -1   d_alloc
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   +      K    240    240   000000d0   0xffff8800790dc870     -1   d_alloc
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   +      K    240    240   000000d0   0xffff8800790dc960     -1   d_alloc
   +      K   1304   1312   000000d0   0xffff8800791d7340     -1   reiserfs_alloc_inode
   -      C                            0xffff88007cad6000          putname
   +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
   -      C                            0xffff88007cad6000          putname
   +      K    992   1000   000000d0   0xffff880079045b58     -1   alloc_inode
   +      K    768   1024   000080d0   0xffff88007c096400     -1   alloc_pipe_info
   +      K    240    240   000000d0   0xffff8800790dca50     -1   d_alloc
   +      K    272    320   000080d0   0xffff88007c088780     -1   get_empty_filp
   +      K    272    320   000080d0   0xffff88007c088000     -1   get_empty_filp

Yeah I shall confess kmem_minimalistic should be: kmem_alternative.

Whatever, I find it more readable but this a personal opinion of course.
We can drop it if you want.

On the ALLOC/FREE column, + means an allocation and - a free.

On the type column, you have K = kmalloc, C = cache, P = page

I would like the flags to be GFP_* strings but that would not be easy to not
break the column with strings....

About the node...it seems to always be -1. I don't know why but that shouldn't
be difficult to find.

I moved linux/tracepoint.h to trace/tracepoint.h as well. I think that would
be more easy to find the tracer headers if they are all in their common
directory.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-30 09:36:13 +01:00
Rusty Russell
33edcf133b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-12-30 08:02:35 +10:30
Ingo Molnar
a103e2ab73 tracing/selftest: remove TRACE_CONT reference
Impact: build fix

TRACE_CONT is gone - fix up the self-test too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 15:07:47 +01:00
Ingo Molnar
f7d48cbde5 tracing/ftrace: make trace_find_cmdline() generally available
Impact: build fix

On !CONFIG_CONTEXT_SWITCH_TRACER trace_find_cmdline() is not defined:

 kernel/trace/trace_output.c: In function 'trace_ctxwake_print':
 kernel/trace/trace_output.c:499: error: implicit declaration of function 'trace_find_cmdline'
 kernel/trace/trace_output.c:499: warning: assignment makes pointer from integer without a cast

Move it to the generic section in trace.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 13:06:24 +01:00
Frederic Weisbecker
e302cf3f96 tracing/branch-tracer: adapt to the stat tracing API
Impact: refactor the branch tracer

This patch adapts the branch tracer to the tracing API.

This is a proof of concept because the branch tracer implements two
"stat tracing" that were split in two files.

So I added an option to the branch tracer: stat_all_branch.
If it is set, then trace_stat will output all of the branches
entries stats. Otherwise, it will print the annotated branches.

Its is a kind of quick trick, waiting for a better solution.

By default, the annotated branches stat are sorted by incorrect branch
prediction percentage.

Ie:

 correct incorrect  %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
       0        1 100 native_smp_prepare_cpus        smpboot.c            1228
       0        1 100 hpet_rtc_timer_reinit          hpet.c               1057
       0    18032 100 sched_info_queued              sched_stats.h        223
       0      684 100 yield_task_fair                sched_fair.c         984
       0      282 100 pre_schedule_rt                sched_rt.c           1263
       0    13414 100 sched_info_dequeued            sched_stats.h        178
       0    21724 100 sched_info_switch              sched_stats.h        270
       0        1 100 get_signal_to_deliver          signal.c             1820
       0        8 100 __cancel_work_timer            workqueue.c          560
       0      212 100 verify_export_symbols          module.c             1509
       0       17 100 __rmqueue_fallback             page_alloc.c         793
       0       43 100 clear_page_mlock               internal.h           129
       0      124 100 try_to_unmap_anon              rmap.c               1021
       0       53 100 try_to_unmap_anon              rmap.c               1013
       0        6 100 vma_address                    rmap.c               232
       0     3301 100 try_to_unmap_file              rmap.c               1082
       0      466 100 try_to_unmap_file              rmap.c               1077
       0        1 100 mem_cgroup_create              memcontrol.c         1090
       0        3 100 inotify_find_update_watch      inotify.c            726
       2    30163  99 perf_counter_task_sched_out    perf_counter.c       385
       1     2935  99 percpu_free                    allocpercpu.c        138
    1544   297672  99 dentry_lru_del_init            dcache.c             153
       8     1074  99 input_pass_event               input.c              86
    1390    76781  98 mapping_unevictable            pagemap.h            50
     280     6665  95 pick_next_task_rt              sched_rt.c           889
     750     4826  86 next_pidmap                    pid.c                194
       2        8  80 blocking_notifier_chain_regist notifier.c           220
      36      130  78 ioremap_pte_range              ioremap.c            22
    1093     3247  74 IS_ERR                         err.h                34
    1023     2908  73 sched_slice                    sched_fair.c         445
      22       60  73 disk_put_part                  genhd.h              206
[...]

It enables a developer to quickly address the source of incorrect branch
predictions.  Note that this sorting would be better with a second sort on
the number of incorrect predictions.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 12:55:46 +01:00
Frederic Weisbecker
dbd0b4b330 tracing/ftrace: provide the base infrastructure for histogram tracing
Impact: extend the tracing API

The goal of this patch is to normalize and make more easy the
implementation of statistical (histogram) tracing.

It implements a trace_stat file into the /debugfs/tracing directory where
one can print a one-shot output of statistics/histogram entries.

A tracer has to provide two basic iterator callbacks:

  stat_start() => the first entry
  stat_next(prev, idx) => the next one.

Note that it is adapted for arrays or hash tables or lists.... since it
provides a pointer to the previous entry and the current index of the
iterator.

These two callbacks are called to get a snapshot of the statistics at each
opening of the trace_stat file because. The values are so updated between
two "cat trace_stat". And the tracer is free to lock its datas during the
iteration to keep consistent values.

Since it is almost always interesting to sort statisticals values to
address the problems by priority, this infrastructure provides a "sorting"
of the stat entries too if desired. A tracer has just to provide a
stat_cmp callback to compare two entries and the stat tracing
infrastructure will build a sorted list of the given entries.

A last callback, called stat_headers, can be implemented by a tracer to
output headers on its trace.

If one of these callbacks is changed on runtime, it just have to signal it
to the stat tracing API by calling the init_tracer_stat() helper.

Changes in V2:

- Fix a memory leak if the user opens multiple times the trace_stat file
  without closing it. Now we always free our list before rebuilding it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 12:55:45 +01:00
Steven Rostedt
f633cef020 ftrace: change trace.c to use registered events
Impact: rework trace.c to use new event register API

Almost every ftrace event has to implement its output display in
trace.c through a different function. Some events did not handle
all the formats (trace, latency-trace, raw, hex, binary), and
this method does not scale well.

This patch converts the format functions to use the event API to
find the event and and print its format. Currently, we have
a print function for trace, latency_trace, raw, hex and binary.
A trace_nop_print is available if the event wants to avoid output
on a particular format.

Perhaps other tracers could use this in the future (like mmiotrace and
function_graph).

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 12:46:12 +01:00
Steven Rostedt
f0868d1e23 ftrace: set up trace event hash infrastructure
Impact: simplify/generalize/refactor trace.c

The trace.c file is becoming more difficult to maintain due to the
growing number of events. There is several formats that an event may
be printed. This patch sets up the infrastructure of an event hash to
allow for events to register how they should be printed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 12:46:11 +01:00
Steven Rostedt
c47956d9ae ftrace: remove obsolete print continue functionality
Impact: cleanup, remove obsolete code

Now that the ring buffer used by ftrace allows for variable length
entries, we do not need the 'cont' feature of the buffer.  This code
makes other parts of ftrace more complex and by removing this it
simplifies the ftrace code.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 12:46:10 +01:00
Ingo Molnar
0f01f07fad Merge branches 'tracing/docs', 'tracing/function-graph-tracer' and 'linus' into tracing/core 2008-12-29 11:25:11 +01:00
Linus Torvalds
b0f4b285d7 Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
  sched, trace: update trace_sched_wakeup()
  tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
  Revert "x86: disable X86_PTRACE_BTS"
  ring-buffer: prevent false positive warning
  ring-buffer: fix dangling commit race
  ftrace: enable format arguments checking
  x86, bts: memory accounting
  x86, bts: add fork and exit handling
  ftrace: introduce tracing_reset_online_cpus() helper
  tracing: fix warnings in kernel/trace/trace_sched_switch.c
  tracing: fix warning in kernel/trace/trace.c
  tracing/ring-buffer: remove unused ring_buffer size
  trace: fix task state printout
  ftrace: add not to regex on filtering functions
  trace: better use of stack_trace_enabled for boot up code
  trace: add a way to enable or disable the stack tracer
  x86: entry_64 - introduce FTRACE_ frame macro v2
  tracing/ftrace: add the printk-msg-only option
  tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
  x86, bts: correctly report invalid bts records
  ...

Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
being already partly merged by the SH merge.
2008-12-28 12:21:10 -08:00
Frederic Weisbecker
412d0bb553 tracing/function-graph-tracer: strip ending newlines on comments
Impact: tracer output improvement

Ending newlines are appended automatically on comments by the function
graph tracer because the newline needs to be placed after the "*/"
comment characters.

So if the user puts an ending newline, we want to strip it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-26 10:42:14 +01:00
Ingo Molnar
32e8d18683 Merge branches 'timers/clocksource', 'timers/hpet', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/rtc' into timers/core 2008-12-25 18:02:25 +01:00
Ingo Molnar
5250d329e3 Merge branches 'tracing/ftrace', 'tracing/hw-branch-tracing' and 'tracing/ring-buffer'; commit 'v2.6.28' into tracing/core 2008-12-25 13:11:00 +01:00
Peter Zijlstra
468a15bb4c sched, trace: update trace_sched_wakeup()
Impact: extend the wakeup tracepoint with the info whether the wakeup was real

Add the information needed to distinguish 'real' wakeups from 'false'
wakeups.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-25 13:10:21 +01:00
Steven Rostedt
98db8df777 ring-buffer: prevent false positive warning
Impact: eliminate false WARN_ON message

If an interrupt goes off after the setting of the local variable
tail_page and before incrementing the write index of that page,
the interrupt could push the commit forward to the next page.

Later a check is made to see if interrupts pushed the buffer around
the entire ring buffer by comparing the next page to the last commited
page. This can produce a false positive if the interrupt had pushed
the commit page forward as stated above.

Thanks to Jiaying Zhang for finding this race.

Reported-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-23 18:45:26 +01:00
Steven Rostedt
a8ccf1d6f6 ring-buffer: fix dangling commit race
Impact: fix stuck trace-buffers

If an interrupt comes in during the rb_set_commit_to_write and
pushes the tail page forward just at the right time, the commit
updates will miss the adding of the interrupt data. This will
cause the commit pointer to cease from moving forward.

Thanks to Jiaying Zhang for finding this race.

Reported-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-23 18:45:25 +01:00
Pekka J Enberg
213cc06079 ftrace: introduce tracing_reset_online_cpus() helper
Impact: cleanup

This patch factors out common code from multiple tracers into a
tracing_reset_online_cpus() function and converts the tracers to use it.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-19 16:29:34 +01:00
Ingo Molnar
30cd324e97 Merge branches 'tracing/ftrace', 'tracing/ring-buffer' and 'tracing/urgent' into tracing/core
Conflicts:
	include/linux/ftrace.h
2008-12-19 09:42:40 +01:00
Ingo Molnar
c71dd42db2 tracing: fix warnings in kernel/trace/trace_sched_switch.c
these warnings:

  kernel/trace/trace_sched_switch.c: In function ‘tracing_sched_register’:
  kernel/trace/trace_sched_switch.c:96: warning: passing argument 1 of ‘register_trace_sched_wakeup_new’ from incompatible pointer type
  kernel/trace/trace_sched_switch.c:112: warning: passing argument 1 of ‘unregister_trace_sched_wakeup_new’ from incompatible pointer type
  kernel/trace/trace_sched_switch.c: In function ‘tracing_sched_unregister’:
  kernel/trace/trace_sched_switch.c:121: warning: passing argument 1 of ‘unregister_trace_sched_wakeup_new’ from incompatible pointer type

Trigger because sched_wakeup_new tracepoints need the same trace
signature as sched_wakeup - which was changed recently.

Fix it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-19 01:05:38 +01:00
Ingo Molnar
3bddb9a324 tracing: fix warning in kernel/trace/trace.c
this warning:

  kernel/trace/trace.c: In function ‘print_lat_fmt’:
  kernel/trace/trace.c:1826: warning: unused variable ‘state’

Triggers because 'state' has become unused - remove it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-19 01:01:25 +01:00
Lai Jiangshan
6d102bc68f tracing/ring-buffer: remove unused ring_buffer size
Impact: remove dead code

struct ring_buffer.size is not set after ring_buffer is initialized
or resized. it is always 0.

we can use "buffer->pages * PAGE_SIZE" to get ring_buffer's size

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 13:49:56 +01:00
Thomas Gleixner
3d9101e925 trace: fix task state printout
Impact: fix occasionally incorrect trace output

The tracing code has interesting varieties of printing out task state.

Unfortunalely only one of the instances is correct as it copies the
code from sched.c:sched_show_task(). The others are plain wrong as
they treatthe bitfield as an integer offset into the character
array. Also the size check of the character array is wrong as it
includes the trailing \0.

Use a common state decoder inline which does the Right Thing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 13:03:09 +01:00
Steven Rostedt
ea3a6d6d60 ftrace: add not to regex on filtering functions
Impact: enhancement

Ingo Molnar has asked about a way to remove items from the filter
lists. Currently, you can only add or replace items. The way
items are added to the list is through opening one of the list
files (set_ftrace_filter or set_ftrace_notrace) via append.
If the file is opened for truncate, the list is cleared.

  echo spin_lock > /debug/tracing/set_ftrace_filter

The above will replace the list with only spin_lock

  echo spin_lock >> /debug/tracing/set_ftrace_filter

The above will add spin_lock to the list.

Now this patch adds:

  echo '!spin_lock' >> /debug/tracing/set_ftrace_filter

This will remove spin_lock from the list.

The limited glob features of these lists also can be notted.

  echo '!spin_*' >> /debug/tracing/set_ftrace_filter

This will remove all functions that start with 'spin_'

Note:

  echo '!spin_*' > /debug/tracing/set_ftrace_filter

will simply clear out the list (notice the '>' instead of '>>')

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 12:57:09 +01:00
Steven Rostedt
e05a43b744 trace: better use of stack_trace_enabled for boot up code
Impact: clean up

Andrew Morton suggested to use the stack_tracer_enabled variable
to decide whether or not to start stack tracing on bootup.
This lets us remove the start_stack_trace variable.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 12:56:56 +01:00
Steven Rostedt
f38f1d2aa5 trace: add a way to enable or disable the stack tracer
Impact: enhancement to stack tracer

The stack tracer currently is either on when configured in or
off when it is not. It can not be disabled when it is configured on.
(besides disabling the function tracer that it uses)

This patch adds a way to enable or disable the stack tracer at
run time. It defaults off on bootup, but a kernel parameter 'stacktrace'
has been added to enable it on bootup.

A new sysctl has been added "kernel.stack_tracer_enabled" to let
the user enable or disable the stack tracer at run time.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 12:56:24 +01:00
Frederic Weisbecker
66896a85cf tracing/ftrace: add the printk-msg-only option
Impact: display ftrace_printk messages "as is"

By default, ftrace_printk() messages find their output with some other
informations like pid, caller, ...
Sometimes a developer just want to have the ftrace_printk left "as is", without
other information.

This is done by providing a default-off option called printk-msg-only.
To enable it, just do `echo printk-msg-only > /debugfs/tracing/trace_options`

Before the patch:

           <...>-2739  [000]   145.692153: __might_sleep: I'm an ftrace_printk msg in __might_sleep
           <...>-2739  [000]   145.692155: __might_sleep: I'm another ftrace_printk msg in __might_sleep

After the patch and the printk-msg-only option enabled:

I'm an ftrace_printk msg in __might_sleep
I'm another ftrace_printk msg in __might_sleep

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17 00:26:36 +01:00
Frederic Weisbecker
2c2d7329d8 tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
Impact: prevent a trace recursion

After some tests with function graph tracer under x86-32, I saw some recursions
caused by ring_buffer_time_stamp() that calls preempt_enable_no_notrace() which
calls preempt_schedule() which is traced itself.

This patch re-enables preemption without rescheduling.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17 00:26:35 +01:00
Ingo Molnar
9dfc3bc7d2 Merge branches 'tracing/fastboot', 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/hw-branch-tracing' into tracing/core 2008-12-16 12:03:38 +01:00
Rusty Russell
29c0177e6a cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.
Impact: change calling convention of existing cpumask APIs

Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.

These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com
2008-12-13 21:20:25 +10:30
Frederic Weisbecker
f8b755ac8e tracing/function-graph-tracer: Output arrows signal on hardirq call/return
Impact: make more obvious the hardirq calls in the output

When a hardirq is triggered inside the codeflow on output, we have
now two arrows that indicate the entry and return of the hardirq.

 0)               |          bit_waitqueue() {
 0)   0.880 us    |            __phys_addr();
 0)   2.699 us    |          }
 0)               |          __wake_up_bit() {
 0)   ==========> |          smp_apic_timer_interrupt() {
 0)   0.797 us    |            native_apic_mem_write();
 0)   0.715 us    |            exit_idle();
 0)               |            irq_enter() {
 0)   0.722 us    |              idle_cpu();
 0)   5.519 us    |            }
 0)               |            hrtimer_interrupt() {
 0)               |              ktime_get() {
 0)               |                ktime_get_ts() {
 0)   0.805 us    |                  getnstimeofday();

 [...]

 0) ! 108.528 us  |            }
 0)               |            irq_exit() {
 0)               |              do_softirq() {
 0)               |                __do_softirq() {
 0)   0.895 us    |                  __local_bh_disable();
 0)               |                  run_timer_softirq() {
 0)   0.827 us    |                    hrtimer_run_pending();
 0)   1.226 us    |                    _spin_lock_irq();
 0)               |                    _spin_unlock_irq() {
 0)   6.550 us    |                  }
 0)   0.924 us    |                  _local_bh_enable();
 0) + 12.129 us   |                }
 0) + 13.911 us   |              }
 0)   0.707 us    |              idle_cpu();
 0) + 17.009 us   |            }
 0) ! 137.419 us  |          }
 0)   <========== |
 0)   1.045 us    |          }
 0) ! 148.908 us  |        }
 0) ! 151.022 us  |      }
 0) ! 153.022 us  |    }
 0)   0.963 us    |    journal_mark_dirty();
 0)   0.925 us    |    __brelse();

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 11:14:09 +01:00
Markus Metzger
a93751cab7 x86, bts, ftrace: adapt the hw-branch-tracer to the ds.c interface
Impact: restructure code, cleanup

Remove BTS bits from the hw-branch-tracer (renamed from bts-tracer) and
use the ds interface.

Signed-off-by: Markus Metzger <markut.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 08:08:14 +01:00
Robert Richter
c4f50183f9 ring_buffer: adding EXPORT_SYMBOLs
I added EXPORT_SYMBOL_GPLs for all functions part of the API
(ring_buffer.h). This is required since oprofile is using the ring
buffer and the compilation as modules would fail otherwise.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 06:54:55 +01:00
Robert Richter
e2ac8ef576 ftrace: remove unused function arg in trace_iterator_increment()
This removes the unused cpu function parameter.

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-12-10 14:20:12 +01:00
Robert Richter
68814b58c5 ring_buffer: update description for ring_buffer_alloc()
Trivial patch.

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-12-10 14:20:11 +01:00
Ingo Molnar
e726f5f91e tracing/function-graph-tracer: fix 'flags' variable mismatch
this warning:

 kernel/trace/trace.c: In function ‘trace_vprintk’:
 kernel/trace/trace.c:3626: warning: ‘flags’ may be used uninitialized in this function

shows some confusion about irq_flags / flags use here. We already have
irq_flags so remove the extra flags variable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 16:55:53 +01:00
Frederic Weisbecker
380c4b1411 tracing/function-graph-tracer: append the tracing_graph_flag
Impact: Provide a way to pause the function graph tracer

As suggested by Steven Rostedt, the previous patch that prevented from
spinlock function tracing shouldn't use the raw_spinlock to fix it.
It's much better to follow lockdep with normal spinlock, so this patch
adds a new flag for each task to make the function graph tracer able
to be paused. We also can send an ftrace_printk whithout worrying of
the irrelevant traced spinlock during insertion.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 15:11:45 +01:00
Frederic Weisbecker
8e1b82e086 tracing/function-graph-tracer: turn tracing_selftest_running into an int
Impact: cleanup

Apply some suggestions of Steven Rostedt:

_turn tracing_selftest_running into a simple int (no need of an atomic_t)
_set it __read_mostly
_fix a comment style

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 15:11:44 +01:00
Frederic Weisbecker
decbec3838 tracing/function-graph-tracer: implement a print_headers function
Impact: provide trace headers to explain a bit the output

This patch implements the print_headers callback for the function graph
tracer. These headers are output according to the current trace options.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 13:24:52 +01:00
Steven Rostedt
21bbecdaae ftrace: use init_struct_pid as swapper pid
Impact: clean up

Using (struct pid *)-1 as the pointer for ftrace_swapper_pid is
a little confusing for others. This patch uses the address of the
actual init pid structure instead. This change is only for
clarity. It does not affect the code itself. Hopefully soon the
swapper tasks will all have their own pid structure and then
we can clean up the code a bit more.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-05 14:51:29 +01:00
Frederic Weisbecker
21a8c466f9 tracing/ftrace: provide the macro task_curr_ret_stack()
Impact: cleanup

As suggested by Steven Rostedt, this patch provide a new macro
task_curr_ret_stack() to move the cpp conditionnal CONFIG into
the linux/ftrace.h headers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-05 14:47:44 +01:00
Frederic Weisbecker
77d683f3e0 tracing/ftrace: fix the check of ftrace_trace_task
Impact: fix default empty traces on function-graph-tracer

The actual ftrace_trace_task() checks if ftrace_pid_trace is allocated
and return 1 if it is true.
If it is NULL, it will check the bit of pid tracing flag for the current
task (which are not set by default).
So by default, a task is not traced.
Actually all tasks should be traced by default and filter_by_pid when
ftrace_pid_trace is allocated.

The appropriate condition should be to return 1 if filter_by_pid is
set.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acke-dby: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-05 14:47:43 +01:00
Frederic Weisbecker
ff32504fdc tracing/ftrace: don't insert TRACE_PRINT during selftests
Impact: fix tracer selfstests false results

After setting a ftrace_printk somewhere in th kernel, I saw the
Function tracer selftest failing.

When a selftest occurs, the ring buffer is lurked to see if
some entries were inserted. But concurrent insertion such as
ftrace_printk could occured at the same time and could give
false positive or negative results.

This patch prevent prevent from TRACE_PRINT entries insertion
during selftests.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-05 14:47:43 +01:00
Ingo Molnar
970987beb9 Merge branches 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/urgent' into tracing/core 2008-12-05 14:45:22 +01:00
Frederic Weisbecker
1fd8f2a3f9 tracing/function-graph-tracer: handle ftrace_printk entries
Handle the TRACE_PRINT entries from the function grapg tracer
and output them as a C comment just below the function that called
it, as if it was a comment inside this function.

Example with an ftrace_printk inside might_sleep() function:

void __might_sleep(char *file, int line)
{
	static unsigned long prev_jiffy;	/* ratelimiting */

	ftrace_printk("Hi I'm a comment in might_sleep() :-)");

A chunk of a resulting trace:

 0)               |        _reiserfs_free_block() {
 0)               |          reiserfs_read_bitmap_block() {
 0)               |            __bread() {
 0)               |              __getblk() {
 0)               |                __find_get_block() {
 0)   0.698 us    |                  mark_page_accessed();
 0)   2.267 us    |                }
 0)               |                __might_sleep() {
 0)               |                  /* Hi I'm a comment in might_sleep() :-) */
 0)   1.321 us    |                }
 0)   5.872 us    |              }
 0)   7.313 us    |            }
 0)   8.718 us    |          }

And this patch brings two minor fixes:

- The newline after a switch-out task has disappeared
- The "|" sign just before the cpu number on task-switch has been deleted.

 0)   0.616 us    |                pick_next_task_rt();
 0)   1.457 us    |                _spin_trylock();
 0)   0.653 us    |                _spin_unlock();
 0)   0.728 us    |                _spin_trylock();
 0)   0.631 us    |                _spin_unlock();
 0)   0.729 us    |                native_load_sp0();
 0)   0.593 us    |                native_load_tls();
 ------------------------------------------
 0)    cat-2834    =>   migrati-3
 ------------------------------------------

 0)               |    finish_task_switch() {
 0)   0.841 us    |      _spin_unlock_irq();
 0)   0.616 us    |      post_schedule_rt();
 0)   3.882 us    |    }

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 10:18:39 +01:00
Liming Wang
faec2ec505 ftrace: avoid duplicated function when writing set_graph_function
Impact: fix a bug in function filter setting

when writing function to set_graph_function, we should check whether it
has existed in set_graph_function to avoid duplicating.

Signed-off-by: Liming Wang <liming.wang@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 09:42:35 +01:00
Ingo Molnar
6b2539302b tracing: fix typo and missing inline function
Impact: fix build bugs

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 09:33:01 +01:00
Steven Rostedt
e32d895691 ftrace: add ability to only trace swapper tasks
Impact: new feature

This patch lets the swapper tasks of all CPUS be filtered by the
set_ftrace_pid file.

If '0' is echoed into this file, then all the idle tasks (aka swapper)
is flagged to be traced.  This affects all CPU idle tasks.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 09:09:38 +01:00
Steven Rostedt
978f3a45d9 ftrace: use struct pid
Impact: clean up, extend PID filtering to PID namespaces

Eric Biederman suggested using the struct pid for filtering on
pids in the kernel. This patch is based off of a demonstration
of an implementation that Eric sent me in an email.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 09:09:37 +01:00
Steven Rostedt
804a685162 ftrace: trace single pid for function graph tracer
Impact: New feature

This patch makes the changes to set_ftrace_pid apply to the function
graph tracer.

  # echo $$ > /debugfs/tracing/set_ftrace_pid
  # echo function_graph > /debugfs/tracing/current_tracer

Will cause only the current task to be traced. Note, the trace flags are
also inherited by child processes, so the children of the shell
will also be traced.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 09:09:36 +01:00
Steven Rostedt
0ef8cde56a ftrace: use task struct trace flag to filter on pid
Impact: clean up

Use the new task struct trace flags to determine if a process should be
traced or not.

Note: this moves the searching of the pid to the slow path of setting
the pid field. This needs to be converted to the pid name space.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 09:09:35 +01:00
Steven Rostedt
ea4e2bc4d9 ftrace: graph of a single function
This patch adds the file:

   /debugfs/tracing/set_graph_function

which can be used along with the function graph tracer.

When this file is empty, the function graph tracer will act as
usual. When the file has a function in it, the function graph
tracer will only trace that function.

For example:

 # echo blk_unplug > /debugfs/tracing/set_graph_function
 # cat /debugfs/tracing/trace
 [...]
 ------------------------------------------
 | 2)  make-19003  =>  kjournald-2219
 ------------------------------------------

 2)               |  blk_unplug() {
 2)               |    dm_unplug_all() {
 2)               |      dm_get_table() {
 2)      1.381 us |        _read_lock();
 2)      0.911 us |        dm_table_get();
 2)      1. 76 us |        _read_unlock();
 2) +   12.912 us |      }
 2)               |      dm_table_unplug_all() {
 2)               |        blk_unplug() {
 2)      0.778 us |          generic_unplug_device();
 2)      2.409 us |        }
 2)      5.992 us |      }
 2)      0.813 us |      dm_table_put();
 2) +   29. 90 us |    }
 2) +   34.532 us |  }

You can add up to 32 functions into this file. Currently we limit it
to 32, but this may change with later improvements.

To add another function, use the append '>>':

  # echo sys_read >> /debugfs/tracing/set_graph_function
  # cat /debugfs/tracing/set_graph_function
  blk_unplug
  sys_read

Using the '>' will clear out the function and write anew:

  # echo sys_write > /debug/tracing/set_graph_function
  # cat /debug/tracing/set_graph_function
  sys_write

Note, if you have function graph running while doing this, the small
time between clearing it and updating it will cause the graph to
record all functions. This should not be an issue because after
it sets the filter, only those functions will be recorded from then on.
If you need to only record a particular function then set this
file first before starting the function graph tracer. In the future
this side effect may be corrected.

The set_graph_function file is similar to the set_ftrace_filter but
it does not take wild cards nor does it allow for more than one
function to be set with a single write. There is no technical reason why
this is the case, I just do not have the time yet to implement that.

Note, dynamic ftrace must be enabled for this to appear because it
uses the dynamic ftrace records to match the name to the mcount
call sites.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 09:09:34 +01:00
Ingo Molnar
b29144c317 Merge branches 'tracing/ftrace' and 'tracing/function-graph-tracer' into tracing/core 2008-12-04 09:07:44 +01:00
James Morris
ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
Steven Rostedt
0a37119d96 trace: fix output of stack trace
Impact: fix to output of stack trace

If a function is not found in the stack of the stack tracer, the
number printed is quite strange. This fixes the algorithm to handle
missing functions better.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 17:15:02 +01:00
Ingo Molnar
764f3b9513 tracing/function-graph-tracer: enabled by default
CONFIG_FUNCTION_GRAPH_TRACER depends on FUNCTION_TRACER already,
(turning it non-default) so it so making it default-n is pointless.

So enable it by default - it's a nice extension of the function tracer.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 10:33:58 +01:00
Frederic Weisbecker
166d3c7994 tracing/function-graph-tracer: improve duration output
Impact: better trace output of duration for long calls

The old duration output didn't exceeded 9999.999 us to fit the column
and the nanosecs were always 3 numbers. As Ingo suggested, it's better
to have the whole microseconds elapsed time and shift the nanosecs precision
if needed to fit the maximum 7 numbers. And usec need more number, the case
should be rare and important enough to break a bit the column alignment to
show it.

So, depending of the duration value, we now have these patterns:

    u.nnn us
   uu.nnn us
  uuu.nnn us
 uuuu.nnn us
 uuuuu.nn us
 uuuuuu.n us
 uuuuuuuu..... us

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 09:09:42 +01:00
Frederic Weisbecker
11e84acc40 tracing/function-graph-tracer: display unified style cmdline and pid
Impact: extend function-graph output: let one know which thread called a function

This patch implements a helper function to print the couple cmdline/pid.
Its output is provided during task switching and on each row if the new
"funcgraph-proc" defualt-off option is set through trace_options file.

The output is center aligned and never exceeds 14 characters. The cmdline
is truncated over 7 chars.
But note that if the pid exceeds 6 characters, the column will overflow (but
the situation is abnormal).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 09:09:41 +01:00
Steven Rostedt
e49dc19c6a ftrace: function graph return for function entry
Impact: feature, let entry function decide to trace or not

This patch lets the graph tracer entry function decide if the tracing
should be done at the end as well. This requires all function graph
entry functions return 1 if it should trace, or 0 if the return should
not be traced.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:26 +01:00
Steven Rostedt
044fa782eb ring-buffer: change "page" variable names to "bpage"
Impact: clean up

Andrew Morton pointed out that the kernel convention of a variable
named page should be of type page struct. The ring buffer uses
a variable named "page" for a pointer to something else.

This patch converts those to be called "bpage" (as in "buffer page").

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:24 +01:00
Steven Rostedt
14a866c567 ftrace: add ftrace_graph_stop()
Impact: new ftrace_graph_stop function

While developing more features of function graph, I hit a bug that
caused the WARN_ON to trigger in the prepare_ftrace_return function.
Well, it was hard for me to find out that was happening because the
bug would not print, it would just cause a hard lockup or reboot.
The reason is that it is not safe to call printk from this function.

Looking further, I also found that it calls unregister_ftrace_graph,
which grabs a mutex and calls kstop machine. This would definitely
lock the box up if it were to trigger.

This patch adds a fast and safe ftrace_graph_stop() which will
stop the function tracer. Then it is safe to call the WARN ON.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:23 +01:00
Steven Rostedt
8789a9e7df ring-buffer: read page interface
Impact: new API to ring buffer

This patch adds a new interface into the ring buffer that allows a
page to be read from the ring buffer on a given CPU. For every page
read, one must also be given to allow for a "swap" of the pages.

 rpage = ring_buffer_alloc_read_page(buffer);
 if (!rpage)
	goto err;
 ret = ring_buffer_read_page(buffer, &rpage, cpu, full);
 if (!ret)
	goto empty;
 process_page(rpage);
 ring_buffer_free_read_page(rpage);

The caller of these functions must handle any waits that are
needed to wait for new data. The ring_buffer_read_page will simply
return 0 if there is no data, or if "full" is set and the writer
is still on the current page.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:21 +01:00
Steven Rostedt
abc9b56d66 ring-buffer: move some metadata into buffer page
Impact: get ready for splice changes

This patch moves the commit and timestamp into the beginning of each
data page of the buffer. This change will allow the page to be moved
to another location (disk, network, etc) and still have information
in the page to be able to read it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:20 +01:00
Steven Rostedt
a5e25883a4 ftrace: replace raw_local_irq_save with local_irq_save
Impact: fix for lockdep and ftrace

The raw_local_irq_save/restore confuses lockdep. This patch
converts them to the local_irq_save/restore variants.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:19 +01:00
Ingo Molnar
f0461d0146 Merge branches 'tracing/ftrace' and 'tracing/function-graph-tracer' into tracing/core 2008-12-03 08:49:21 +01:00
Frederic Weisbecker
48d68b20d0 tracing/function-graph-tracer: support for x86-64
Impact: extend and enable the function graph tracer to 64-bit x86

This patch implements the support for function graph tracer under x86-64.
Both static and dynamic tracing are supported.

This causes some small CPP conditional asm on arch/x86/kernel/ftrace.c I
wanted to use probe_kernel_read/write to make the return address
saving/patching code more generic but it causes tracing recursion.

That would be perhaps useful to implement a notrace version of these
function for other archs ports.

Note that arch/x86/process_64.c is not traced, as in X86-32. I first
thought __switch_to() was responsible of crashes during tracing because I
believed current task were changed inside but that's actually not the
case (actually yes, but not the "current" pointer).

So I will have to investigate to find the functions that harm here, to
enable tracing of the other functions inside (but there is no issue at
this time, while process_64.c stays out of -pg flags).

A little possible race condition is fixed inside this patch too. When the
tracer allocate a return stack dynamically, the current depth is not
initialized before but after. An interrupt could occur at this time and,
after seeing that the return stack is allocated, the tracer could try to
trace it with a random uninitialized depth. It's a prevention, even if I
hadn't problems with it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-02 09:47:48 +01:00
Liming Wang
66eafebc10 function trace: fix a bug of single thread function trace
Impact: fix "no output from tracer" bug caused by ftrace_update_pid_func()

When disabling single thread function trace using
"echo -1 > set_ftrace_pid", the normal function trace
has to restore to original function, otherwise the normal
function trace will not work well.

Without this commit, something like below:

	$ ps |grep 850
	  850 root      2556 S    -/bin/sh
	$ echo 850 > /debug/tracing/set_ftrace_pid
	$ echo function > /debug/tracing/current_tracer
	$ echo 1 > /debug/tracing/tracing_enabled
	$ sleep 1
	$ echo 0 > /debug/tracing/tracing_enabled
	$ cat /debug/tracing/trace_pipe |wc -l
	59704
	$ echo -1 > /debug/tracing/set_ftrace_pid
	$ echo 1 > /debug/tracing/tracing_enabled
	$ sleep 1
	$ echo 0 > /debug/tracing/tracing_enabled
	$ more /debug/tracing/trace_pipe
		<====== nothing output now!
			it should output trace record.

Signed-off-by: Liming Wang <liming.wang@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-02 09:23:24 +01:00
Ingo Molnar
222658e08f Merge branches 'tracing/branch-tracer', 'tracing/ftrace', 'tracing/function-graph-tracer', 'tracing/markers', 'tracing/powerpc', 'tracing/stack-tracer' and 'tracing/tracepoints' into tracing/core 2008-12-02 09:20:44 +01:00
Frederic Weisbecker
65c6dc6adb tracing/branch-tracer: include missing irqflags.h
Impact: fix build error on branch tracer

This should fix a build error reported on alpha in linux-next:

 CC      kernel/trace/trace_branch.o
  kernel/trace/trace_branch.c: In function 'probe_likely_condition':
  kernel/trace/trace_branch.c:44: error: implicit declaration of function 'raw_local_irq_save'
  kernel/trace/trace_branch.c:76: error: implicit declaration of function 'raw_local_irq_restore'

Unfortunately, I can't test it since I don't have any Alpha build environment.

Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-29 10:11:54 +01:00
Liming Wang
50cdaf08a8 ftrace: improve seq_operation of ftrace
Impact: make ftrace position computing more sane

First remove useless ->pos field. Then we needn't check seq_printf
in .show like other place.

Signed-off-by: Liming Wang <liming.wang@windriver.com>
Reviewed-by: Bruce Ashfield <bruce.ashfield@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-28 12:30:40 +01:00
Török Edwin
c7425acb42 tracing, alpha: fix build: add missing #ifdef CONFIG_STACKTRACE
There are architectures that still have no stacktrace support.

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-28 11:33:00 +01:00
Ingo Molnar
d51090b346 tracing/function-graph-tracer: more output tweaks
Impact: prettify the output some more

Before:

0)           |     sys_read() {
0)      0.796 us |   fget_light();
0)           |       vfs_read() {
0)           |         rw_verify_area() {
0)           |           security_file_permission() {
------------8<---------- thread sshd-1755 ------------8<----------

After:

 0)               |  sys_read() {
 0)      0.796 us |    fget_light();
 0)               |    vfs_read() {
 0)               |      rw_verify_area() {
 0)               |        security_file_permission() {
 ------------------------------------------
 | 1)  migration/0--1  =>  sshd-1755
 ------------------------------------------

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-28 10:15:06 +01:00
Frederic Weisbecker
1a056155ed tracing/function-graph-tracer: adjustments of the trace informations
Impact: increase the visual qualities of the call-graph-tracer output

This patch applies various trace output formatting changes:

 - CPU is now a decimal number, followed by a parenthesis.

 - Overhead is now on the second column (gives a good visibility)

 - Cost is now on the third column, can't exceed 9999.99 us. It is
   followed by a virtual line based on a "|" character.

 - Functions calls are now the last column on the right. This way, we
   haven't dynamic column (which flow is harder to follow) on its right.

 - CPU and Overhead have their own option flag. They are default-on but you
   can disable them easily:

      echo nofuncgraph-cpu > trace_options
      echo nofuncgraph-overhead > trace_options

TODO:

_ Refactoring of the thread switch output.
_ Give a default-off option to output the thread and its pid on each row.
_ Provide headers
_ ....

Here is an example of the new trace style:

0)           |             mutex_unlock() {
0)      0.639 us |           __mutex_unlock_slowpath();
0)      1.607 us |         }
0)           |             remove_wait_queue() {
0)      0.616 us |           _spin_lock_irqsave();
0)      0.616 us |           _spin_unlock_irqrestore();
0)      2.779 us |         }
0)      0.495 us |         n_tty_set_room();
0) ! 9999.999 us |       }
0)           |           tty_ldisc_deref() {
0)      0.615 us |         _spin_lock_irqsave();
0)      0.616 us |         _spin_unlock_irqrestore();
0)      2.793 us |       }
0)           |           current_fs_time() {
0)      0.488 us |         current_kernel_time();
0)      0.495 us |         timespec_trunc();
0)      2.486 us |       }
0) ! 9999.999 us |     }
0) ! 9999.999 us |   }
0) ! 9999.999 us | }
0)           |     sys_read() {
0)      0.796 us |   fget_light();
0)           |       vfs_read() {
0)           |         rw_verify_area() {
0)           |           security_file_permission() {
0)      0.488 us |         cap_file_permission();
0)      1.720 us |       }
0)      3.  4 us |     }
0)           |         tty_read() {
0)      0.488 us |       tty_paranoia_check();
0)           |           tty_ldisc_ref_wait() {
0)           |             tty_ldisc_try() {
0)      0.615 us |           _spin_lock_irqsave();
0)      0.615 us |           _spin_unlock_irqrestore();
0)      5.436 us |         }
0)      6.427 us |       }

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-28 09:45:04 +01:00
Frederic Weisbecker
83a8df618e tracing/function-graph-tracer: enhancements for the trace output
Impact: enhance the output of the graph-tracer

This patch applies some ideas of Ingo Molnar and Steven Rostedt.

* Output leaf functions in one line with parenthesis, semicolon and duration
  output.

* Add a second column (after cpu) for an overhead sign.
  if duration > 100 us, "!"
  if duration > 10 us, "+"
  else " "

* Print output in us with remaining nanosec: u.n

* Print duration on the right end, following the indentation of the functions.
  Use also visual clues: "-" on entry call (no duration to output) and "+" on
  return (duration output).

The name of the tracer has been fixed as well: function-branch becomes
function_branch.

Here is an example of the new output:

CPU[000]           dequeue_entity() {                    -
CPU[000]             update_curr() {                    -
CPU[000]               update_min_vruntime();                    + 0.512 us
CPU[000]             }                                + 1.504 us
CPU[000]             clear_buddies();                    + 0.481 us
CPU[000]             update_min_vruntime();                    + 0.504 us
CPU[000]           }                                + 4.557 us
CPU[000]           hrtick_update() {                    -
CPU[000]             hrtick_start_fair();                    + 0.489 us
CPU[000]           }                                + 1.443 us
CPU[000] +       }                                + 14.655 us
CPU[000] +     }                                + 15.678 us
CPU[000] +   }                                + 16.686 us
CPU[000]     msecs_to_jiffies();                    + 0.481 us
CPU[000]     put_prev_task_fair();                    + 0.504 us
CPU[000]     pick_next_task_fair();                    + 0.482 us
CPU[000]     pick_next_task_rt();                    + 0.504 us
CPU[000]     pick_next_task_fair();                    + 0.481 us
CPU[000]     pick_next_task_idle();                    + 0.489 us
CPU[000]     _spin_trylock();                    + 0.655 us
CPU[000]     _spin_unlock();                    + 0.609 us

CPU[000]  ------------8<---------- thread bash-2794 ------------8<----------

CPU[000]               finish_task_switch() {                    -
CPU[000]                 _spin_unlock_irq();                    + 0.722 us
CPU[000]               }                                + 2.369 us
CPU[000] !           }                                + 501972.605 us
CPU[000] !         }                                + 501973.763 us
CPU[000]           copy_from_read_buf() {                    -
CPU[000]             _spin_lock_irqsave();                    + 0.670 us
CPU[000]             _spin_unlock_irqrestore();                    + 0.699 us
CPU[000]             copy_to_user() {                    -
CPU[000]               might_fault() {                    -
CPU[000]                 __might_sleep();                    + 0.503 us
CPU[000]               }                                + 1.632 us
CPU[000]               __copy_to_user_ll();                    + 0.542 us
CPU[000]             }                                + 3.858 us
CPU[000]             tty_audit_add_data() {                    -
CPU[000]               _spin_lock_irq();                    + 0.609 us
CPU[000]               _spin_unlock_irq();                    + 0.624 us
CPU[000]             }                                + 3.196 us
CPU[000]             _spin_lock_irqsave();                    + 0.624 us
CPU[000]             _spin_unlock_irqrestore();                    + 0.625 us
CPU[000] +         }                                + 13.611 us
CPU[000]           copy_from_read_buf() {                    -
CPU[000]             _spin_lock_irqsave();                    + 0.624 us
CPU[000]             _spin_unlock_irqrestore();                    + 0.616 us
CPU[000]           }                                + 2.820 us
CPU[000]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 10:59:14 +01:00
Ingo Molnar
c7cc773076 Merge branches 'tracing/blktrace', 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/power-tracer' into tracing/core 2008-11-27 10:56:13 +01:00
Lai Jiangshan
4f5a7f40dd ftrace: prevent recursion
Impact: prevent unnecessary stack recursion

if the resched flag was set before we entered, then don't reschedule.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 10:11:53 +01:00
Arjan van de Ven
f3f47a6768 tracing: add "power-tracer": C/P state tracer to help power optimization
Impact: new "power-tracer" ftrace plugin

This patch adds a C/P-state ftrace plugin that will generate
detailed statistics about the C/P-states that are being used,
so that we can look at detailed decisions that the C/P-state
code is making, rather than the too high level "average"
that we have today.

An example way of using this is:

 mount -t debugfs none /sys/kernel/debug
 echo cstate > /sys/kernel/debug/tracing/current_tracer
 echo 1 > /sys/kernel/debug/tracing/tracing_enabled
 sleep 1
 echo 0 > /sys/kernel/debug/tracing/tracing_enabled
 cat /sys/kernel/debug/tracing/trace | perl scripts/trace/cstate.pl > out.svg

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 08:29:32 +01:00
Steven Rostedt
437f24fb89 ftrace: add cpu annotation for function graph tracer
Impact: enhancement for function graph tracer

When run on a SMP box, the function graph tracer is confusing because
it shows the different CPUS as changes in the trace.

This patch adds the annotation of 'CPU[###]' where ### is a three digit
number. The output will look similar to this:

CPU[001]     dput() {
CPU[000] } 726
CPU[001]     } 487
CPU[000] do_softirq() {
CPU[001]   } 2221
CPU[000]   __do_softirq() {
CPU[000]     __local_bh_disable() {
CPU[001]   unroll_tree_refs() {
CPU[000]     } 569
CPU[001]   } 501
CPU[000]     rcu_process_callbacks() {
CPU[001]   kfree() {

What makes this nice is that now you can grep the file and produce
readable format for a particular CPU.

 # cat /debug/tracing/trace > /tmp/trace
 # grep '^CPU\[000\]' /tmp/trace > /tmp/trace0
 # grep '^CPU\[001\]' /tmp/trace > /tmp/trace1

Will give you:

 # head /tmp/trace0
CPU[000] ------------8<---------- thread sshd-3899 ------------8<----------
CPU[000]     inotify_dentry_parent_queue_event() {
CPU[000]     } 2531
CPU[000]     inotify_inode_queue_event() {
CPU[000]     } 505
CPU[000]   } 69626
CPU[000] } 73089
CPU[000] audit_syscall_exit() {
CPU[000]   path_put() {
CPU[000]     dput() {

 # head /tmp/trace1
CPU[001] ------------8<---------- thread pcscd-3446 ------------8<----------
CPU[001]               } 4186
CPU[001]               dput() {
CPU[001]               } 543
CPU[001]               vfs_permission() {
CPU[001]                 inode_permission() {
CPU[001]                   shmem_permission() {
CPU[001]                     generic_permission() {
CPU[001]                     } 501
CPU[001]                   } 2205

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 06:52:57 +01:00
Steven Rostedt
660c7f9be9 ftrace: add thread comm to function graph tracer
Impact: enhancement to function graph tracer

Export the trace_find_cmdline so the function graph tracer can
use it to print the comms of the threads.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 06:52:56 +01:00
Steven Rostedt
e53a6319cc ftrace: let function tracing and function return run together
Impact: feature

This patch enables function tracing and function return to run together.
I've tested this by enabling the stack tracer and return tracer, where
both the function entry and function return are used together with
dynamic ftrace.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 06:52:55 +01:00
Steven Rostedt
5a45cfe1c6 ftrace: use code patching for ftrace graph tracer
Impact: more efficient code for ftrace graph tracer

This patch uses the dynamic patching, when available, to patch
the function graph code into the kernel.

This patch will ease the way for letting both function tracing
and function graph tracing run together.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 06:52:54 +01:00
Steven Rostedt
df4fc31558 ftrace: add function tracing to single thread
Impact: feature to function trace a single thread

This patch adds the ability to function trace a single thread.
The file:

  /debugfs/tracing/set_ftrace_pid

contains the pid to trace. Valid pids are any positive integer.
Writing any negative number to this file will disable the pid
tracing and the function tracer will go back to tracing all of
threads.

This feature works with both static and dynamic function tracing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 06:52:52 +01:00
Frederic Weisbecker
287b6e68ca tracing/function-return-tracer: set a more human readable output
Impact: feature

This patch sets a C-like output for the function graph tracing.
For this aim, we now call two handler for each function: one on the entry
and one other on return. This way we can draw a well-ordered call stack.

The pid of the previous trace is loosely stored to be compared against
the one of the current trace to see if there were a context switch.

Without this little feature, the call tree would seem broken at
some locations.
We could use the sched_tracer to capture these sched_events but this
way of processing is much more simpler.

2 spaces have been chosen for indentation to fit the screen while deep
calls. The time of execution in nanosecs is printed just after closed
braces, it seems more easy this way to find the corresponding function.
If the time was printed as a first column, it would be not so easy to
find the corresponding function if it is called on a deep depth.

I plan to output the return value but on 32 bits CPU, the return value
can be 32 or 64, and its difficult to guess on which case we are.
I don't know what would be the better solution on X86-32: only print
eax (low-part) or even edx (high-part).

Actually it's thee same problem when a function return a 8 bits value, the
high part of eax could contain junk values...

Here is an example of trace:

sys_read() {
  fget_light() {
  } 526
  vfs_read() {
    rw_verify_area() {
      security_file_permission() {
        cap_file_permission() {
        } 519
      } 1564
    } 2640
    do_sync_read() {
      pipe_read() {
        __might_sleep() {
        } 511
        pipe_wait() {
          prepare_to_wait() {
          } 760
          deactivate_task() {
            dequeue_task() {
              dequeue_task_fair() {
                dequeue_entity() {
                  update_curr() {
                    update_min_vruntime() {
                    } 504
                  } 1587
                  clear_buddies() {
                  } 512
                  add_cfs_task_weight() {
                  } 519
                  update_min_vruntime() {
                  } 511
                } 5602
                dequeue_entity() {
                  update_curr() {
                    update_min_vruntime() {
                    } 496
                  } 1631
                  clear_buddies() {
                  } 496
                  update_min_vruntime() {
                  } 527
                } 4580
                hrtick_update() {
                  hrtick_start_fair() {
                  } 488
                } 1489
              } 13700
            } 14949
          } 16016
          msecs_to_jiffies() {
          } 496
          put_prev_task_fair() {
          } 504
          pick_next_task_fair() {
          } 489
          pick_next_task_rt() {
          } 496
          pick_next_task_fair() {
          } 489
          pick_next_task_idle() {
          } 489

------------8<---------- thread 4 ------------8<----------

finish_task_switch() {
} 1203
do_softirq() {
  __do_softirq() {
    __local_bh_disable() {
    } 669
    rcu_process_callbacks() {
      __rcu_process_callbacks() {
        cpu_quiet() {
          rcu_start_batch() {
          } 503
        } 1647
      } 3128
      __rcu_process_callbacks() {
      } 542
    } 5362
    _local_bh_enable() {
    } 587
  } 8880
} 9986
kthread_should_stop() {
} 669
deactivate_task() {
  dequeue_task() {
    dequeue_task_fair() {
      dequeue_entity() {
        update_curr() {
          calc_delta_mine() {
          } 511
          update_min_vruntime() {
          } 511
        } 2813

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 01:59:45 +01:00
Frederic Weisbecker
fb52607afc tracing/function-return-tracer: change the name into function-graph-tracer
Impact: cleanup

This patch changes the name of the "return function tracer" into
function-graph-tracer which is a more suitable name for a tracing
which makes one able to retrieve the ordered call stack during
the code flow.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 01:59:45 +01:00
Markus Metzger
1e9b51c283 x86, bts, ftrace: a BTS ftrace plug-in prototype
Impact: add new ftrace plugin

A prototype for a BTS ftrace plug-in.

The tracer collects branch trace in a cyclic buffer for each cpu.

The tracer is not configurable and the trace for each snapshot is
appended when doing cat /debug/tracing/trace.

This is a proof of concept that will be extended with future patches
to become a (hopefully) useful tool.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:31:13 +01:00
Markus Metzger
8bba1bf5e2 x86, ftrace: call trace->open() before stopping tracing; add trace->print_header()
Add a callback to allow an ftrace plug-in to write its own header.

Move the call to trace->open() up a few lines.

The changes are required by the BTS ftrace plug-in.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:31:13 +01:00
Peter Zijlstra
ca109491f6 hrtimer: removing all ur callback modes
Impact: cleanup, move all hrtimer processing into hardirq context

This is an attempt at removing some of the hrtimer complexity by
reducing the number of callback modes to 1.

This means that all hrtimer callback functions will be ran from HARD-irq
context.

I went through all the 30 odd hrtimer callback functions in the kernel
and saw only one that I'm not quite sure of, which is the one in
net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.

Furthermore, the hrtimer core now calls callbacks directly with IRQs
disabled in case you try to enqueue an expired timer. If this timer is a
periodic timer (which should use hrtimer_forward() to advance its time)
then it might be possible to end up in an inf. recursive loop due to the
fact that hrtimer_forward() doesn't round up to the next timer
granularity, and therefore keeps on calling the callback - obviously
this needs a fix.

Aside from that, this seems to compile and actually boot on my dual core
test box - although I'm sure there are some bugs in, me not hitting any
makes me certain :-)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 15:45:46 +01:00
Ingo Molnar
6f893fb2e8 Merge branches 'tracing/branch-tracer', 'tracing/fastboot', 'tracing/ftrace', 'tracing/function-return-tracer', 'tracing/power-tracer', 'tracing/powerpc', 'tracing/ring-buffer', 'tracing/stack-tracer' and 'tracing/urgent' into tracing/core 2008-11-24 17:46:24 +01:00
Pekka Paalanen
7ee1768ddb x86, mmiotrace: fix buffer overrun detection
Impact: fix mmiotrace overrun tracing

When ftrace framework moved to use the ring buffer facility, the buffer
overrun detection was broken after 2.6.27 by commit

| commit 3928a8a2d9
| Author: Steven Rostedt <rostedt@goodmis.org>
| Date:   Mon Sep 29 23:02:41 2008 -0400
|
|     ftrace: make work with new ring buffer
|
|     This patch ports ftrace over to the new ring buffer.

The detection is now fixed by using the ring buffer API.

When mmiotrace detects a buffer overrun, it will report the number of
lost events. People reading an mmiotrace log must know if something was
missed, otherwise the data may not make sense.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 20:33:23 +01:00
Frederic Weisbecker
eae849ca03 tracing/function-return-tracer: don't trace kfree while it frees the return stack
Impact: fix a crash

While I killed the cat process, I got sometimes the following (but rare)
crash:

[   65.689027] Pid: 2969, comm: cat Not tainted (2.6.28-rc6-tip #83) AMILO Li 2727
[   65.689027] EIP: 0060:[<00000000>] EFLAGS: 00010082 CPU: 1
[   65.689027] EIP is at 0x0
[   65.689027] EAX: 00000000 EBX: f66cd780 ECX: c019a64a EDX: f66cd780
[   65.689027] ESI: 00000286 EDI: f66cd780 EBP: f630be2c ESP: f630be24
[   65.689027]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   65.689027] Process cat (pid: 2969, ti=f630a000 task=f66cd780 task.ti=f630a000)
[   65.689027] Stack:
[   65.689027]  00000012 f630bd54 f630be7c c012c853 00000000 c0133cc9 f66cda54 f630be5c
[   65.689027]  f630be68 f66cda54 f66cd88c f66cd878 f7070000 00000001 f630be90 c0135dbc
[   65.689027]  f614a614 f630be68 f630be68 f65ba200 00000002 f630bf10 f630be90 c012cad6
[   65.689027] Call Trace:
[   65.689027]  [<c012c853>] ? do_exit+0x603/0x850
[   65.689027]  [<c0133cc9>] ? next_signal+0x9/0x40
[   65.689027]  [<c0135dbc>] ? dequeue_signal+0x8c/0x180
[   65.689027]  [<c012cad6>] ? do_group_exit+0x36/0x90
[   65.689027]  [<c013709c>] ? get_signal_to_deliver+0x20c/0x390
[   65.689027]  [<c0102b69>] ? do_notify_resume+0x99/0x8b0
[   65.689027]  [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
[   65.689027]  [<c014db9b>] ? trace_hardirqs_on+0xb/0x10
[   65.689027]  [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
[   65.689027]  [<c02e39b0>] ? n_tty_write+0x0/0x340
[   65.689027]  [<c02e1812>] ? redirected_tty_write+0x82/0x90
[   65.689027]  [<c019ee99>] ? vfs_write+0x99/0xd0
[   65.689027]  [<c02e1790>] ? redirected_tty_write+0x0/0x90
[   65.689027]  [<c019f342>] ? sys_write+0x42/0x70
[   65.689027]  [<c01035ca>] ? work_notifysig+0x13/0x19
[   65.689027] Code:  Bad EIP value.
[   65.689027] EIP: [<00000000>] 0x0 SS:ESP 0068:f630be24

This is because on do_exit(), kfree is called to free the return addresses stack
but kfree is traced and stored its return address in this stack.
This patch fixes it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 17:39:03 +01:00
Török Edwin
e38da59269 tracing/stack-tracer: avoid races accessing file
Impact: fix race

vma->vm_file reference is only stable while holding the mmap_sem,
so move usage of it to within the critical section.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 12:20:26 +01:00
Török Edwin
8d26487fd4 tracing/stack-tracer: introduce CONFIG_USER_STACKTRACE_SUPPORT
Impact: cleanup

User stack tracing is just implemented for x86, but it is not x86 specific.

Introduce a generic config flag, that is currently enabled only for x86.
When other arches implement it, they will have to
SELECT USER_STACKTRACE_SUPPORT.

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:53:50 +01:00
Török Edwin
cffa10aecb tracing/stack-tracer: fix locking and refcounts
Impact: fix refcounting/object-access bug

Hold mmap_sem while looking up/accessing vma.
Hold the RCU lock while using the task we looked up.

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:53:49 +01:00
Török Edwin
8d7c6a9616 tracing/stack-tracer: fix style issues
Impact: cleanup

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:53:48 +01:00
Steven Rostedt
0429149fb5 trace: fix compiler warning in branch profiler
Impact: fix compiler warning

The ftrace_pointers used in the branch profiler are constant values.
They should never change. But the compiler complains when they are
passed into the debugfs_create_file as a data pointer, because the
function discards the qualifier.

This patch typecasts the parameter to debugfs_create_file back to
a void pointer. To remind the callbacks that they are pointing to
a constant value, I also modified the callback local pointers to
be const struct ftrace_pointer * as well.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:46:49 +01:00
Steven Rostedt
69bb54ec05 ftrace: add ftrace_off_permanent
Impact: add new API to disable all of ftrace on anomalies

It case of a serious anomaly being detected (like something caught by
lockdep) it is a good idea to disable all tracing immediately, without
grabing any locks.

This patch adds ftrace_off_permanent that disables the tracers, function
tracing and ring buffers without a way to enable them again. This should
only be used when something serious has been detected.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:45:34 +01:00
Steven Rostedt
033601a32b ring-buffer: add tracing_off_permanent
Impact: feature to permanently disable ring buffer

This patch adds a API to the ring buffer code that will permanently
disable the ring buffer from ever recording. This should only be
called when some serious anomaly is detected, and the system
may be in an unstable state. When that happens, shutting down the
recording to the ring buffers may be appropriate.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:44:37 +01:00
Steven Rostedt
2bcd521a68 trace: profile all if conditionals
Impact: feature to profile if statements

This patch adds a branch profiler for all if () statements.
The results will be found in:

  /debugfs/tracing/profile_branch

For example:

   miss      hit    %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
       0        1 100 x86_64_start_reservations      head64.c             127
       0        1 100 copy_bootdata                  head64.c             69
       1        0   0 x86_64_start_kernel            head64.c             111
      32        0   0 set_intr_gate                  desc.h               319
       1        0   0 reserve_ebda_region            head.c               51
       1        0   0 reserve_ebda_region            head.c               47
       0        1 100 reserve_ebda_region            head.c               42
       0        0   X maxcpus                        main.c               165

Miss means the branch was not taken. Hit means the branch was taken.
The percent is the percentage the branch was taken.

This adds a significant amount of overhead and should only be used
by those analyzing their system.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:41:01 +01:00
Steven Rostedt
bac28bfe42 trace: branch profiling should not print percent without data
Impact: cleanup on output of branch profiler

When a branch has not been taken, it does not make sense to show
a percentage incorrect or hit. This patch changes the behaviour
to print out a 'X' when the branch has not been executed yet.

For example:

 correct incorrect  %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
    2096        0   0 do_arch_prctl                  process_64.c         832
       0        0   X do_arch_prctl                  process_64.c         804
    2604        0   0 IS_ERR                         err.h                34
  130228     5765   4 __switch_to                    process_64.c         673
       0        0   X enable_TSC                     process_64.c         448
       0        0   X disable_TSC                    process_64.c         431

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:40:21 +01:00
Steven Rostedt
45b797492a trace: consolidate unlikely and likely profiler
Impact: clean up to make one profiler of like and unlikely tracer

The likely and unlikely profiler prints out the file and line numbers
of the annotated branches that it is profiling. It shows the number
of times it was correct or incorrect in its guess. Having two
different files or sections for that matter to tell us if it was a
likely or unlikely is pretty pointless. We really only care if
it was correct or not.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:39:56 +01:00
Török Edwin
b54d3de9f3 tracing: identify which executable object the userspace address belongs to
Impact: modify+improve the userstacktrace tracing visualization feature

Store thread group leader id, and use it to lookup the address in the
process's map. We could have looked up the address on thread's map,
but the thread might not exist by the time we are called. The process
might not exist either, but if you are reading trace_pipe, that is
unlikely.

Example usage:

 mount -t debugfs nodev /sys/kernel/debug
 cd /sys/kernel/debug/tracing
 echo userstacktrace >iter_ctrl
 echo sym-userobj >iter_ctrl
 echo sched_switch >current_tracer
 echo 1 >tracing_enabled
 cat trace_pipe >/tmp/trace&
 .... run application ...
 echo 0 >tracing_enabled
 cat /tmp/trace

You'll see stack entries like:

   /lib/libpthread-2.7.so[+0xd370]

You can convert them to function/line using:

   addr2line -fie /lib/libpthread-2.7.so 0xd370

Or:

   addr2line -fie /usr/lib/debug/libpthread-2.7.so 0xd370

For non-PIC/PIE executables this won't work:

   a.out[+0x73b]

You need to run the following: addr2line -fie a.out 0x40073b
(where 0x400000 is the default load address of a.out)

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 09:45:42 +01:00
Török Edwin
02b67518e2 tracing: add support for userspace stacktraces in tracing/iter_ctrl
Impact: add new (default-off) tracing visualization feature

Usage example:

 mount -t debugfs nodev /sys/kernel/debug
 cd /sys/kernel/debug/tracing
 echo userstacktrace >iter_ctrl
 echo sched_switch >current_tracer
 echo 1 >tracing_enabled
 .... run application ...
 echo 0 >tracing_enabled

Then read one of 'trace','latency_trace','trace_pipe'.

To get the best output you can compile your userspace programs with
frame pointers (at least glibc + the app you are tracing).

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 09:25:15 +01:00
Frederic Weisbecker
f201ae2356 tracing/function-return-tracer: store return stack into task_struct and allocate it dynamically
Impact: use deeper function tracing depth safely

Some tests showed that function return tracing needed a more deeper depth
of function calls. But it could be unsafe to store these return addresses
to the stack.

So these arrays will now be allocated dynamically into task_struct of current
only when the tracer is activated.

Typical scheme when tracer is activated:
- allocate a return stack for each task in global list.
- fork: allocate the return stack for the newly created task
- exit: free return stack of current
- idle init: same as fork

I chose a default depth of 50. I don't have overruns anymore.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 09:17:26 +01:00
Ingo Molnar
a0a70c735e Merge branches 'tracing/profiling', 'tracing/options' and 'tracing/urgent' into tracing/core 2008-11-23 09:10:32 +01:00
Liming Wang
522a110b42 function tracing: fix wrong position computing of stack_trace
Impact: make output of stack_trace complete if buffer overruns

When read buffer overruns, the output of stack_trace isn't complete.

When printing records with seq_printf in t_show, if the read buffer
has overruned by the current record, then this record won't be
printed to user space through read buffer, it will just be dropped in
this printing.

When next printing, t_start should return the "*pos"th record, which
is the one dropped by previous printing, but it just returns
(m->private + *pos)th record.

Here we use a more sane method to implement seq_operations which can
be found in kernel code. Thus we needn't initialize m->private.

About testing, it's not easy to overrun read buffer, but we can use
seq_printf to print more padding bytes in t_show, then it's easy to
check whether or not records are lost.

This commit has been tested on both condition of overrun and non
overrun.

Signed-off-by: Liming Wang <liming.wang@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-21 08:49:52 +01:00
Ingo Molnar
9676e73a9e Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core
Conflicts:
	kernel/trace/ftrace.c

[ We conflicted here because we backported a few fixes to
  tracing/urgent - which has different internal APIs. ]
2008-11-19 10:04:25 +01:00
Ingo Molnar
86fa2f6067 ftrace: fix selftest locking
Impact: fix self-test boot crash

Self-test failure forgot to re-lock the BKL - crashing the next
initcall:

Testing tracer irqsoff: .. no entries found ..FAILED!
initcall init_irqsoff_tracer+0x0/0x11 returned 0 after 3906 usecs
calling  init_mmio_trace+0x0/0xf @ 1
------------[ cut here ]------------
Kernel BUG at c0c0a915 [verbose debug info unavailable]
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file:

Pid: 1, comm: swapper Not tainted (2.6.28-rc5-tip #53704)
EIP: 0060:[<c0c0a915>] EFLAGS: 00010286 CPU: 1
EIP is at unlock_kernel+0x10/0x2b
EAX: ffffffff EBX: 00000000 ECX: 00000000 EDX: f7030000
ESI: c12da19c EDI: 00000000 EBP: f7039f54 ESP: f7039f54
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 1, ti=f7038000 task=f7030000 task.ti=f7038000)
Stack:
 f7039f6c c0164d30 c013fed8 a7d8d7b4 00000000 00000000 f7039f74 c12fb78a
 f7039fd0 c0101132 c12fb77d 00000000 6f727200 6f632072 2d206564 c1002031
 0000000f f7039fa2 f7039fb0 3531b171 00000000 00000000 0000002f c12ca480
Call Trace:
 [<c0164d30>] ? register_tracer+0x66/0x13f
 [<c013fed8>] ? ktime_get+0x19/0x1b
 [<c12fb78a>] ? init_mmio_trace+0xd/0xf
 [<c0101132>] ? do_one_initcall+0x4a/0x111
 [<c12fb77d>] ? init_mmio_trace+0x0/0xf
 [<c015c7e6>] ? init_irq_proc+0x46/0x59
 [<c12e851d>] ? kernel_init+0x104/0x152
 [<c12e8419>] ? kernel_init+0x0/0x152
 [<c01038b7>] ? kernel_thread_helper+0x7/0x10
Code: 58 14 43 75 0a b8 00 9b 2d c1 e8 51 43 7a ff 64 a1 00 a0 37 c1 89 58 14 5b 5d c3 55 64 8b 15 00 a0 37 c1 83 7a 14 00 89 e5 79 04 <0f> 0b eb fe 8b 42 14 48 85 c0 89 42 14 79 0a b8 00 9b 2d c1 e8
EIP: [<c0c0a915>] unlock_kernel+0x10/0x2b SS:ESP 0068:f7039f54
---[ end trace a7919e7f17c0a725 ]---
Kernel panic - not syncing: Attempted to kill init!

So clean up the flow a bit.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-19 10:00:15 +01:00
Ingo Molnar
6d5b43a67a Merge branch 'tip/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent 2008-11-19 09:00:50 +01:00
Steven Rostedt
32464779a1 ftrace: fix dyn ftrace filter selection
Impact: clean up and fix for dyn ftrace filter selection

The previous logic of the dynamic ftrace selection of enabling
or disabling functions was complex and incorrect. This patch simplifies
the code and corrects the usage. This simplification also makes the
code more robust.

Here is the correct logic:

  Given a function that can be traced by dynamic ftrace:

  If the function is not to be traced, disable it if it was enabled.
  (this is if the function is in the set_ftrace_notrace file)

  (filter is on if there exists any functions in set_ftrace_filter file)

  If the filter is on, and we are enabling functions:
    If the function is in set_ftrace_filter, enable it if it is not
      already enabled.
    If the function is not in set_ftrace_filter, disable it if it is not
      already disabled.

  Otherwise, if the filter is off and we are enabling function tracing:
    Enable the function if it is not already enabled.

  Otherwise, if we are disabling function tracing:
    Disable the function if it is not already disabled.

This code now sets or clears the ENABLED flag in the record, and at the
end it will enable the function if the flag is set, or disable the function
if the flag is cleared.

The parameters for the function that does the above logic is also
simplified. Instead of passing in confusing "new" and "old" where
they might be swapped if the "enabled" flag is not set. The old logic
even had one of the above always NULL and had to be filled in. The new
logic simply passes in one parameter called "nop". A "call" is calculated
in the code, and at the end of the logic, when we know we need to either
disable or enable the function, we can then use the "nop" and "call"
properly.

This code is more robust than the previous version.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-19 00:19:47 -05:00
Steven Rostedt
8204327831 ftrace: make filtered functions effective on setting
Impact: fix filter selection to apply when set

It can be confusing when the set_filter_functions is set (or cleared)
and the functions being recorded by the dynamic tracer does not
match.

This patch causes the code to be updated if the function tracer is
enabled and the filter is changed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-19 00:18:54 -05:00
Steven Rostedt
f10ed36ec1 ftrace: fix set_ftrace_filter
Impact: fix of output of set_ftrace_filter

The commit "ftrace: do not show freed records in
             available_filter_functions"

Removed a bit too much from the set_ftrace_filter code, where we now see
all functions in the set_ftrace_filter file even when we set a filter.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-19 00:17:45 -05:00
Heiko Carstens
a22506347d ftrace: preemptoff selftest not working
Impact: fix preemptoff and preemptirqsoff tracer self-tests

I was wondering why the preemptoff and preemptirqsoff tracer selftests
don't work on s390. After all its just that they get called from
non-preemptible context:

kernel_init() will execute all initcalls, however the first line in
kernel_init() is lock_kernel(), which causes the preempt_count to be
increased. Any later calls to add_preempt_count() (especially those
from the selftests) will therefore not result in a call to
trace_preempt_off() since the check below in add_preempt_count()
will be false:

        if (preempt_count() == val)
                trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));

Hence the trace buffer will be empty.

Fix this by releasing the BKL during the self-tests.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 21:54:50 +01:00
Vegard Nossum
641d2f63cf trace: introduce missing mutex_unlock()
Impact: fix tracing buffer mutex leak in case of allocation failure

This error was spotted by this semantic patch:

  http://www.emn.fr/x-info/coccinelle/mut.html

It looks correct as far as I can tell. Please review.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 21:37:15 +01:00
Julia Lawall
0bb943c7a2 tracing: kernel/trace/trace.c: introduce missing kfree()
Impact: fix memory leak

Error handling code following a kzalloc should free the allocated data.

The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,l;
position p1,p2;
expression *ptr != NULL;
@@

(
if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
|
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
)
<... when != x
     when != if (...) { <+...x...+> }
x->f = E
...>
(
 return \(0\|<+...x...+>\|ptr\);
|
 return@p2 ...;
)

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 16:59:58 +01:00
Frederic Weisbecker
0231022cc3 tracing/function-return-tracer: add the overrun field
Impact: help to find the better depth of trace

We decided to arbitrary define the depth of function return trace as
"20". Perhaps this is not enough. To help finding an optimal depth, we
measure now the overrun: the number of functions that have been missed
for the current thread. By default this is not displayed, we have to
do set a particular flag on the return tracer: echo overrun >
/debug/tracing/trace_options And the overrun will be printed on the
right.

As the trace shows below, the current 20 depth is not enough.

update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 11:11:00 +01:00
Frederic Weisbecker
0619faf657 tracing/ftrace: make nop tracer using tracer flags
Impact: give an example on how to use specific tracer flags

This patch propose to use the nop tracer to provide an
example for using the tracer's custom flags implementation.

V2: replace structures and defines just after the headers includes for
    cleanliness.
V3: replace defines by enum values.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Noonan <steven@uplinklabs.net>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 11:10:59 +01:00
Frederic Weisbecker
adf9f19574 tracing/ftrace: implement a set_flag callback for tracers
Impact: give a way to send specific messages to tracers

The current implementation of tracing uses some flags to control the
output of general tracers. But we have no way to implement custom
flags handling for a specific tracer. This patch proposes a new
callback for the struct tracer which called set_flag and a structure
that represents a 32 bits variable flag.

A tracer can implement a struct tracer_flags on which it puts the
initial value of the flag integer. Than it can place a range of flags
with their name and their flag mask on the flag integer. The structure
that implement a single flag is called struct tracer_opt.

These custom flags will be available through the trace_options file
like the general tracing flags. Changing their value is done like the
other general flags. For example if you have a flag that calls "foo",
you can activate it by writing "foo" or "nofoo" on trace_options.

Note that the set_flag callback is optional and is only needed if you
want the flags changing to be signaled to your tracer and let it to
accept or refuse their assignment.

V2: Some arrangements in coding style....

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 11:10:58 +01:00
Ingo Molnar
5a209c2d58 Merge branches 'tracing/branch-tracer' and 'tracing/urgent' into tracing/core 2008-11-18 08:52:13 +01:00
Aneesh Kumar K.V
0c726da983 tracing: branch tracer, fix writing to trace/trace_options
Impact: fix trace_options behavior

writing to trace/trace_options use the index of the array
to find the value of the flag. With branch tracer flag
defined conditionally, this breaks writing to trace_options
with branch tracer disabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-17 12:07:27 +01:00
Ingo Molnar
3f8e402f34 Merge branches 'tracing/branch-tracer', 'tracing/ftrace', 'tracing/function-return-tracer', 'tracing/tracepoints' and 'tracing/urgent' into tracing/core 2008-11-17 09:36:22 +01:00
walimis
5821e1b74f function tracing: fix wrong pos computing when read buffer has been fulfilled
Impact: make output of available_filter_functions complete

phenomenon:

The first value of dyn_ftrace_total_info is not equal with
`cat available_filter_functions | wc -l`, but they should be equal.

root cause:

When printing functions with seq_printf in t_show, if the read buffer
is just overflowed by current function record, then this function
won't be printed to user space through read buffer, it will
just be dropped. So we can't see this function printing.

So, every time the last function to fill the read buffer, if overflowed,
will be dropped.

This also applies to set_ftrace_filter if set_ftrace_filter has
more bytes than read buffer.

fix:

Through checking return value of seq_printf, if less than 0, we know
this function doesn't be printed. Then we decrease position to force
this function to be printed next time, in next read buffer.

Another little fix is to show correct allocating pages count.

Signed-off-by: walimis <walimisdev@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 08:32:05 +01:00
Frederic Weisbecker
e7d3737ea1 tracing/function-return-tracer: support for dynamic ftrace on function return tracer
This patch adds the support for dynamic tracing on the function return tracer.
The whole difference with normal dynamic function tracing is that we don't need
to hook on a particular callback. The only pro that we want is to nop or set
dynamically the calls to ftrace_caller (which is ftrace_return_caller here).

Some security checks ensure that we are not trying to launch dynamic tracing for
return tracing while normal function tracing is already running.

An example of trace with getnstimeofday set as a filter:

ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:57:38 +01:00
Frederic Weisbecker
072b40a156 tracing/branch-tracer: fix a trace recursion on branch tracer
Impact: fix crash when enabling the branch-tracer

When the branch tracer inserts an event through
probe_likely_condition(), it calls local_irq_save() and then results
in a trace recursion.

local_irq_save() -> trace_hardirqs_off() -> trace_hardirqs_off_caller()
	-> unlikely()

The trace_branch.c file is protected by DISABLE_BRANCH_PROFILING but
that doesn't prevent from external call to functions that use
unlikely().

My box crashed each time I tried to set this tracer (sudden and hard
reboot).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:55:59 +01:00
Frederic Weisbecker
1c80025a49 tracing/ftrace: change the type of the init() callback
Impact: extend the ->init() method with the ability to fail

This bring a way to know if the initialization of a tracer successed.
A tracer must return 0 on success and a traditional error (ie:
-ENOMEM) if it fails.

If a tracer fails to init, it is free to print a detailed warn. The
tracing api will not and switch to a new tracer will just return the
error from the init callback.

Note: this will be used for the return tracer.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:55:23 +01:00
Frederic Weisbecker
e6e7a65aab tracing/ftrace: fix unexpected -EINVAL when longest tracer name is set
Impact: fix confusing write() -EINVAL when changing the tracer

The following commit d9e540762f remade
alive the bug which made the set of a new tracer returning -EINVAL if
this is the longest name of tracer. This patch corrects it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:53:39 +01:00
Steven Rostedt
ee02a2e5c8 ftrace: make filtered functions effective on setting
Impact: set filtered functions at time the filter is set

It can be confusing when the set_filter_functions is set (or cleared)
and the functions being recorded by the dynamic tracer does not
match.

This patch causes the code to be updated if the function tracer is
enabled and the filter is changed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:37:46 +01:00
Steven Rostedt
982c350b9e ftrace: fix dyn ftrace filter
Impact: correct implementation of dyn ftrace filter

The old decisions made by the filter algorithm was complex and incorrect.
This lead to inconsistent enabling or disabling of functions when
the filter was used.

This patch simplifies that code and in doing so, corrects the usage
of the filters.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:37:17 +01:00
Steven Rostedt
20e5227e9f ftrace: allow NULL pointers in mcount_loc
Impact: make ftrace_convert_nops() more permissive

Due to the way different architecture linkers combine the data sections
of the mcount_loc (the section that lists all the locations that
call mcount), there may be zeros added in that section. This is usually
due to strange alignments that the linker performs, that pads in zeros.

This patch makes the conversion code to nops skip any pointer in
the mcount_loc section that is NULL.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:36:42 +01:00
Steven Rostedt
31e889098a ftrace: pass module struct to arch dynamic ftrace functions
Impact: allow archs more flexibility on dynamic ftrace implementations

Dynamic ftrace has largly been developed on x86. Since x86 does not
have the same limitations as other architectures, the ftrace interaction
between the generic code and the architecture specific code was not
flexible enough to handle some of the issues that other architectures
have.

Most notably, module trampolines. Due to the limited branch distance
that archs make in calling kernel core code from modules, the module
load code must create a trampoline to jump to what will make the
larger jump into core kernel code.

The problem arises when this happens to a call to mcount. Ftrace checks
all code before modifying it and makes sure the current code is what
it expects. Right now, there is not enough information to handle modifying
module trampolines.

This patch changes the API between generic dynamic ftrace code and
the arch dependent code. There is now two functions for modifying code:

  ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into
       a nop, where the original text is calling addr. (mod is the
       module struct if called by module init)

  ftrace_make_caller(rec, addr) - convert the code rec->ip that should
       be a nop into a caller to addr.

The record "rec" now has a new field called "arch" where the architecture
can add any special attributes to each call site record.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:36:02 +01:00
Steven Rostedt
d51ad7ac48 ftrace: replace raw_local_irq_save with local_irq_save
Impact: fix lockdep disabling itself when function tracing is enabled

The raw_local_irq_saves used in ftrace is causing problems with
lockdep. (it thinks the irq flags are out of sync and disables
itself with a warning)

The raw ops here are not needed, and the normal local_irq_save is fine.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:35:37 +01:00
Steven Rostedt
918c115410 ftrace: do not process freed records
Impact: keep from converting freed records

When the tracer is started or stopped, it converts all code pointed
to by the saved records into callers to ftrace or nops. When modules
are unloaded, their records are freed, but they still exist within
the record pages.

This patch changes the code to skip over freed records.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:30:17 +01:00
Steven Rostedt
b17e8a37a1 ftrace: disable ftrace on anomalies in trace start and stop
Impact: robust feature to disable ftrace on start or stop tracing on error

Currently only the initial conversion to nops will disable ftrace
on an anomaly. But if an anomaly happens on start or stopping of the
tracer, it will silently fail.

This patch adds a check there too, to disable ftrace and warn if the
conversion fails.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:30:07 +01:00
Steven Rostedt
f3c7ac40a9 ftrace: remove condition from ftrace_record_ip
Impact: let module functions be recorded when dyn ftrace not enabled

When dynamic ftrace had a daemon and a hash to record the locations
of mcount callers at run time, the recording needed to stop when
ftrace was disabled. But now that the recording is done at compile time
and the ftrace_record_ip is only called at boot up and when a module
is loaded, we no longer need to check if ftrace_enabled is set.
In fact, this breaks module load if it is not set because we skip
over module functions.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:29:53 +01:00
Ingo Molnar
c91add5fa6 Merge branches 'tracing/fastboot', 'tracing/ftrace' and 'tracing/urgent' into tracing/core 2008-11-16 07:28:46 +01:00
James Morris
2b82892565 Merge branch 'master' into next
Conflicts:
	security/keys/internal.h
	security/keys/process_keys.c
	security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 11:29:12 +11:00
David Howells
b6dff3ec5e CRED: Separate task security context from task_struct
Separate the task security context from task_struct.  At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:16 +11:00
walimis
b3535c6390 ftrace: remove unnecessary if condition of __unregister_ftrace_function
Because it has goto out before ftrace_list == &ftrace_list_end,
that's to say, we never meet this condition.

Signed-off-by: walimis <walimisdev@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-13 19:36:30 +01:00
Ingo Molnar
ee51a1de7e tracing: fix mmiotrace resizing crash
Pekka reported a crash when resizing the mmiotrace tracer (if only
mmiotrace is enabled).

This happens because in that case we do not allocate the max buffer,
but we try to use it.

Make ring_buffer_resize() idempotent against NULL buffers.

Reported-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-13 14:58:31 +01:00
Steven Rostedt
12ef7d4486 ftrace: CPU buffer start annotation clean ups
Impact: better handling of CPU buffer start annotation

Because of the confusion with the per CPU buffers wrapping where
one CPU might be more active at the end of the trace than the other
CPUs causing that one CPU to have a shorter history. Kernel
developers were confused by the "missing" data of that one CPU
at the beginning of the trace output. An annotation was added to
the trace output to show that the buffer had started:

 # tracer: function
 #
 #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
 #              | |       |          |         |
 ##### CPU 3 buffer started ####
          <idle>-0     [003]   158.192959: smp_apic_timer_interrupt
 [...]
           <idle>-0     [003]   161.556520: default_idle
 ##### CPU 1 buffer started ####
           <idle>-0     [001]   161.592494: hrtimer_force_reprogram
 [etc]

But this annotation gets a bit messy when tracers do not fill the
buffers. This patch does a couple of things:

 One) it adds a flag to trace_options to disable these annotations

 Two) it does not annotate if the tracer did not overflow its buffer.

This makes the output much cleaner.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-13 09:49:24 +01:00
Steven Rostedt
ee6bce5227 ftrace: rename iter_ctrl to trace_options
Impact: rename file /debug/tracing/iter_ctrl to /debug/tracing/trace_options

The original ftrace had a file called "iter_ctrl" that would control
the way the output was iterated. But this file grew into a catch all
for different trace options. This patch renames the file from iter_ctrl
to trace_options to reflect this change.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-13 09:49:23 +01:00
Steven Rostedt
1696b2b0f4 ftrace: show buffer size in kilobytes
Impact: change the units of buffer_size_kb to kilobytes

This patch changes the units of the buffer_size_kb file to kilobytes.
Reading and writing to the file uses kilobytes as units. To help
users to know what units are used, the output of the file now
looks like:

  # cat /debug/tracing/buffer_size_kb
  1408

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-13 09:49:22 +01:00
Steven Rostedt
a94c80e78b ftrace: rename trace_entries to buffer_size_kb
Impact: rename of debugfs file trace_entries to buffer_size_kb

The original ftrace had fixed size entries, and the number of entries
was shown and modified via the file called trace_entries. By converting
to the unified trace buffer, we now allow for variable size entries
which makes the meaning of trace_entries pointless.

Since trace_size might be confused to the size of the trace, this patch
names it "buffer_size_kb" (thanks to Arjan van de Ven for this idea).

[ mingo@elte.hu: changed from buffer_size to buffer_size_kb ]

( Note, the units are still bytes - the next patch changes that,
  to keep the wide rename patch separate from the unit-change patch. )

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-13 09:49:21 +01:00
Ingo Molnar
24de38620d Merge branches 'tracing/branch-tracer', 'tracing/fastboot', 'tracing/function-return-tracer' and 'tracing/urgent' into tracing/core 2008-11-13 09:48:03 +01:00
Steven Rostedt
94b80ffd65 ftrace: rename trace_unlikely.c file
Impact: File name change of trace_unlikely.c

The "unlikely" name for the tracer is quite ugly. We renamed all the
parts of it to "branch" and now it is time to rename the file too.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 22:28:40 +01:00
Steven Rostedt
80e5ea4506 ftrace: add tracer called branch
Impact: added new branch tracer

Currently the tracing of branch profiling (unlikelys and likelys hit)
is only activated by the iter_ctrl. This patch adds a tracer called
"branch" that will just trace the branch profiling. The advantage
of adding this tracer is that it can be added to the ftrace selftests
on startup.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 22:28:25 +01:00
Steven Rostedt
9f029e83e9 ftrace: rename unlikely iter_ctrl to branch
Impact: rename of iter_ctrl unlikely to branch

The unlikely name is ugly. This patch converts the iter_ctrl command
"unlikely" and "nounlikely" to "branch" and "nobranch" respectively.

It also renames a lot of internal functions to use "branch" instead
of "unlikely".

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 22:28:09 +01:00
Steven Rostedt
2ed84eeb88 trace: rename unlikely profiler to branch profiler
Impact: name change of unlikely tracer and profiler

Ingo Molnar suggested changing the config from UNLIKELY_PROFILE
to BRANCH_PROFILING. I never did like the "unlikely" name so I
went one step farther, and renamed all the unlikely configurations
to a "BRANCH" variant.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 22:27:58 +01:00
Ingo Molnar
68d119f0a6 tracing: finetune branch-tracer output
Steve suggested the to change the output from this:

>  bash-3471  [003]   357.014755: [ MISS ] sched_info_dequeued:sched_stats.h:177
>  bash-3471  [003]   357.014756: [ .... ] update_curr:sched_fair.c:489
>  bash-3471  [003]   357.014758: [ .... ] calc_delta_fair:sched_fair.c:411

to this:

>  bash-3471  [003]   357.014755: [ MISS ] sched_info_dequeued:sched_stats.h:177
>  bash-3471  [003]   357.014756: [  ok  ] update_curr:sched_fair.c:489
>  bash-3471  [003]   357.014758: [  ok  ] calc_delta_fair:sched_fair.c:411

as it makes it clearer to the user what it means exactly.

Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 14:13:11 +01:00
Ingo Molnar
f88c4ae9f8 tracing: branch tracer, tweak output
Impact: modify the tracer output, to make it a bit easier to read

Change the output from:

>  bash-3471  [003]   357.014755: [INCORRECT] sched_info_dequeued:sched_stats.h:177
>  bash-3471  [003]   357.014756: [correct] update_curr:sched_fair.c:489
>  bash-3471  [003]   357.014758: [correct] calc_delta_fair:sched_fair.c:411

to:

>  bash-3471  [003]   357.014755: [ MISS ] sched_info_dequeued:sched_stats.h:177
>  bash-3471  [003]   357.014756: [ .... ] update_curr:sched_fair.c:489
>  bash-3471  [003]   357.014758: [ .... ] calc_delta_fair:sched_fair.c:411

it's good to have fields aligned vertically, and the only important
information is a prediction miss, so display only that information.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 11:55:41 +01:00