Impact: use percpu data instead of a global structure
Use:
static DEFINE_PER_CPU(struct workqueue_global_stats, all_workqueue_stat);
instead of allocating a global structure.
percpu data also works well on NUMA.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change the hw-branch-tracer format to be more readable.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reset the ftrace buffer on close. Since we use cyclic buffers, the
trace is not contiguous, anyway.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dump the branch trace on an oops (based on ftrace_dump_on_oops).
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: trace max latencies on start of latency tracing
This patch sets the max latency to zero whenever one of the
irq variant tracers or the wakeup tracer is set to current tracer.
Most developers expect to see output when starting up a latency
tracer. But since the max_latency is already set to max, and
it takes a latency greater than max_latency to be recorded, there
is no trace. This is not the expected behavior and has even confused
myself.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up
After reorganizing the functions in trace.c and trace_function.c,
they no longer need to be in global context. This patch makes the
functions and one variable into static.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: less likely to interleave function and stack traces
This patch does replaces the separate stack trace on function with
a record function and stack trace together. This will switch between
the function only recording to a function and stack recording.
Also some whitespace fix ups as well.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After adding the printf format checking for trace_seq_printf, several
warnings now show up. This patch cleans them up.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andrew Morton suggested adding a printf checker to trace_seq_printf
since there are a number of users that have improper format arguments.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up of trace.c
The function tracer functions were put in trace.c because it needed
to share static variables that were in trace.c. Since then, those
variables have become global for various reasons. This patch moves
the function tracer functions into trace_function.c where they belong.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new feature to stack trace any function
Chris Mason asked about being able to pick and choose a function
and get a stack trace from it. This feature enables his request.
# echo io_schedule > /debug/tracing/set_ftrace_filter
# echo function > /debug/tracing/current_tracer
# echo func_stack_trace > /debug/tracing/trace_options
Produces the following in /debug/tracing/trace:
kjournald-702 [001] 135.673060: io_schedule <-sync_buffer
kjournald-702 [002] 135.673671:
<= sync_buffer
<= __wait_on_bit
<= out_of_line_wait_on_bit
<= __wait_on_buffer
<= sync_dirty_buffer
<= journal_commit_transaction
<= kjournald
Note, be careful about turning this on without filtering the functions.
You may find that you have a 10 second lag between typing and seeing
what you typed. This is why the stack trace for the function tracer
does not use the same stack_trace flag as the other tracers use.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix bug for handling partial line
trace_seq_printf(), seq_print_userip_objs(), ... return
0 -- partial line was written
other(>0) -- success
duplicate output is also removed in trace_print_raw().
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We should unlock all_stat_sessions_mutex before returning failure.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix a crash while kernel image restore
When the function graph tracer is running and while suspend to disk, some racy
and dangerous things happen against this tracer.
The current task will save its registers including the stack pointer which
contains the return address hooked by the tracer. But the current task will
continue to enter other functions after that to save the memory, and then
it will store other return addresses, and finally loose the old depth which
matches the return address saved in the old stack (during the registers saving).
So on image restore, the code will return to wrong addresses.
And there are other things: on restore, the task will have it's "current"
pointer overwritten during registers restoring....switching from one task to
another... That would be insane to try to trace function graphs at these
stages.
This patch makes the function graph tracer listening on power events, making
it's tracing disabled for the current task (the one that performs the
hibernation work) while suspend/resume to disk, making the tracing safe
during hibernation.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: limit ftrace dump output
Currently ftrace_dump only calls ftrace_kill that is a fast way
to prevent the function tracer functions from being called (just sets
a flag and clears the function to call, nothing else). It is better
to also turn off any recording to the ring buffers as well.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: reset struct buffer_page.write when interrupt storm
if struct buffer_page.write is not reset, any succedent committing
will corrupted ring_buffer:
static inline void
rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
{
......
cpu_buffer->commit_page->commit =
cpu_buffer->commit_page->write;
......
}
when "if (RB_WARN_ON(cpu_buffer, next_page == reader_page))", ring_buffer
is disabled, but some reserved buffers may haven't been committed.
we need reset struct buffer_page.write.
when "if (unlikely(next_page == cpu_buffer->commit_page))", ring_buffer
is still available, we should not corrupt it.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to print out ftrace_dump when expected
I was debugging a hard race condition to only find out that
after I hit the race, my log level was not at level to show
KERN_INFO. The time it took to trigger the race was wasted because
I did not capture the trace.
Since ftrace_dump is only called from kernel oops (and only when
it is set in the kernel command line to do so), or when a
developer adds it to their own local tree, the log level of
the print should be at KERN_EMERG to make sure the print appears.
ftrace_dump is not called by a normal user setup, and will not
add extra unwanted print out to the console. There is no reason
it should be at KERN_INFO.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new tracer
The workqueue tracer provides some statistical informations
about each cpu workqueue thread such as the number of the
works inserted and executed since their creation. It can help
to evaluate the amount of work each of them have to perform.
For example it can help a developer to decide whether he should
choose a per cpu workqueue instead of a singlethreaded one.
It only traces statistical informations for now but it will probably later
provide event tracing too.
Such a tracer could help too, and be improved, to help rt priority sorted
workqueue development.
To have a snapshot of the workqueues state at any time, just do
cat /debugfs/tracing/trace_stat/workqueues
Ie:
1 125 125 reiserfs/1
1 0 0 scsi_tgtd/1
1 0 0 aio/1
1 0 0 ata/1
1 114 114 kblockd/1
1 0 0 kintegrityd/1
1 2147 2147 events/1
0 0 0 kpsmoused
0 105 105 reiserfs/0
0 0 0 scsi_tgtd/0
0 0 0 aio/0
0 0 0 ata_aux
0 0 0 ata/0
0 0 0 cqueue
0 0 0 kacpi_notify
0 0 0 kacpid
0 149 149 kblockd/0
0 0 0 kintegrityd/0
0 1000 1000 khelper
0 2270 2270 events/0
Changes in V2:
_ Drop the static array based on NR_CPU and dynamically allocate the stat array
with num_possible_cpus() and other cpu mask facilities....
_ Trace workqueue insertion at a bit lower level (insert_work instead of queue_work) to handle
even the workqueue barriers.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: tracing's Api change
Currently, the stat tracing depends on the events tracing.
When you switch to a new tracer, the stats files of the previous tracer
will disappear. But it's more scalable to separate those two engines.
This way, we can keep the stat files of one or several tracers when we
want, without bothering of multiple tracer stat files or tracer switching.
To build/destroys its stats files, a tracer just have to call
register_stat_tracer/unregister_stat_tracer everytimes it wants to.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Define FTRACE_ADDR. In IA64, a function pointer isn't a 'unsigned long' but a
'struct {unsigned long ip, unsigned long gp}'.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In IA64, a function pointer isn't a 'unsigned long' but a
'struct {unsigned long ip, unsigned long gp}'. MCOUNT_ADDR is determined
at link time not compile time, so explictly ignore kernel/trace/ftrace.o
in recordmcount.pl.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: enhances lost events counting in mmiotrace
The tracing framework, or the ring buffer facility it uses, has a switch
to stop recording data. When recording is off, the trace events will be
lost. The framework does not count these, so mmiotrace has to count them
itself.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cosmetic change in Kconfig menu layout
This patch was originally suggested by Peter Zijlstra, but seems it
was forgotten.
CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable
directly under the Kernel hacking / debugging menu in the kernel
configuration system. They were present only for x86 and x86_64.
Other tracers that use the ftrace tracing framework are in their own
sub-menu. This patch moves the mmiotrace configuration options there.
Since the Kconfig file, where the tracer menu is, is not architecture
specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by
x86/x86_64. CONFIG_MMIOTRACE now depends on it.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new API for tracers
Make the stat tracing API reentrant. And also provide the new directory
/debugfs/tracing/trace_stat which will contain all the stat files for the
current active tracer.
Now a tracer will, if desired, want to provide a zero terminated array of
tracer_stat structures.
Each one contains the callbacks necessary for one stat file.
It have to provide at least a name for its stat file, an iterator with
stat_start/start_next callback and an output callback for one stat entry.
Also adapt the branch tracer to this new API.
We create two files "all" and "annotated" inside the /debugfs/tracing/trace_stat
directory, making the both stats simultaneously available instead of needing
to change an option to switch from one stat file to another.
The output of these stats haven't changed.
Changes in v2:
_ Apply the previous memory leak fix (rebase against tip/master)
Changes in v3:
_ Merge the patch that adapted the branch tracer to this Api in this patch to
not break the kernel build.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile: (31 commits)
powerpc/oprofile: fix whitespaces in op_model_cell.c
powerpc/oprofile: IBM CELL: add SPU event profiling support
powerpc/oprofile: fix cell/pr_util.h
powerpc/oprofile: IBM CELL: cleanup and restructuring
oprofile: make new cpu buffer functions part of the api
oprofile: remove #ifdef CONFIG_OPROFILE_IBS in non-ibs code
ring_buffer: fix ring_buffer_event_length()
oprofile: use new data sample format for ibs
oprofile: add op_cpu_buffer_get_data()
oprofile: add op_cpu_buffer_add_data()
oprofile: rework implementation of cpu buffer events
oprofile: modify op_cpu_buffer_read_entry()
oprofile: add op_cpu_buffer_write_reserve()
oprofile: rename variables in add_ibs_begin()
oprofile: rename add_sample() in cpu_buffer.c
oprofile: rename variable ibs_allowed to has_ibs in op_model_amd.c
oprofile: making add_sample_entry() inline
oprofile: remove backtrace code for ibs
oprofile: remove unused ibs macro
oprofile: remove unused components in struct oprofile_cpu_buffer
...
Function ring_buffer_event_length() provides an interface to detect
the length of data stored in an entry. However, the length contains
offsets depending on the internal usage. This makes it unusable. This
patch fixes this and now ring_buffer_event_length() returns the
alligned length that has been used in ring_buffer_lock_reserve().
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Impact: clean up
Andrew Morton pointed out that the entry assignment in stat_seq_show
did not need to be done in the declaration, causing funny line breaks.
This patch makes it a bit more pleasing on the eyes.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix memory leak
This patch fixes a memory leak inside reset_stat_list(). The freeing
loop iterated only once.
Also turn the stat_list into a simple struct list_head, which
simplify the code and avoid an unused static pointer.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to unsigned compared to less than zero
Roel Kluin pointed out that there is a compare of an unsigned number
to less than zero. A previous clean up had the unsigned index set
to -1 for certain cases, but never converted it to signed.
Frederic Weisbecker noticed that another index is used to compare
the above index to and it also needs to be converted to signed.
[
Converted ftrace_page->index to int from unsigned long as
Andrew Morton pointed out that there's no need for it to
stay a long.
]
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix
leftover from the relayfs version - but we want to keep it because
this call is the earliest opportunity when we can start kmemtrace
tracing. (after kmem_cache_init()).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
kmemtrace now uses ftrace. This patch removes the relay version.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Reduce future memory usage, use new cpumask API.
Since the last patch was created and acked, more old cpumask users
slipped into kernel/trace.
Mostly trivial conversions, except struct trace_iterator's "started"
member becomes a cpumask_var_t.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: Reduce future memory usage, use new cpumask API.
(Eventually, cpumask_var_t will be allocated based on nr_cpu_ids, not NR_CPUS).
Convert kernel trace functions to use struct cpumask API:
1) Use cpumask_copy/cpumask_test_cpu/for_each_cpu.
2) Use cpumask_var_t and alloc_cpumask_var/free_cpumask_var everywhere.
3) Use on_each_cpu instead of playing with current->cpus_allowed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
* 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
oprofile: select RING_BUFFER
ring_buffer: adding EXPORT_SYMBOLs
oprofile: fix lost sample counter
oprofile: remove nr_available_slots()
oprofile: port to the new ring_buffer
ring_buffer: add remaining cpu functions to ring_buffer.h
oprofile: moving cpu_buffer_reset() to cpu_buffer.h
oprofile: adding cpu_buffer_entries()
oprofile: adding cpu_buffer_write_commit()
oprofile: adding cpu buffer r/w access functions
ftrace: remove unused function arg in trace_iterator_increment()
ring_buffer: update description for ring_buffer_alloc()
oprofile: set values to default when creating oprofilefs
oprofile: implement switch/case in buffer_sync.c
x86/oprofile: cleanup IBS init/exit functions in op_model_amd.c
x86/oprofile: reordering IBS code in op_model_amd.c
oprofile: fix typo
oprofile: whitspace changes only
oprofile: update comment for oprofile_add_sample()
oprofile: comment cleanup
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
hrtimers: fix warning in kernel/hrtimer.c
x86: make sure we really have an hpet mapping before using it
x86: enable HPET on Fujitsu u9200
linux/timex.h: cleanup for userspace
posix-timers: simplify de_thread()->exit_itimers() path
posix-timers: check ->it_signal instead of ->it_pid to validate the timer
posix-timers: use "struct pid*" instead of "struct task_struct*"
nohz: suppress needless timer reprogramming
clocksource, acpi_pm.c: put acpi_pm_read_slow() under CONFIG_PCI
nohz: no softirq pending warnings for offline cpus
hrtimer: removing all ur callback modes, fix
hrtimer: removing all ur callback modes, fix hotplug
hrtimer: removing all ur callback modes
x86: correct link to HPET timer specification
rtc-cmos: export second NVRAM bank
Fixed up conflicts in sound/drivers/pcsp/pcsp.c and sound/core/hrtimer.c
manually.
Impact: new tracer plugin
This patch adapts kmemtrace raw events tracing to the unified tracing API.
To enable and use this tracer, just do the following:
echo kmemtrace > /debugfs/tracing/current_tracer
cat /debugfs/tracing/trace
You will have the following output:
# tracer: kmemtrace
#
#
# ALLOC TYPE REQ GIVEN FLAGS POINTER NODE CALLER
# FREE | | | | | | | |
# |
type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
That was to stay backward compatible with the format output produced in
inux/tracepoint.h.
This is the default ouput, but note that I tried something else.
If you change an option:
echo kmem_minimalistic > /debugfs/trace_options
and then cat /debugfs/trace, you will have the following output:
# tracer: kmemtrace
#
#
# ALLOC TYPE REQ GIVEN FLAGS POINTER NODE CALLER
# FREE | | | | | | | |
# |
- C 0xffff88007c088780 file_free_rcu
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc780 -1 d_alloc
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc870 -1 d_alloc
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc960 -1 d_alloc
+ K 1304 1312 000000d0 0xffff8800791d7340 -1 reiserfs_alloc_inode
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
- C 0xffff88007cad6000 putname
+ K 992 1000 000000d0 0xffff880079045b58 -1 alloc_inode
+ K 768 1024 000080d0 0xffff88007c096400 -1 alloc_pipe_info
+ K 240 240 000000d0 0xffff8800790dca50 -1 d_alloc
+ K 272 320 000080d0 0xffff88007c088780 -1 get_empty_filp
+ K 272 320 000080d0 0xffff88007c088000 -1 get_empty_filp
Yeah I shall confess kmem_minimalistic should be: kmem_alternative.
Whatever, I find it more readable but this a personal opinion of course.
We can drop it if you want.
On the ALLOC/FREE column, + means an allocation and - a free.
On the type column, you have K = kmalloc, C = cache, P = page
I would like the flags to be GFP_* strings but that would not be easy to not
break the column with strings....
About the node...it seems to always be -1. I don't know why but that shouldn't
be difficult to find.
I moved linux/tracepoint.h to trace/tracepoint.h as well. I think that would
be more easy to find the tracer headers if they are all in their common
directory.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>