linux/kernel
Steven Rostedt 7a8e76a382 tracing: unified trace buffer
This is a unified tracing buffer that implements a ring buffer that
hopefully everyone will eventually be able to use.

The events recorded into the buffer have the following structure:

  struct ring_buffer_event {
	u32 type:2, len:3, time_delta:27;
	u32 array[];
  };

The minimum size of an event is 8 bytes. All events are 4 byte
aligned inside the buffer.

There are 4 types (all internal use for the ring buffer, only
the data type is exported to the interface users).

 RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
	of a buffer page.

 RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
	is greater than the 27 bit delta can hold. We add another
	32 bits, and record that in its own event (8 byte size).

 RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
	help keep the buffer timestamps in sync.

RINGBUF_TYPE_DATA: The event actually holds user data.

The "len" field is only three bits. Since the data must be
4 byte aligned, this field is shifted left by 2, giving a
max length of 28 bytes. If the data load is greater than 28
bytes, the first array field holds the full length of the
data load and the len field is set to zero.

Example, data size of 7 bytes:

	type = RINGBUF_TYPE_DATA
	len = 2
	time_delta: <time-stamp> - <prev_event-time-stamp>
	array[0..1]: <7 bytes of data> <1 byte empty>

This event is saved in 12 bytes of the buffer.

An event with 82 bytes of data:

	type = RINGBUF_TYPE_DATA
	len = 0
	time_delta: <time-stamp> - <prev_event-time-stamp>
	array[0]: 84 (Note the alignment)
	array[1..14]: <82 bytes of data> <2 bytes empty>

The above event is saved in 92 bytes (if my math is correct).
82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.

Do not reference the above event struct directly. Use the following
functions to gain access to the event table, since the
ring_buffer_event structure may change in the future.

ring_buffer_event_length(event): get the length of the event.
	This is the size of the memory used to record this
	event, and not the size of the data pay load.

ring_buffer_time_delta(event): get the time delta of the event
	This returns the delta time stamp since the last event.
	Note: Even though this is in the header, there should
		be no reason to access this directly, accept
		for debugging.

ring_buffer_event_data(event): get the data from the event
	This is the function to use to get the actual data
	from the event. Note, it is only a pointer to the
	data inside the buffer. This data must be copied to
	another location otherwise you risk it being written
	over in the buffer.

ring_buffer_lock: A way to lock the entire buffer.
ring_buffer_unlock: unlock the buffer.

ring_buffer_alloc: create a new ring buffer. Can choose between
	overwrite or consumer/producer mode. Overwrite will
	overwrite old data, where as consumer producer will
	throw away new data if the consumer catches up with the
	producer.  The consumer/producer is the default.

ring_buffer_free: free the ring buffer.

ring_buffer_resize: resize the buffer. Changes the size of each cpu
	buffer. Note, it is up to the caller to provide that
	the buffer is not being used while this is happening.
	This requirement may go away but do not count on it.

ring_buffer_lock_reserve: locks the ring buffer and allocates an
	entry on the buffer to write to.
ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
	the buffer.

ring_buffer_write: writes some data into the ring buffer.

ring_buffer_peek: Look at a next item in the cpu buffer.
ring_buffer_consume: get the next item in the cpu buffer and
	consume it. That is, this function increments the head
	pointer.

ring_buffer_read_start: Start an iterator of a cpu buffer.
	For now, this disables the cpu buffer, until you issue
	a finish. This is just because we do not want the iterator
	to be overwritten. This restriction may change in the future.
	But note, this is used for static reading of a buffer which
	is usually done "after" a trace. Live readings would want
	to use the ring_buffer_consume above, which will not
	disable the ring buffer.

ring_buffer_read_finish: Finishes the read iterator and reenables
	the ring buffer.

ring_buffer_iter_peek: Look at the next item in the cpu iterator.
ring_buffer_read: Read the iterator and increment it.
ring_buffer_iter_reset: Reset the iterator to point to the beginning
	of the cpu buffer.
ring_buffer_iter_empty: Returns true if the iterator is at the end
	of the cpu buffer.

ring_buffer_size: returns the size in bytes of each cpu buffer.
	Note, the real size is this times the number of CPUs.

ring_buffer_reset_cpu: Sets the cpu buffer to empty
ring_buffer_reset: sets all cpu buffers to empty

ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
	cpu buffer of another buffer. This is handy when you
	want to take a snap shot of a running trace on just one
	cpu. Having a backup buffer, to swap with facilitates this.
	Ftrace max latencies use this.

ring_buffer_empty: Returns true if the ring buffer is empty.
ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.

ring_buffer_record_disable: disable all cpu buffers (read only)
ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
ring_buffer_record_enable: enable all cpu buffers.
ring_buffer_record_enabl_cpu: enable a single cpu buffer.

ring_buffer_entries: The number of entries in a ring buffer.
ring_buffer_overruns: The number of entries removed due to writing wrap.

ring_buffer_time_stamp: Get the time stamp used by the ring buffer
ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
	into nanosecs.

I still need to implement the GTOD feature. But we need support from
the cpu frequency infrastructure.  But this can be done at a later
time without affecting the ring buffer interface.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14 10:38:54 +02:00
..
irq Merge branch 'linus' into x86/core 2008-08-14 14:58:01 +02:00
power ftrace: disable tracing for hibernation 2008-08-28 12:27:39 -07:00
time [CPUFREQ][5/6] cpufreq: Changes to get_cpu_idle_time_us(), used by ondemand governor 2008-10-09 13:52:44 -04:00
trace tracing: unified trace buffer 2008-10-14 10:38:54 +02:00
.gitignore
acct.c tty: Fix abusers of current->sighand->tty 2008-10-13 09:51:42 -07:00
audit.c [PATCH] Fix the bug of using AUDIT_STATUS_RATE_LIMIT when set fail, no error output. 2008-08-01 12:15:16 -04:00
audit.h [PATCH 1/2] audit: move extern declarations to audit.h 2008-04-28 06:28:04 -04:00
audit_tree.c [PATCH] list_for_each_rcu must die: audit 2008-05-17 03:30:23 -04:00
auditfilter.c Re: [PATCH] the loginuid field should be output in all AUDIT_CONFIG_CHANGE audit messages 2008-08-01 12:15:03 -04:00
auditsc.c tty: Fix abusers of current->sighand->tty 2008-10-13 09:51:42 -07:00
backtracetest.c backtrace: replace timer with tasklet + completions 2008-06-27 18:09:16 +02:00
bounds.c Add kbuild.h that contains common definitions for kbuild users 2008-04-29 08:06:29 -07:00
capability.c security: Fix setting of PF_SUPERPRIV by __capable() 2008-08-14 22:59:43 +10:00
cgroup.c mm owner: fix race between swapoff and exit 2008-09-29 08:41:47 -07:00
cgroup_debug.c CGroup API files: move "releasable" to cgroup_debug subsystem 2008-04-29 08:06:09 -07:00
compat.c ntp: support for TAI 2008-05-01 08:03:59 -07:00
configs.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
cpu.c Merge branches 'sched/devel', 'sched/cpu-hotplug', 'sched/cpusets' and 'sched/urgent' into sched/core 2008-10-08 11:31:02 +02:00
cpuset.c cpusets: scan_for_empty_cpusets(), cpuset doesn't seem to be so const 2008-10-03 13:39:50 +02:00
delayacct.c per-task-delay-accounting: update taskstats for memory reclaim delay 2008-07-25 10:53:47 -07:00
dma-coherent.c dma-coherent: export dma_[alloc|release]_from_coherent methods 2008-08-22 08:34:53 +02:00
dma.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
exec_domain.c [PATCH] kill altroot 2008-07-26 20:53:20 -04:00
exit.c tracing, sched: LTTng instrumentation - scheduler 2008-10-14 10:30:52 +02:00
extable.c
fork.c tracing, sched: LTTng instrumentation - scheduler 2008-10-14 10:30:52 +02:00
futex.c futexes: fix fault handling in futex_lock_pi 2008-06-23 13:31:15 +02:00
futex_compat.c futex_compat __user annotation 2008-03-30 14:18:41 -07:00
hrtimer.c hrtimer: prevent migration of per CPU hrtimers 2008-09-29 17:09:14 +02:00
itimer.c
kallsyms.c kallsyms: fix potential overflow in binary search 2008-07-25 10:53:27 -07:00
Kconfig.hz sched: fix SCHED_HRTICK dependency 2008-07-28 14:37:38 +02:00
Kconfig.preempt rcu: move PREEMPT_RCU config option back under PREEMPT 2008-03-10 18:01:20 -07:00
kexec.c kexec: fix segmentation fault in kimage_add_entry 2008-09-23 08:09:14 -07:00
kfifo.c
kgdb.c kgdb: call touch_softlockup_watchdog on resume 2008-10-06 13:50:59 -05:00
kmod.c call_usermodehelper(): increase reliability 2008-07-25 10:53:28 -07:00
kprobes.c kprobes: remove redundant config check 2008-07-25 10:53:30 -07:00
ksysfs.c
kthread.c tracing, sched: LTTng instrumentation - scheduler 2008-10-14 10:30:52 +02:00
latencytop.c kernel: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
lockdep.c lockdep: fix invalid list_del_rcu in zap_class 2008-08-27 08:40:36 +02:00
lockdep_internals.h lockdep: build fix 2008-08-13 12:55:10 +02:00
lockdep_proc.c lockstat: fix numerical output rounding error 2008-08-26 10:37:46 +02:00
Makefile tracing: Kernel Tracepoints 2008-10-14 10:28:28 +02:00
marker.c markers: bit-field is not thread-safe nor smp-safe 2008-10-14 10:38:45 +02:00
module.c ftrace: remove old pointers to mcount 2008-10-14 10:35:12 +02:00
mutex-debug.c mutex-debug: check mutex magic before owner 2008-05-16 16:53:35 +02:00
mutex-debug.h
mutex.c locking: fix mutex @key parameter kernel-doc notation 2008-07-28 18:12:36 +02:00
mutex.h
notifier.c ftrace: ignore functions that cannot be kprobe-ed 2008-10-14 10:34:22 +02:00
ns_cgroup.c cgroup_clone: use pid of newly created task for new cgroup 2008-07-25 10:53:37 -07:00
nsproxy.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
panic.c Add a WARN() macro; this is WARN_ON() + printk arguments 2008-07-25 10:53:29 -07:00
params.c
pid.c pidns: remove now unused find_pid function. 2008-07-25 10:53:45 -07:00
pid_namespace.c pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits 2008-09-02 19:21:38 -07:00
pm_qos_params.c pm_qos_requirement might sleep 2008-09-02 19:21:40 -07:00
posix-cpu-timers.c posix-timers: print RT watchdog message 2008-05-24 18:49:22 +02:00
posix-timers.c fix error-path NULL deref in alloc_posix_timer() 2008-10-02 15:53:13 -07:00
printk.c tty: Move tty_write_message out of kernel/printk 2008-10-13 09:51:41 -07:00
profile.c build kernel/profile.o only when requested 2008-07-25 10:53:27 -07:00
ptrace.c security: Fix setting of PF_SUPERPRIV by __capable() 2008-08-14 22:59:43 +10:00
rcuclassic.c rcu: RCU-based detection of stalled CPUs for Classic RCU, fix 2008-10-03 10:41:00 +02:00
rcupdate.c rcu: fix synchronize_rcu() so that kernel-doc works 2008-08-21 09:31:44 +02:00
rcupreempt.c rcu: remove redundant ACCESS_ONCE definition from rcupreempt.c 2008-08-18 09:45:22 +02:00
rcupreempt_trace.c rcu: trace fix possible mem-leak 2008-08-15 17:54:40 +02:00
rcutorture.c rcu: make rcutorture even more vicious: invoke RCU readers from irq handlers (timers) 2008-06-26 09:24:33 +02:00
relay.c relay: fix "full buffer with exactly full last subbuffer" accounting problem 2008-08-05 14:33:46 -07:00
res_counter.c cgroup files: convert res_counter_write() to be a cgroups write_string() handler 2008-07-25 10:53:36 -07:00
resource.c IO resources: add reserve_region_with_split() 2008-09-04 21:02:44 +02:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c sysdev: Pass the attribute to the low level sysdev show/store function 2008-07-21 21:55:02 -07:00
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c tracing, sched: LTTng instrumentation - scheduler 2008-10-14 10:30:52 +02:00
sched_clock.c sched_clock: fix cpu_clock() 2008-08-25 17:39:57 +02:00
sched_cpupri.c sched: use a 2-d bitmap for searching lowest-pri CPU 2008-06-06 15:19:28 +02:00
sched_cpupri.h sched: fix the cpuprio count really 2008-06-06 15:19:44 +02:00
sched_debug.c sched: add full schedstats to /proc/sched_debug 2008-06-27 14:31:31 +02:00
sched_fair.c sched: sync wakeups vs avg_overlap 2008-10-08 12:20:26 +02:00
sched_features.h sched: turn off WAKEUP_OVERLAP 2008-09-22 16:29:00 +02:00
sched_idletask.c sched: wakeup preempt when small overlap 2008-09-22 16:28:32 +02:00
sched_rt.c sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rq 2008-10-04 14:31:54 +02:00
sched_stats.h sched: fix accounting in task delay accounting & migration 2008-07-04 12:50:23 +02:00
seccomp.c
semaphore.c semaphore: __down_common: use signal_pending_state() 2008-08-05 14:33:47 -07:00
signal.c tracing, sched: LTTng instrumentation - scheduler 2008-10-14 10:30:52 +02:00
smp.c smp: have smp_call_function_single() detect invalid CPUs 2008-08-25 17:45:48 -07:00
softirq.c Full conversion to early_initcall() interface, remove old interface 2008-07-26 12:00:04 -07:00
softlockup.c softlockup: minor cleanup, don't check task->state twice 2008-09-02 10:49:51 -07:00
spinlock.c lockdep: spin_lock_nest_lock(), checkpatch fixes 2008-08-13 13:56:51 +02:00
srcu.c
stacktrace.c stacktrace: fix modular build, export print_stack_trace and save_stack_trace 2008-06-30 09:20:55 +02:00
stop_machine.c stop_machine: remove unused variable 2008-08-12 17:52:55 +10:00
sys.c tty: Add a kref count 2008-10-13 09:51:40 -07:00
sys_ni.c signalfd: fix undefined reference to `compat_sys_signalfd4' when !CONFIG_SIGNALFD 2008-07-25 11:35:41 -07:00
sysctl.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 2008-09-16 14:11:43 -07:00
sysctl_check.c sysctl: check for bogus modes 2008-07-25 10:53:45 -07:00
taskstats.c taskstats: remove initialization of static per-cpu variable 2008-07-25 10:53:47 -07:00
test_kprobes.c
time.c Make constants in kernel/timeconst.h fixed 64 bits 2008-05-02 16:18:42 -07:00
timeconst.pl Make constants in kernel/timeconst.h fixed 64 bits 2008-05-02 16:18:42 -07:00
timer.c Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2008-07-14 16:06:58 -07:00
tracepoint.c tracepoints: fix reentrancy 2008-10-14 10:38:23 +02:00
tsacct.c task IO accounting: move all IO statistics in struct task_io_accounting 2008-07-27 16:12:28 -07:00
uid16.c asmlinkage_protect replaces prevent_tail_call 2008-04-10 17:28:26 -07:00
user.c sched: rt-bandwidth for user grouping interface 2008-08-19 13:10:09 +02:00
user_namespace.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
utsname.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
utsname_sysctl.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
wait.c
workqueue.c Merge branch 'core/locking' into core/urgent 2008-08-12 00:11:49 +02:00