diff options
author | Changbin Du <changbin.du@intel.com> | 2018-02-17 13:39:38 +0800 |
---|---|---|
committer | Jonathan Corbet <corbet@lwn.net> | 2018-03-07 10:23:10 -0700 |
commit | 1f198e22bc3a0af747a7cf7b444de49ff76b6869 (patch) | |
tree | 4bd4e057bfd0c1df535f680cf8cf1afc5ec519d2 /Documentation/trace | |
parent | 8fa4e720e8d919271cdf0da3c0856333246398a4 (diff) |
trace doc: convert trace/ftrace.txt to rst format
This converts the plain text documentation to reStructuredText format and
add it into Sphinx TOC tree. No essential content change.
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Diffstat (limited to 'Documentation/trace')
-rw-r--r-- | Documentation/trace/ftrace.rst | 3332 | ||||
-rw-r--r-- | Documentation/trace/ftrace.txt | 3220 | ||||
-rw-r--r-- | Documentation/trace/index.rst | 1 |
3 files changed, 3333 insertions, 3220 deletions
diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst new file mode 100644 index 000000000000..636aa9bf5674 --- /dev/null +++ b/Documentation/trace/ftrace.rst @@ -0,0 +1,3332 @@ +======================== +ftrace - Function Tracer +======================== + +Copyright 2008 Red Hat Inc. + +:Author: Steven Rostedt <srostedt@redhat.com> +:License: The GNU Free Documentation License, Version 1.2 + (dual licensed under the GPL v2) +:Original Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, + John Kacur, and David Teigland. + +- Written for: 2.6.28-rc2 +- Updated for: 3.10 +- Updated for: 4.13 - Copyright 2017 VMware Inc. Steven Rostedt +- Converted to rst format - Changbin Du <changbin.du@intel.com> + +Introduction +------------ + +Ftrace is an internal tracer designed to help out developers and +designers of systems to find what is going on inside the kernel. +It can be used for debugging or analyzing latencies and +performance issues that take place outside of user-space. + +Although ftrace is typically considered the function tracer, it +is really a frame work of several assorted tracing utilities. +There's latency tracing to examine what occurs between interrupts +disabled and enabled, as well as for preemption and from a time +a task is woken to the task is actually scheduled in. + +One of the most common uses of ftrace is the event tracing. +Through out the kernel is hundreds of static event points that +can be enabled via the tracefs file system to see what is +going on in certain parts of the kernel. + +See events.txt for more information. + + +Implementation Details +---------------------- + +See :doc:`ftrace-design` for details for arch porters and such. + + +The File System +--------------- + +Ftrace uses the tracefs file system to hold the control files as +well as the files to display output. + +When tracefs is configured into the kernel (which selecting any ftrace +option will do) the directory /sys/kernel/tracing will be created. To mount +this directory, you can add to your /etc/fstab file:: + + tracefs /sys/kernel/tracing tracefs defaults 0 0 + +Or you can mount it at run time with:: + + mount -t tracefs nodev /sys/kernel/tracing + +For quicker access to that directory you may want to make a soft link to +it:: + + ln -s /sys/kernel/tracing /tracing + +.. attention:: + + Before 4.1, all ftrace tracing control files were within the debugfs + file system, which is typically located at /sys/kernel/debug/tracing. + For backward compatibility, when mounting the debugfs file system, + the tracefs file system will be automatically mounted at: + + /sys/kernel/debug/tracing + + All files located in the tracefs file system will be located in that + debugfs file system directory as well. + +.. attention:: + + Any selected ftrace option will also create the tracefs file system. + The rest of the document will assume that you are in the ftrace directory + (cd /sys/kernel/tracing) and will only concentrate on the files within that + directory and not distract from the content with the extended + "/sys/kernel/tracing" path name. + +That's it! (assuming that you have ftrace configured into your kernel) + +After mounting tracefs you will have access to the control and output files +of ftrace. Here is a list of some of the key files: + + + Note: all time values are in microseconds. + + current_tracer: + + This is used to set or display the current tracer + that is configured. + + available_tracers: + + This holds the different types of tracers that + have been compiled into the kernel. The + tracers listed here can be configured by + echoing their name into current_tracer. + + tracing_on: + + This sets or displays whether writing to the trace + ring buffer is enabled. Echo 0 into this file to disable + the tracer or 1 to enable it. Note, this only disables + writing to the ring buffer, the tracing overhead may + still be occurring. + + The kernel function tracing_off() can be used within the + kernel to disable writing to the ring buffer, which will + set this file to "0". User space can re-enable tracing by + echoing "1" into the file. + + Note, the function and event trigger "traceoff" will also + set this file to zero and stop tracing. Which can also + be re-enabled by user space using this file. + + trace: + + This file holds the output of the trace in a human + readable format (described below). Note, tracing is temporarily + disabled while this file is being read (opened). + + trace_pipe: + + The output is the same as the "trace" file but this + file is meant to be streamed with live tracing. + Reads from this file will block until new data is + retrieved. Unlike the "trace" file, this file is a + consumer. This means reading from this file causes + sequential reads to display more current data. Once + data is read from this file, it is consumed, and + will not be read again with a sequential read. The + "trace" file is static, and if the tracer is not + adding more data, it will display the same + information every time it is read. This file will not + disable tracing while being read. + + trace_options: + + This file lets the user control the amount of data + that is displayed in one of the above output + files. Options also exist to modify how a tracer + or events work (stack traces, timestamps, etc). + + options: + + This is a directory that has a file for every available + trace option (also in trace_options). Options may also be set + or cleared by writing a "1" or "0" respectively into the + corresponding file with the option name. + + tracing_max_latency: + + Some of the tracers record the max latency. + For example, the maximum time that interrupts are disabled. + The maximum time is saved in this file. The max trace will also be + stored, and displayed by "trace". A new max trace will only be + recorded if the latency is greater than the value in this file + (in microseconds). + + By echoing in a time into this file, no latency will be recorded + unless it is greater than the time in this file. + + tracing_thresh: + + Some latency tracers will record a trace whenever the + latency is greater than the number in this file. + Only active when the file contains a number greater than 0. + (in microseconds) + + buffer_size_kb: + + This sets or displays the number of kilobytes each CPU + buffer holds. By default, the trace buffers are the same size + for each CPU. The displayed number is the size of the + CPU buffer and not total size of all buffers. The + trace buffers are allocated in pages (blocks of memory + that the kernel uses for allocation, usually 4 KB in size). + If the last page allocated has room for more bytes + than requested, the rest of the page will be used, + making the actual allocation bigger than requested or shown. + ( Note, the size may not be a multiple of the page size + due to buffer management meta-data. ) + + Buffer sizes for individual CPUs may vary + (see "per_cpu/cpu0/buffer_size_kb" below), and if they do + this file will show "X". + + buffer_total_size_kb: + + This displays the total combined size of all the trace buffers. + + free_buffer: + + If a process is performing tracing, and the ring buffer should be + shrunk "freed" when the process is finished, even if it were to be + killed by a signal, this file can be used for that purpose. On close + of this file, the ring buffer will be resized to its minimum size. + Having a process that is tracing also open this file, when the process + exits its file descriptor for this file will be closed, and in doing so, + the ring buffer will be "freed". + + It may also stop tracing if disable_on_free option is set. + + tracing_cpumask: + + This is a mask that lets the user only trace on specified CPUs. + The format is a hex string representing the CPUs. + + set_ftrace_filter: + + When dynamic ftrace is configured in (see the + section below "dynamic ftrace"), the code is dynamically + modified (code text rewrite) to disable calling of the + function profiler (mcount). This lets tracing be configured + in with practically no overhead in performance. This also + has a side effect of enabling or disabling specific functions + to be traced. Echoing names of functions into this file + will limit the trace to only those functions. + + The functions listed in "available_filter_functions" are what + can be written into this file. + + This interface also allows for commands to be used. See the + "Filter commands" section for more details. + + set_ftrace_notrace: + + This has an effect opposite to that of + set_ftrace_filter. Any function that is added here will not + be traced. If a function exists in both set_ftrace_filter + and set_ftrace_notrace, the function will _not_ be traced. + + set_ftrace_pid: + + Have the function tracer only trace the threads whose PID are + listed in this file. + + If the "function-fork" option is set, then when a task whose + PID is listed in this file forks, the child's PID will + automatically be added to this file, and the child will be + traced by the function tracer as well. This option will also + cause PIDs of tasks that exit to be removed from the file. + + set_event_pid: + + Have the events only trace a task with a PID listed in this file. + Note, sched_switch and sched_wake_up will also trace events + listed in this file. + + To have the PIDs of children of tasks with their PID in this file + added on fork, enable the "event-fork" option. That option will also + cause the PIDs of tasks to be removed from this file when the task + exits. + + set_graph_function: + + Functions listed in this file will cause the function graph + tracer to only trace these functions and the functions that + they call. (See the section "dynamic ftrace" for more details). + + set_graph_notrace: + + Similar to set_graph_function, but will disable function graph + tracing when the function is hit until it exits the function. + This makes it possible to ignore tracing functions that are called + by a specific function. + + available_filter_functions: + + This lists the functions that ftrace has processed and can trace. + These are the function names that you can pass to + "set_ftrace_filter" or "set_ftrace_notrace". + (See the section "dynamic ftrace" below for more details.) + + dyn_ftrace_total_info: + + This file is for debugging purposes. The number of functions that + have been converted to nops and are available to be traced. + + enabled_functions: + + This file is more for debugging ftrace, but can also be useful + in seeing if any function has a callback attached to it. + Not only does the trace infrastructure use ftrace function + trace utility, but other subsystems might too. This file + displays all functions that have a callback attached to them + as well as the number of callbacks that have been attached. + Note, a callback may also call multiple functions which will + not be listed in this count. + + If the callback registered to be traced by a function with + the "save regs" attribute (thus even more overhead), a 'R' + will be displayed on the same line as the function that + is returning registers. + + If the callback registered to be traced by a function with + the "ip modify" attribute (thus the regs->ip can be changed), + an 'I' will be displayed on the same line as the function that + can be overridden. + + If the architecture supports it, it will also show what callback + is being directly called by the function. If the count is greater + than 1 it most likely will be ftrace_ops_list_func(). + + If the callback of the function jumps to a trampoline that is + specific to a the callback and not the standard trampoline, + its address will be printed as well as the function that the + trampoline calls. + + function_profile_enabled: + + When set it will enable all functions with either the function + tracer, or if configured, the function graph tracer. It will + keep a histogram of the number of functions that were called + and if the function graph tracer was configured, it will also keep + track of the time spent in those functions. The histogram + content can be displayed in the files: + + trace_stats/function<cpu> ( function0, function1, etc). + + trace_stats: + + A directory that holds different tracing stats. + + kprobe_events: + + Enable dynamic trace points. See kprobetrace.txt. + + kprobe_profile: + + Dynamic trace points stats. See kprobetrace.txt. + + max_graph_depth: + + Used with the function graph tracer. This is the max depth + it will trace into a function. Setting this to a value of + one will show only the first kernel function that is called + from user space. + + printk_formats: + + This is for tools that read the raw format files. If an event in + the ring buffer references a string, only a pointer to the string + is recorded into the buffer and not the string itself. This prevents + tools from knowing what that string was. This file displays the string + and address for the string allowing tools to map the pointers to what + the strings were. + + saved_cmdlines: + + Only the pid of the task is recorded in a trace event unless + the event specifically saves the task comm as well. Ftrace + makes a cache of pid mappings to comms to try to display + comms for events. If a pid for a comm is not listed, then + "<...>" is displayed in the output. + + If the option "record-cmd" is set to "0", then comms of tasks + will not be saved during recording. By default, it is enabled. + + saved_cmdlines_size: + + By default, 128 comms are saved (see "saved_cmdlines" above). To + increase or decrease the amount of comms that are cached, echo + in a the number of comms to cache, into this file. + + saved_tgids: + + If the option "record-tgid" is set, on each scheduling context switch + the Task Group ID of a task is saved in a table mapping the PID of + the thread to its TGID. By default, the "record-tgid" option is + disabled. + + snapshot: + + This displays the "snapshot" buffer and also lets the user + take a snapshot of the current running trace. + See the "Snapshot" section below for more details. + + stack_max_size: + + When the stack tracer is activated, this will display the + maximum stack size it has encountered. + See the "Stack Trace" section below. + + stack_trace: + + This displays the stack back trace of the largest stack + that was encountered when the stack tracer is activated. + See the "Stack Trace" section below. + + stack_trace_filter: + + This is similar to "set_ftrace_filter" but it limits what + functions the stack tracer will check. + + trace_clock: + + Whenever an event is recorded into the ring buffer, a + "timestamp" is added. This stamp comes from a specified + clock. By default, ftrace uses the "local" clock. This + clock is very fast and strictly per cpu, but on some + systems it may not be monotonic with respect to other + CPUs. In other words, the local clocks may not be in sync + with local clocks on other CPUs. + + Usual clocks for tracing:: + + # cat trace_clock + [local] global counter x86-tsc + + The clock with the square brackets around it is the one in effect. + + local: + Default clock, but may not be in sync across CPUs + + global: + This clock is in sync with all CPUs but may + be a bit slower than the local clock. + + counter: + This is not a clock at all, but literally an atomic + counter. It counts up one by one, but is in sync + with all CPUs. This is useful when you need to + know exactly the order events occurred with respect to + each other on different CPUs. + + uptime: + This uses the jiffies counter and the time stamp + is relative to the time since boot up. + + perf: + This makes ftrace use the same clock that perf uses. + Eventually perf will be able to read ftrace buffers + and this will help out in interleaving the data. + + x86-tsc: + Architectures may define their own clocks. For + example, x86 uses its own TSC cycle clock here. + + ppc-tb: + This uses the powerpc timebase register value. + This is in sync across CPUs and can also be used + to correlate events across hypervisor/guest if + tb_offset is known. + + mono: + This uses the fast monotonic clock (CLOCK_MONOTONIC) + which is monotonic and is subject to NTP rate adjustments. + + mono_raw: + This is the raw monotonic clock (CLOCK_MONOTONIC_RAW) + which is montonic but is not subject to any rate adjustments + and ticks at the same rate as the hardware clocksource. + + boot: + This is the boot clock (CLOCK_BOOTTIME) and is based on the + fast monotonic clock, but also accounts for time spent in + suspend. Since the clock access is designed for use in + tracing in the suspend path, some side effects are possible + if clock is accessed after the suspend time is accounted before + the fast mono clock is updated. In this case, the clock update + appears to happen slightly sooner than it normally would have. + Also on 32-bit systems, it's possible that the 64-bit boot offset + sees a partial update. These effects are rare and post + processing should be able to handle them. See comments in the + ktime_get_boot_fast_ns() function for more information. + + To set a clock, simply echo the clock name into this file:: + + echo global > trace_clock + + trace_marker: + + This is a very useful file for synchronizing user space + with events happening in the kernel. Writing strings into + this file will be written into the ftrace buffer. + + It is useful in applications to open this file at the start + of the application and just reference the file descriptor + for the file:: + + void trace_write(const char *fmt, ...) + { + va_list ap; + char buf[256]; + int n; + + if (trace_fd < 0) + return; + + va_start(ap, fmt); + n = vsnprintf(buf, 256, fmt, ap); + va_end(ap); + + write(trace_fd, buf, n); + } + + start:: + + trace_fd = open("trace_marker", WR_ONLY); + + trace_marker_raw: + + This is similar to trace_marker above, but is meant for for binary data + to be written to it, where a tool can be used to parse the data + from trace_pipe_raw. + + uprobe_events: + + Add dynamic tracepoints in programs. + See uprobetracer.txt + + uprobe_profile: + + Uprobe statistics. See uprobetrace.txt + + instances: + + This is a way to make multiple trace buffers where different + events can be recorded in different buffers. + See "Instances" section below. + + events: + + This is the trace event directory. It holds event tracepoints + (also known as static tracepoints) that have been compiled + into the kernel. It shows what event tracepoints exist + and how they are grouped by system. There are "enable" + files at various levels that can enable the tracepoints + when a "1" is written to them. + + See events.txt for more information. + + set_event: + + By echoing in the event into this file, will enable that event. + + See events.txt for more information. + + available_events: + + A list of events that can be enabled in tracing. + + See events.txt for more information. + + hwlat_detector: + + Directory for the Hardware Latency Detector. + See "Hardware Latency Detector" section below. + + per_cpu: + + This is a directory that contains the trace per_cpu information. + + per_cpu/cpu0/buffer_size_kb: + + The ftrace buffer is defined per_cpu. That is, there's a separate + buffer for each CPU to allow writes to be done atomically, + and free from cache bouncing. These buffers may have different + size buffers. This file is similar to the buffer_size_kb + file, but it only displays or sets the buffer size for the + specific CPU. (here cpu0). + + per_cpu/cpu0/trace: + + This is similar to the "trace" file, but it will only display + the data specific for the CPU. If written to, it only clears + the specific CPU buffer. + + per_cpu/cpu0/trace_pipe + + This is similar to the "trace_pipe" file, and is a consuming + read, but it will only display (and consume) the data specific + for the CPU. + + per_cpu/cpu0/trace_pipe_raw + + For tools that can parse the ftrace ring buffer binary format, + the trace_pipe_raw file can be used to extract the data + from the ring buffer directly. With the use of the splice() + system call, the buffer data can be quickly transferred to + a file or to the network where a server is collecting the + data. + + Like trace_pipe, this is a consuming reader, where multiple + reads will always produce different data. + + per_cpu/cpu0/snapshot: + + This is similar to the main "snapshot" file, but will only + snapshot the current CPU (if supported). It only displays + the content of the snapshot for a given CPU, and if + written to, only clears this CPU buffer. + + per_cpu/cpu0/snapshot_raw: + + Similar to the trace_pipe_raw, but will read the binary format + from the snapshot buffer for the given CPU. + + per_cpu/cpu0/stats: + + This displays certain stats about the ring buffer: + + entries: + The number of events that are still in the buffer. + + overrun: + The number of lost events due to overwriting when + the buffer was full. + + commit overrun: + Should always be zero. + This gets set if so many events happened within a nested + event (ring buffer is re-entrant), that it fills the + buffer and starts dropping events. + + bytes: + Bytes actually read (not overwritten). + + oldest event ts: + The oldest timestamp in the buffer + + now ts: + The current timestamp + + dropped events: + Events lost due to overwrite option being off. + + read events: + The number of events read. + +The Tracers +----------- + +Here is the list of current tracers that may be configured. + + "function" + + Function call tracer to trace all kernel functions. + + "function_graph" + + Similar to the function tracer except that the + function tracer probes the functions on their entry + whereas the function graph tracer traces on both entry + and exit of the functions. It then provides the ability + to draw a graph of function calls similar to C code + source. + + "blk" + + The block tracer. The tracer used by the blktrace user + application. + + "hwlat" + + The Hardware Latency tracer is used to detect if the hardware + produces any latency. See "Hardware Latency Detector" section + below. + + "irqsoff" + + Traces the areas that disable interrupts and saves + the trace with the longest max latency. + See tracing_max_latency. When a new max is recorded, + it replaces the old trace. It is best to view this + trace with the latency-format option enabled, which + happens automatically when the tracer is selected. + + "preemptoff" + + Similar to irqsoff but traces and records the amount of + time for which preemption is disabled. + + "preemptirqsoff" + + Similar to irqsoff and preemptoff, but traces and + records the largest time for which irqs and/or preemption + is disabled. + + "wakeup" + + Traces and records the max latency that it takes for + the highest priority task to get scheduled after + it has been woken up. + Traces all tasks as an average developer would expect. + + "wakeup_rt" + + Traces and records the max latency that it takes for just + RT tasks (as the current "wakeup" does). This is useful + for those interested in wake up timings of RT tasks. + + "wakeup_dl" + + Traces and records the max latency that it takes for + a SCHED_DEADLINE task to be woken (as the "wakeup" and + "wakeup_rt" does). + + "mmiotrace" + + A special tracer that is used to trace binary module. + It will trace all the calls that a module makes to the + hardware. Everything it writes and reads from the I/O + as well. + + "branch" + + This tracer can be configured when tracing likely/unlikely + calls within the kernel. It will trace when a likely and + unlikely branch is hit and if it was correct in its prediction + of being correct. + + "nop" + + This is the "trace nothing" tracer. To remove all + tracers from tracing simply echo "nop" into + current_tracer. + + +Examples of using the tracer +---------------------------- + +Here are typical examples of using the tracers when controlling +them only with the tracefs interface (without using any +user-land utilities). + +Output format: +-------------- + +Here is an example of the output format of the file "trace":: + + # tracer: function + # + # entries-in-buffer/entries-written: 140080/250280 #P:4 + # + # _-----=> irqs-off + # / _----=> need-resched + # | / _---=> hardirq/softirq + # || / _--=> preempt-depth + # ||| / delay + # TASK-PID CPU# |||| TIMESTAMP FUNCTION + # | | | |||| | | + bash-1977 [000] .... 17284.993652: sys_close <-system_call_fastpath + bash-1977 [000] .... 17284.993653: __close_fd <-sys_close + bash-1977 [000] .... 17284.993653: _raw_spin_lock <-__close_fd + sshd-1974 [003] .... 17284.993653: __srcu_read_unlock <-fsnotify + bash-1977 [000] .... 17284.993654: add_preempt_count <-_raw_spin_lock + bash-1977 [000] ...1 17284.993655: _raw_spin_unlock <-__close_fd + bash-1977 [000] ...1 17284.993656: sub_preempt_count <-_raw_spin_unlock + bash-1977 [000] .... 17284.993657: filp_close <-__close_fd + bash-1977 [000] .... 17284.993657: dnotify_flush <-filp_close + sshd-1974 [003] .... 17284.993658: sys_select <-system_call_fastpath + .... + +A header is printed with the tracer name that is represented by +the trace. In this case the tracer is "function". Then it shows the +number of events in the buffer as well as the total number of entries +that were written. The difference is the number of entries that were +lost due to the buffer filling up (250280 - 140080 = 110200 events +lost). + +The header explains the content of the events. Task name "bash", the task +PID "1977", the CPU that it was running on "000", the latency format +(explained below), the timestamp in <secs>.<usecs> format, the +function name that was traced "sys_close" and the parent function that +called this function "system_call_fastpath". The timestamp is the time +at which the function was entered. + +Latency trace format +-------------------- + +When the latency-format option is enabled or when one of the latency +tracers is set, the trace file gives somewhat more information to see +why a latency happened. Here is a typical trace:: + + # tracer: irqsoff + # + # irqsoff latency trace v1.1.5 on 3.8.0-test+ + # -------------------------------------------------------------------- + # latency: 259 us, #4/4, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) + # ----------------- + # | task: ps-6143 (uid:0 nice:0 policy:0 rt_prio:0) + # ----------------- + # => started at: __lock_task_sighand + # => ended at: _raw_spin_unlock_irqrestore + # + # + # _------=> CPU# + # / _-----=> irqs-off + # | / _----=> need-resched + # || / _---=> hardirq/softirq + # ||| / _--=> preempt-depth + # |||| / delay + # cmd pid ||||| time | caller + # \ / ||||| \ | / + ps-6143 2d... 0us!: trace_hardirqs_off <-__lock_task_sighand + ps-6143 2d..1 259us+: trace_hardirqs_on <-_raw_spin_unlock_irqrestore + ps-6143 2d..1 263us+: time_hardirqs_on <-_raw_spin_unlock_irqrestore + ps-6143 2d..1 306us : <stack trace> + => trace_hardirqs_on_caller + => trace_hardirqs_on + => _raw_spin_unlock_irqrestore + => do_task_stat + => proc_tgid_stat + => proc_single_show + => seq_read + => vfs_read + => sys_read + => system_call_fastpath + + +This shows that the current tracer is "irqsoff" tracing the time +for which interrupts were disabled. It gives the trace version (which +never changes) and the version of the kernel upon which this was executed on +(3.8). Then it displays the max latency in microseconds (259 us). The number +of trace entries displayed and the total number (both are four: #4/4). +VP, KP, SP, and HP are always zero and are reserved for later use. +#P is the number of online CPUs (#P:4). + +The task is the process that was running when the latency +occurred. (ps pid: 6143). + +The start and stop (the functions in which the interrupts were +disabled and enabled respectively) that caused the latencies: + + - __lock_task_sighand is where the interrupts were disabled. + - _raw_spin_unlock_irqrestore is where they were enabled again. + +The next lines after the header are the trace itself. The header +explains which is which. + + cmd: The name of the process in the trace. + + pid: The PID of that process. + + CPU#: The CPU which the process was running on. + + irqs-off: 'd' interrupts are disabled. '.' otherwise. + .. caution:: If the architecture does not support a way to + read the irq flags variable, an 'X' will always + be printed here. + + need-resched: + - 'N' both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED is set, + - 'n' only TIF_NEED_RESCHED is set, + - 'p' only PREEMPT_NEED_RESCHED is set, + - '.' otherwise. + + hardirq/softirq: + - 'Z' - NMI occurred inside a hardirq + - 'z' - NMI is running + - 'H' - hard irq occurred inside a softirq. + - 'h' - hard irq is running + - 's' - soft irq is running + - '.' - normal context. + + preempt-depth: The level of preempt_disabled + +The above is mostly meaningful for kernel developers. + + time: + When the latency-format option is enabled, the trace file + output includes a timestamp relative to the start of the + trace. This differs from the output when latency-format + is disabled, which includes an absolute timestamp. + + delay: + This is just to help catch your eye a bit better. And + needs to be fixed to be only relative to the same CPU. + The marks are determined by the difference between this + current trace and the next trace. + + - '$' - greater than 1 second + - '@' - greater than 100 milisecond + - '*' - greater than 10 milisecond + - '#' - greater than 1000 microsecond + - '!' - greater than 100 microsecond + - '+' - greater than 10 microsecond + - ' ' - less than or equal to 10 microsecond. + + The rest is the same as the 'trace' file. + + Note, the latency tracers will usually end with a back trace + to easily find where the latency occurred. + +trace_options +------------- + +The trace_options file (or the options directory) is used to control +what gets printed in the trace output, or manipulate the tracers. +To see what is available, simply cat the file:: + + cat trace_options + print-parent + nosym-offset + nosym-addr + noverbose + noraw + nohex + nobin + noblock + trace_printk + annotate + nouserstacktrace + nosym-userobj + noprintk-msg-only + context-info + nolatency-format + record-cmd + norecord-tgid + overwrite + nodisable_on_free + irq-info + markers + noevent-fork + function-trace + nofunction-fork + nodisplay-graph + nostacktrace + nobranch + +To disable one of the options, echo in the option prepended with +"no":: + + echo noprint-parent > trace_options + +To enable an option, leave off the "no":: + + echo sym-offset > trace_options + +Here are the available options: + + print-parent + On function traces, display the calling (parent) + function as well as the function being traced. + :: + + print-parent: + bash-4000 [01] 1477.606694: simple_strtoul <-kstrtoul + + noprint-parent: + bash-4000 [01] 1477.606694: simple_strtoul + + + sym-offset + Display not only the function name, but also the + offset in the function. For example, instead of + seeing just "ktime_get", you will see + "ktime_get+0xb/0x20". + :: + + sym-offset: + bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0 + + sym-addr + This will also display the function address as well + as the function name. + :: + + sym-addr: + bash-4000 [01] 1477.606694: simple_strtoul <c0339346> + + verbose + This deals with the trace file when the + latency-format option is enabled. + :: + + bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \ + (+0.000ms): simple_strtoul (kstrtoul) + + raw + This will display raw numbers. This option is best for + use with user applications that can translate the raw + numbers better than having it done in the kernel. + + hex + Similar to raw, but the numbers will be in a hexadecimal format. + + bin + This will print out the formats in raw binary. + + block + When set, reading trace_pipe will not block when polled. + + trace_printk + Can disable trace_printk() from writing into the buffer. + + annotate + It is sometimes confusing when the CPU buffers are full + and one CPU buffer had a lot of events recently, thus + a shorter time frame, were another CPU may have only had + a few events, which lets it have older events. When + the trace is reported, it shows the oldest events first, + and it may look like only one CPU ran (the one with the + oldest events). When the annotate option is set, it will + display when a new CPU buffer started:: + + <idle>-0 [001] dNs4 21169.031481: wake_up_idle_cpu <-add_timer_on + <idle>-0 [001] dNs4 21169.031482: _raw_spin_unlock_irqrestore <-add_timer_on + <idle>-0 [001] .Ns4 21169.031484: sub_preempt_count <-_raw_spin_unlock_irqrestore + ##### CPU 2 buffer started #### + <idle>-0 [002] .N.1 21169.031484: rcu_idle_exit <-cpu_idle + <idle>-0 [001] .Ns3 21169.031484: _raw_spin_unlock <-clocksource_watchdog + <idle>-0 [001] .Ns3 21169.031485: sub_preempt_count <-_raw_spin_unlock + + userstacktrace + This option changes the trace. It records a + stacktrace of the current user space thread after + each trace event. + + sym-userobj + when user stacktrace are enabled, look up which + object the address belongs to, and print a + relative address. This is especially useful when + ASLR is on, otherwise you don't get a chance to + resolve the address to object/file/line after + the app is no longer running + + The lookup is performed when you read + trace,trace_pipe. Example:: + + a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0 + x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] + + + printk-msg-only + When set, trace_printk()s will only show the format + and not their parameters (if trace_bprintk() or + trace_bputs() was used to save the trace_printk()). + + co |