From 73d9812781fcdf49f279875dd8f13d31b84ccb02 Mon Sep 17 00:00:00 2001 From: Changbin Du Date: Sat, 17 Feb 2018 13:39:42 +0800 Subject: trace doc: convert trace/events.txt to rst format This converts the plain text documentation to reStructuredText format and add it into Sphinx TOC tree. No essential content change. Cc: Steven Rostedt Signed-off-by: Changbin Du Signed-off-by: Jonathan Corbet --- Documentation/trace/events.rst | 2069 ++++++++++++++++++++++++++++++++++++++++ Documentation/trace/events.txt | 2066 --------------------------------------- Documentation/trace/index.rst | 1 + 3 files changed, 2070 insertions(+), 2066 deletions(-) create mode 100644 Documentation/trace/events.rst delete mode 100644 Documentation/trace/events.txt (limited to 'Documentation/trace') diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst new file mode 100644 index 000000000000..27bfd06ae29d --- /dev/null +++ b/Documentation/trace/events.rst @@ -0,0 +1,2069 @@ +============= +Event Tracing +============= + +:Author: Theodore Ts'o +:Updated: Li Zefan and Tom Zanussi + +1. Introduction +=============== + +Tracepoints (see Documentation/trace/tracepoints.txt) can be used +without creating custom kernel modules to register probe functions +using the event tracing infrastructure. + +Not all tracepoints can be traced using the event tracing system; +the kernel developer must provide code snippets which define how the +tracing information is saved into the tracing buffer, and how the +tracing information should be printed. + +2. Using Event Tracing +====================== + +2.1 Via the 'set_event' interface +--------------------------------- + +The events which are available for tracing can be found in the file +/sys/kernel/debug/tracing/available_events. + +To enable a particular event, such as 'sched_wakeup', simply echo it +to /sys/kernel/debug/tracing/set_event. For example:: + + # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event + +.. Note:: '>>' is necessary, otherwise it will firstly disable all the events. + +To disable an event, echo the event name to the set_event file prefixed +with an exclamation point:: + + # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event + +To disable all events, echo an empty line to the set_event file:: + + # echo > /sys/kernel/debug/tracing/set_event + +To enable all events, echo '*:*' or '*:' to the set_event file:: + + # echo *:* > /sys/kernel/debug/tracing/set_event + +The events are organized into subsystems, such as ext4, irq, sched, +etc., and a full event name looks like this: :. The +subsystem name is optional, but it is displayed in the available_events +file. All of the events in a subsystem can be specified via the syntax +":*"; for example, to enable all irq events, you can use the +command:: + + # echo 'irq:*' > /sys/kernel/debug/tracing/set_event + +2.2 Via the 'enable' toggle +--------------------------- + +The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy +of directories. + +To enable event 'sched_wakeup':: + + # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable + +To disable it:: + + # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable + +To enable all events in sched subsystem:: + + # echo 1 > /sys/kernel/debug/tracing/events/sched/enable + +To enable all events:: + + # echo 1 > /sys/kernel/debug/tracing/events/enable + +When reading one of these enable files, there are four results: + + - 0 - all events this file affects are disabled + - 1 - all events this file affects are enabled + - X - there is a mixture of events enabled and disabled + - ? - this file does not affect any event + +2.3 Boot option +--------------- + +In order to facilitate early boot debugging, use boot option:: + + trace_event=[event-list] + +event-list is a comma separated list of events. See section 2.1 for event +format. + +3. Defining an event-enabled tracepoint +======================================= + +See The example provided in samples/trace_events + +4. Event formats +================ + +Each trace event has a 'format' file associated with it that contains +a description of each field in a logged event. This information can +be used to parse the binary trace stream, and is also the place to +find the field names that can be used in event filters (see section 5). + +It also displays the format string that will be used to print the +event in text mode, along with the event name and ID used for +profiling. + +Every event has a set of 'common' fields associated with it; these are +the fields prefixed with 'common_'. The other fields vary between +events and correspond to the fields defined in the TRACE_EVENT +definition for that event. + +Each field in the format has the form:: + + field:field-type field-name; offset:N; size:N; + +where offset is the offset of the field in the trace record and size +is the size of the data item, in bytes. + +For example, here's the information displayed for the 'sched_wakeup' +event:: + + # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format + + name: sched_wakeup + ID: 60 + format: + field:unsigned short common_type; offset:0; size:2; + field:unsigned char common_flags; offset:2; size:1; + field:unsigned char common_preempt_count; offset:3; size:1; + field:int common_pid; offset:4; size:4; + field:int common_tgid; offset:8; size:4; + + field:char comm[TASK_COMM_LEN]; offset:12; size:16; + field:pid_t pid; offset:28; size:4; + field:int prio; offset:32; size:4; + field:int success; offset:36; size:4; + field:int cpu; offset:40; size:4; + + print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid, + REC->prio, REC->success, REC->cpu + +This event contains 10 fields, the first 5 common and the remaining 5 +event-specific. All the fields for this event are numeric, except for +'comm' which is a string, a distinction important for event filtering. + +5. Event filtering +================== + +Trace events can be filtered in the kernel by associating boolean +'filter expressions' with them. As soon as an event is logged into +the trace buffer, its fields are checked against the filter expression +associated with that event type. An event with field values that +'match' the filter will appear in the trace output, and an event whose +values don't match will be discarded. An event with no filter +associated with it matches everything, and is the default when no +filter has been set for an event. + +5.1 Expression syntax +--------------------- + +A filter expression consists of one or more 'predicates' that can be +combined using the logical operators '&&' and '||'. A predicate is +simply a clause that compares the value of a field contained within a +logged event with a constant value and returns either 0 or 1 depending +on whether the field value matched (1) or didn't match (0):: + + field-name relational-operator value + +Parentheses can be used to provide arbitrary logical groupings and +double-quotes can be used to prevent the shell from interpreting +operators as shell metacharacters. + +The field-names available for use in filters can be found in the +'format' files for trace events (see section 4). + +The relational-operators depend on the type of the field being tested: + +The operators available for numeric fields are: + +==, !=, <, <=, >, >=, & + +And for string fields they are: + +==, !=, ~ + +The glob (~) accepts a wild card character (*,?) and character classes +([). For example:: + + prev_comm ~ "*sh" + prev_comm ~ "sh*" + prev_comm ~ "*sh*" + prev_comm ~ "ba*sh" + +5.2 Setting filters +------------------- + +A filter for an individual event is set by writing a filter expression +to the 'filter' file for the given event. + +For example:: + + # cd /sys/kernel/debug/tracing/events/sched/sched_wakeup + # echo "common_preempt_count > 4" > filter + +A slightly more involved example:: + + # cd /sys/kernel/debug/tracing/events/signal/signal_generate + # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter + +If there is an error in the expression, you'll get an 'Invalid +argument' error when setting it, and the erroneous string along with +an error message can be seen by looking at the filter e.g.:: + + # cd /sys/kernel/debug/tracing/events/signal/signal_generate + # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter + -bash: echo: write error: Invalid argument + # cat filter + ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash + ^ + parse_error: Field not found + +Currently the caret ('^') for an error always appears at the beginning of +the filter string; the error message should still be useful though +even without more accurate position info. + +5.3 Clearing filters +-------------------- + +To clear the filter for an event, write a '0' to the event's filter +file. + +To clear the filters for all events in a subsystem, write a '0' to the +subsystem's filter file. + +5.3 Subsystem filters +--------------------- + +For convenience, filters for every event in a subsystem can be set or +cleared as a group by writing a filter expression into the filter file +at the root of the subsystem. Note however, that if a filter for any +event within the subsystem lacks a field specified in the subsystem +filter, or if the filter can't be applied for any other reason, the +filter for that event will retain its previous setting. This can +result in an unintended mixture of filters which could lead to +confusing (to the user who might think different filters are in +effect) trace output. Only filters that reference just the common +fields can be guaranteed to propagate successfully to all events. + +Here are a few subsystem filter examples that also illustrate the +above points: + +Clear the filters on all events in the sched subsystem:: + + # cd /sys/kernel/debug/tracing/events/sched + # echo 0 > filter + # cat sched_switch/filter + none + # cat sched_wakeup/filter + none + +Set a filter using only common fields for all events in the sched +subsystem (all events end up with the same filter):: + + # cd /sys/kernel/debug/tracing/events/sched + # echo common_pid == 0 > filter + # cat sched_switch/filter + common_pid == 0 + # cat sched_wakeup/filter + common_pid == 0 + +Attempt to set a filter using a non-common field for all events in the +sched subsystem (all events but those that have a prev_pid field retain +their old filters):: + + # cd /sys/kernel/debug/tracing/events/sched + # echo prev_pid == 0 > filter + # cat sched_switch/filter + prev_pid == 0 + # cat sched_wakeup/filter + common_pid == 0 + +5.4 PID filtering +----------------- + +The set_event_pid file in the same directory as the top events directory +exists, will filter all events from tracing any task that does not have the +PID listed in the set_event_pid file. +:: + + # cd /sys/kernel/debug/tracing + # echo $$ > set_event_pid + # echo 1 > events/enable + +Will only trace events for the current task. + +To add more PIDs without losing the PIDs already included, use '>>'. +:: + + # echo 123 244 1 >> set_event_pid + + +6. Event triggers +================= + +Trace events can be made to conditionally invoke trigger 'commands' +which can take various forms and are described in detail below; +examples would be enabling or disabling other trace events or invoking +a stack trace whenever the trace event is hit. Whenever a trace event +with attached triggers is invoked, the set of trigger commands +associated with that event is invoked. Any given trigger can +additionally have an event filter of the same form as described in +section 5 (Event filtering) associated with it - the command will only +be invoked if the event being invoked passes the associated filter. +If no filter is associated with the trigger, it always passes. + +Triggers are added to and removed from a particular event by writing +trigger expressions to the 'trigger' file for the given event. + +A given event can have any number of triggers associated with it, +subject to any restrictions that individual commands may have in that +regard. + +Event triggers are implemented on top of "soft" mode, which means that +whenever a trace event has one or more triggers associated with it, +the event is activated even if it isn't actually enabled, but is +disabled in a "soft" mode. That is, the tracepoint will be called, +but just will not be traced, unless of course it's actually enabled. +This scheme allows triggers to be invoked even for events that aren't +enabled, and also allows the current event filter implementation to be +used for conditionally invoking triggers. + +The syntax for event triggers is roughly based on the syntax for +set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands' +section of Documentation/trace/ftrace.txt), but there are major +differences and the implementation isn't currently tied to it in any +way, so beware about making generalizations between the two. + +6.1 Expression syntax +--------------------- + +Triggers are added by echoing the command to the 'trigger' file:: + + # echo 'command[:count] [if filter]' > trigger + +Triggers are removed by echoing the same command but starting with '!' +to the 'trigger' file:: + + # echo '!command[:count] [if filter]' > trigger + +The [if filter] part isn't used in matching commands when removing, so +leaving that off in a '!' command will accomplish the same thing as +having it in. + +The filter syntax is the same as that described in the 'Event +filtering' section above. + +For ease of use, writing to the trigger file using '>' currently just +adds or removes a single trigger and there's no explicit '>>' support +('>' actually behaves like '>>') or truncation support to remove all +triggers (you have to use '!' for each one added.) + +6.2 Supported trigger commands +------------------------------ + +The following commands are supported: + +- enable_event/disable_event + + These commands can enable or disable another trace event whenever + the triggering event is hit. When these commands are registered, + the other trace event is activated, but disabled in a "soft" mode. + That is, the tracepoint will be called, but just will not be traced. + The event tracepoint stays in this mode as long as there's a trigger + in effect that can trigger it. + + For example, the following trigger causes kmalloc events to be + traced when a read system call is entered, and the :1 at the end + specifies that this enablement happens only once:: + + # echo 'enable_event:kmem:kmalloc:1' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger + + The following trigger causes kmalloc events to stop being traced + when a read system call exits. This disablement happens on every + read system call exit:: + + # echo 'disable_event:kmem:kmalloc' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger + + The format is:: + + enable_event::[:count] + disable_event::[:count] + + To remove the above commands:: + + # echo '!enable_event:kmem:kmalloc:1' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger + + # echo '!disable_event:kmem:kmalloc' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger + + Note that there can be any number of enable/disable_event triggers + per triggering event, but there can only be one trigger per + triggered event. e.g. sys_enter_read can have triggers enabling both + kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc + versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if + bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they + could be combined into a single filter on kmem:kmalloc though). + +- stacktrace + + This command dumps a stacktrace in the trace buffer whenever the + triggering event occurs. + + For example, the following trigger dumps a stacktrace every time the + kmalloc tracepoint is hit:: + + # echo 'stacktrace' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + The following trigger dumps a stacktrace the first 5 times a kmalloc + request happens with a size >= 64K:: + + # echo 'stacktrace:5 if bytes_req >= 65536' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + The format is:: + + stacktrace[:count] + + To remove the above commands:: + + # echo '!stacktrace' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # echo '!stacktrace:5 if bytes_req >= 65536' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + The latter can also be removed more simply by the following (without + the filter):: + + # echo '!stacktrace:5' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + Note that there can be only one stacktrace trigger per triggering + event. + +- snapshot + + This command causes a snapshot to be triggered whenever the + triggering event occurs. + + The following command creates a snapshot every time a block request + queue is unplugged with a depth > 1. If you were tracing a set of + events or functions at the time, the snapshot trace buffer would + capture those events when the trigger event occurred:: + + # echo 'snapshot if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + To only snapshot once:: + + # echo 'snapshot:1 if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + To remove the above commands:: + + # echo '!snapshot if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + # echo '!snapshot:1 if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + Note that there can be only one snapshot trigger per triggering + event. + +- traceon/traceoff + + These commands turn tracing on and off when the specified events are + hit. The parameter determines how many times the tracing system is + turned on and off. If unspecified, there is no limit. + + The following command turns tracing off the first time a block + request queue is unplugged with a depth > 1. If you were tracing a + set of events or functions at the time, you could then examine the + trace buffer to see the sequence of events that led up to the + trigger event:: + + # echo 'traceoff:1 if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + To always disable tracing when nr_rq > 1:: + + # echo 'traceoff if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + To remove the above commands:: + + # echo '!traceoff:1 if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + # echo '!traceoff if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + Note that there can be only one traceon or traceoff trigger per + triggering event. + +- hist + + This command aggregates event hits into a hash table keyed on one or + more trace event format fields (or stacktrace) and a set of running + totals derived from one or more trace event format fields and/or + event counts (hitcount). + + The format of a hist trigger is as follows:: + + hist:keys=[:values=] + [:sort=][:size=#entries][:pause][:continue] + [:clear][:name=histname1] [if ] + + When a matching event is hit, an entry is added to a hash table + using the key(s) and value(s) named. Keys and values correspond to + fields in the event's format description. Values must correspond to + numeric fields - on an event hit, the value(s) will be added to a + sum kept for that field. The special string 'hitcount' can be used + in place of an explicit value field - this is simply a count of + event hits. If 'values' isn't specified, an implicit 'hitcount' + value will be automatically created and used as the only value. + Keys can be any field, or the special string 'stacktrace', which + will use the event's kernel stacktrace as the key. The keywords + 'keys' or 'key' can be used to specify keys, and the keywords + 'values', 'vals', or 'val' can be used to specify values. Compound + keys consisting of up to two fields can be specified by the 'keys' + keyword. Hashing a compound key produces a unique entry in the + table for each unique combination of component keys, and can be + useful for providing more fine-grained summaries of event data. + Additionally, sort keys consisting of up to two fields can be + specified by the 'sort' keyword. If more than one field is + specified, the result will be a 'sort within a sort': the first key + is taken to be the primary sort key and the second the secondary + key. If a hist trigger is given a name using the 'name' parameter, + its histogram data will be shared with other triggers of the same + name, and trigger hits will update this common data. Only triggers + with 'compatible' fields can be combined in this way; triggers are + 'compatible' if the fields named in the trigger share the same + number and type of fields and those fields also have the same names. + Note that any two events always share the compatible 'hitcount' and + 'stacktrace' fields and can therefore be combined using those + fields, however pointless that may be. + + 'hist' triggers add a 'hist' file to each event's subdirectory. + Reading the 'hist' file for the event will dump the hash table in + its entirety to stdout. If there are multiple hist triggers + attached to an event, there will be a table for each trigger in the + output. The table displayed for a named trigger will be the same as + any other instance having the same name. Each printed hash table + entry is a simple list of the keys and values comprising the entry; + keys are printed first and are delineated by curly braces, and are + followed by the set of value fields for the entry. By default, + numeric fields are displayed as base-10 integers. This can be + modified by appending any of the following modifiers to the field + name: + + - .hex display a number as a hex value + - .sym display an address as a symbol + - .sym-offset display an address as a symbol and offset + - .syscall display a syscall id as a system call name + - .execname display a common_pid as a program name + + Note that in general the semantics of a given field aren't + interpreted when applying a modifier to it, but there are some + restrictions to be aware of in this regard: + + - only the 'hex' modifier can be used for values (because values + are essentially sums, and the other modifiers don't make sense + in that context). + - the 'execname' modifier can only be used on a 'common_pid'. The + reason for this is that the execname is simply the 'comm' value + saved for the 'current' process when an event was triggered, + which is the same as the common_pid value saved by the event + tracing code. Trying to apply that comm value to other pid + values wouldn't be correct, and typically events that care save + pid-specific comm fields in the event itself. + + A typical usage scenario would be the following to enable a hist + trigger, read its current contents, and then turn it off:: + + # echo 'hist:keys=skbaddr.hex:vals=len' > \ + /sys/kernel/debug/tracing/events/net/netif_rx/trigger + + # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist + + # echo '!hist:keys=skbaddr.hex:vals=len' > \ + /sys/kernel/debug/tracing/events/net/netif_rx/trigger + + The trigger file itself can be read to show the details of the + currently attached hist trigger. This information is also displayed + at the top of the 'hist' file when read. + + By default, the size of the hash table is 2048 entries. The 'size' + parameter can be used to specify more or fewer than that. The units + are in terms of hashtable entries - if a run uses more entries than + specified, the results will show the number of 'drops', the number + of hits that were ignored. The size should be a power of 2 between + 128 and 131072 (any non- power-of-2 number specified will be rounded + up). + + The 'sort' parameter can be used to specify a value field to sort + on. The default if unspecified is 'hitcount' and the default sort + order is 'ascending'. To sort in the opposite direction, append + .descending' to the sort key. + + The 'pause' parameter can be used to pause an existing hist trigger + or to start a hist trigger but not log any events until told to do + so. 'continue' or 'cont' can be used to start or restart a paused + hist trigger. + + The 'clear' parameter will clear the contents of a running hist + trigger and leave its current paused/active state. + + Note that the 'pause', 'cont', and 'clear' parameters should be + applied using 'append' shell operator ('>>') if applied to an + existing trigger, rather than via the '>' operator, which will cause + the trigger to be removed through truncation. + +- enable_hist/disable_hist + + The enable_hist and disable_hist triggers can be used to have one + event conditionally start and stop another event's already-attached + hist trigger. Any number of enable_hist and disable_hist triggers + can be attached to a given event, allowing that event to kick off + and stop aggregations on a host of other events. + + The format is very similar to the enable/disable_event triggers:: + + enable_hist::[:count] + disable_hist::[:count] + + Instead of enabling or disabling the tracing of the target event + into the trace buffer as the enable/disable_event triggers do, the + enable/disable_hist triggers enable or disable the aggregation of + the target event into a hash table. + + A typical usage scenario for the enable_hist/disable_hist triggers + would be to first set up a paused hist trigger on some event, + followed by an enable_hist/disable_hist pair that turns the hist + aggregation on and off when conditions of interest are hit:: + + # echo 'hist:keys=skbaddr.hex:vals=len:pause' > \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + + # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + + # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + + The above sets up an initially paused hist trigger which is unpaused + and starts aggregating events when a given program is executed, and + which stops aggregating when the process exits and the hist trigger + is paused again. + + The examples below provide a more concrete illustration of the + concepts and typical usage patterns discussed above. + + +6.2 'hist' trigger examples +--------------------------- + + The first set of examples creates aggregations using the kmalloc + event. The fields that can be used for the hist trigger are listed + in the kmalloc event's format file:: + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/format + name: kmalloc + ID: 374 + format: + field:unsigned short common_type; offset:0; size:2; signed:0; + field:unsigned char common_flags; offset:2; size:1; signed:0; + field:unsigned char common_preempt_count; offset:3; size:1; signed:0; + field:int common_pid; offset:4; size:4; signed:1; + + field:unsigned long call_site; offset:8; size:8; signed:0; + field:const void * ptr; offset:16; size:8; signed:0; + field:size_t bytes_req; offset:24; size:8; signed:0; + field:size_t bytes_alloc; offset:32; size:8; signed:0; + field:gfp_t gfp_flags; offset:40; size:4; signed:0; + + We'll start by creating a hist trigger that generates a simple table + that lists the total number of bytes requested for each function in + the kernel that made one or more calls to kmalloc:: + + # echo 'hist:key=call_site:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + This tells the tracing system to create a 'hist' trigger using the + call_site field of the kmalloc event as the key for the table, which + just means that each unique call_site address will have an entry + created for it in the table. The 'val=bytes_req' parameter tells + the hist trigger that for each unique entry (call_site) in the + table, it should keep a running total of the number of bytes + requested by that call_site. + + We'll let it run for awhile and then dump the contents of the 'hist' + file in the kmalloc event's subdirectory (for readability, a number + of entries have been omitted):: + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active] + + { call_site: 18446744072106379007 } hitcount: 1 bytes_req: 176 + { call_site: 18446744071579557049 } hitcount: 1 bytes_req: 1024 + { call_site: 18446744071580608289 } hitcount: 1 bytes_req: 16384 + { call_site: 18446744071581827654 } hitcount: 1 bytes_req: 24 + { call_site: 18446744071580700980 } hitcount: 1 bytes_req: 8 + { call_site: 18446744071579359876 } hitcount: 1 bytes_req: 152 + { call_site: 18446744071580795365 } hitcount: 3 bytes_req: 144 + { call_site: 18446744071581303129 } hitcount: 3 bytes_req: 144 + { call_site: 18446744071580713234 } hitcount: 4 bytes_req: 2560 + { call_site: 18446744071580933750 } hitcount: 4 bytes_req: 736 + . + . + . + { call_site: 18446744072106047046 } hitcount: 69 bytes_req: 5576 + { call_site: 18446744071582116407 } hitcount: 73 bytes_req: 2336 + { call_site: 18446744072106054684 } hitcount: 136 bytes_req: 140504 + { call_site: 18446744072106224230 } hitcount: 136 bytes_req: 19584 + { call_site: 18446744072106078074 } hitcount: 153 bytes_req: 2448 + { call_site: 18446744072106062406 } hitcount: 153 bytes_req: 36720 + { call_site: 18446744071582507929 } hitcount: 153 bytes_req: 37088 + { call_site: 18446744072102520590 } hitcount: 273 bytes_req: 10920 + { call_site: 18446744071582143559 } hitcount: 358 bytes_req: 716 + { call_site: 18446744072106465852 } hitcount: 417 bytes_req: 56712 + { call_site: 18446744072102523378 } hitcount: 485 bytes_req: 27160 + { call_site: 18446744072099568646 } hitcount: 1676 bytes_req: 33520 + + Totals: + Hits: 4610 + Entries: 45 + Dropped: 0 + + The output displays a line for each entry, beginning with the key + specified in the trigger, followed by the value(s) also specified in + the trigger. At the beginning of the output is a line that displays + the trigger info, which can also be displayed by reading the + 'trigger' file:: + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active] + + At the end of the output are a few lines that display the overall + totals for the run. The 'Hits' field shows the total number of + times the event trigger was hit, the 'Entries' field shows the total + number of used entries in the hash table, and the 'Dropped' field + shows the number of hits that were dropped because the number of + used entries for the run exceeded the maximum number of entries + allowed for the table (normally 0, but if not a hint that you may + want to increase the size of the table using the 'size' parameter). + + Notice in the above output that there's an extra field, 'hitcount', + which wasn't specified in the trigger. Also notice that in the + trigger info output, there's a parameter, 'sort=hitcount', which + wasn't specified in the trigger either. The reason for that is that + every trigger implicitly keeps a count of the total number of hits + attributed to a given entry, called the 'hitcount'. That hitcount + information is explicitly displayed in the output, and in the + absence of a user-specified sort parameter, is used as the default + sort field. + + The value 'hitcount' can be used in place of an explicit value in + the 'values' parameter if you don't really need to have any + particular field summed and are mainly interested in hit + frequencies. + + To turn the hist trigger off, simply call up the trigger in the + command history and re-execute it with a '!' prepended:: + + # echo '!hist:key=call_site:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + Finally, notice that the call_site as displayed in the output above + isn't really very useful. It's an address, but normally addresses + are displayed in hex. To have a numeric field displayed as a hex + value, simply append '.hex' to the field name in the trigger:: + + # echo 'hist:key=call_site.hex:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site.hex:vals=bytes_req:sort=hitcount:size=2048 [active] + + { call_site: ffffffffa026b291 } hitcount: 1 bytes_req: 433 + { call_site: ffffffffa07186ff } hitcount: 1 bytes_req: 176 + { call_site: ffffffff811ae721 } hitcount: 1 bytes_req: 16384 + { call_site: ffffffff811c5134 } hitcount: 1 bytes_req: 8 + { call_site: ffffffffa04a9ebb } hitcount: 1 bytes_req: 511 + { call_site: ffffffff8122e0a6 } hitcount: 1 bytes_req: 12 + { call_site: ffffffff8107da84 } hitcount: 1 bytes_req: 152 + { call_site: ffffffff812d8246 } hitcount: 1 bytes_req: 24 + { call_site: ffffffff811dc1e5 } hitcount: 3 bytes_req: 144 + { call_site: ffffffffa02515e8 } hitcount: 3 bytes_req: 648 + { call_site: ffffffff81258159 } hitcount: 3 bytes_req: 144 + { call_site: ffffffff811c80f4 } hitcount: 4 bytes_req: 544 + . + . + . + { call_site: ffffffffa06c7646 } hitcount: 106 bytes_req: 8024 + { call_site: ffffffffa06cb246 } hitcount: 132 bytes_req: 31680 + { call_site: ffffffffa06cef7a } hitcount: 132 bytes_req: 2112 + { call_site: ffffffff8137e399 } hitcount: 132 bytes_req: 23232 + { call_site: ffffffffa06c941c } hitcount: 185 bytes_req: 171360 + { call_site: ffffffffa06f2a66 } hitcount: 185 bytes_req: 26640 + { call_site: ffffffffa036a70e } hitcount: 265 bytes_req: 10600 + { call_site: ffffffff81325447 } hitcount: 292 bytes_req: 584 + { call_site: ffffffffa072da3c } hitcount: 446 bytes_req: 60656 + { call_site: ffffffffa036b1f2 } hitcount: 526 bytes_req: 29456 + { call_site: ffffffffa0099c06 } hitcount: 1780 bytes_req: 35600 + + Totals: + Hits: 4775 + Entries: 46 + Dropped: 0 + + Even that's only marginally more useful - while hex values do look + more like addresses, what users are typically more interested in + when looking at text addresses are the corresponding symbols + instead. To have an address displayed as symbolic value instead, + simply append '.sym' or '.sym-offset' to the field name in the + trigger:: + + # echo 'hist:key=call_site.sym:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=hitcount:size=2048 [active] + + { call_site: [ffffffff810adcb9] syslog_print_all } hitcount: 1 bytes_req: 1024 + { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 + { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 + { call_site: [ffffffff8154acbe] usb_alloc_urb } hitcount: 1 bytes_req: 192 + { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 + { call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40 + { call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 + { call_site: [ffffffff811febd5] fsnotify_alloc_group } hitcount: 2 bytes_req: 528 + { call_site: [ffffffff81440f58] __tty_buffer_request_room } hitcount: 2 bytes_req: 2624 + { call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 2 bytes_req: 96 + { call_site: [ffffffffa05e19af] ieee80211_start_tx_ba_session [mac80211] } hitcount: 2 bytes_req: 464 + { call_site: [ffffffff81672406] tcp_get_metrics } hitcount: 2 bytes_req: 304 + { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 + { call_site: [ffffffff81089b05] sched_create_group } hitcount: 2 bytes_req: 1424 + . + . + . + { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1185 bytes_req: 123240 + { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 1185 bytes_req: 104280 + { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 1402 bytes_req: 190672 + { call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 1518 bytes_req: 146208 + { call_site: [ffffffffa029070e] drm_vma_node_allow [drm] } hitcount: 1746 bytes_req: 69840 + { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 2021 bytes_req: 792312 + { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 2592 bytes_req: 145152 + { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2629 bytes_req: 378576 + { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2629 bytes_req: 3783248 + { call_site: [ffffffff81325607] apparmor_file_alloc_security } hitcount: 5192 bytes_req: 10384 + { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 5529 bytes_req: 110584 + { call_site: [ffffffff8131ebf7] aa_alloc_task_context } hitcount: 21943 bytes_req: 702176 + { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 55759 bytes_req: 5074265 + + Totals: + Hits: 109928 + Entries: 71 + Dropped: 0 + + Because the default sort key above is 'hitcount', the above shows a + the list of call_sites by increasing hitcount, so that at the bottom + we see the functions that made the most kmalloc calls during the + run. If instead we we wanted to see the top kmalloc callers in + terms of the number of bytes requested rather than the number of + calls, and we wanted the top caller to appear at the top, we can use + the 'sort' parameter, along with the 'descending' modifier:: + + # echo 'hist:key=call_site.sym:val=bytes_req:sort=bytes_req.descending' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=bytes_req.descending:size=2048 [active] + + { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2186 bytes_req: 3397464 + { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1790 bytes_req: 712176 + { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 8132 bytes_req: 513135 + { call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 106 bytes_req: 440128 + { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2186 bytes_req: 314784 + { call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 2174 bytes_req: 208992 + { call_site: [ffffffff811ae8e1] __kmalloc } hitcount: 8 bytes_req: 131072 + { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 859 bytes_req: 116824 + { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 1834 bytes_req: 102704 + { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 972 bytes_req: 101088 + { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 972 bytes_req: 85536 + { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 3333 bytes_req: 66664 + { call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 209 bytes_req: 61632 + . + . + . + { call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 + { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 + { call_site: [ffffffff812d8406] copy_semundo } hitcount: 2 bytes_req: 48 + { call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 1 bytes_req: 48 + { call_site: [ffffffffa027121a] drm_getmagic [drm] } hitcount: 1 bytes_req: 48 + { call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40 + { call_site: [ffffffff811c52f4] bprm_change_interp } hitcount: 2 bytes_req: 16 + { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 + { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 + { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 + + Totals: + Hits: 32133 + Entries: 81 + Dropped: 0 + + To display the offset and size information in addition to the symbol + name, just use 'sym-offset' instead:: + + # echo 'hist:key=call_site.sym-offset:val=bytes_req:sort=bytes_req.descending' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site.sym-offset:vals=bytes_req:sort=bytes_req.descending:size=2048 [active] + + { call_site: [ffffffffa046041c] i915_gem_execbuffer2+0x6c/0x2c0 [i915] } hitcount: 4569 bytes_req: 3163720 + { call_site: [ffffffffa0489a66] intel_ring_begin+0xc6/0x1f0 [i915] } hitcount: 4569 bytes_req: 657936 + { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23+0x694/0x1020 [i915] } hitcount: 1519 bytes_req: 472936 + { call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23+0x516/0x1020 [i915] } hitcount: 3050 bytes_req: 211832 + { call_site: [ffffffff811e2a1b] seq_buf_alloc+0x1b/0x50 } hitcount: 34 bytes_req: 148384 + { call_site: [ffffffffa04a580c] intel_crtc_page_flip+0xbc/0x870 [i915] } hitcount: 1385 bytes_req: 144040 + { call_site: [ffffffff811ae8e1] __kmalloc+0x191/0x1b0 } hitcount: 8 bytes_req: 131072 + { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl+0x282/0x360 [drm] } hitcount: 1385 bytes_req: 121880 + { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc+0x32/0x100 [drm] } hitcount: 1848 bytes_req: 103488 + { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state+0x2c/0xa0 [i915] } hitcount: 461 bytes_req: 62696 + { call_site: [ffffffffa029070e] drm_vma_node_allow+0x2e/0xd0 [drm] } hitcount: 1541 bytes_req: 61640 + { call_site: [ffffffff815f8d7b] sk_prot_alloc+0xcb/0x1b0 } hitcount: 57 bytes_req: 57456 + . + . + . + { call_site: [ffffffff8109524a] alloc_fair_sched_group+0x5a/0x1a0 } hitcount: 2 bytes_req: 128 + { call_site: [ffffffffa027b921] drm_vm_open_locked+0x31/0xa0 [drm] } hitcount: 3 bytes_req: 96 + { call_site: [ffffffff8122e266] proc_self_follow_link+0x76/0xb0 } hitcount: 8 bytes_req: 96 + { call_site: [ffffffff81213e80] load_elf_binary+0x240/0x1650 } hitcount: 3 bytes_req: 84 + { call_site: [ffffffff8154bc62] usb_control_msg+0x42/0x110 } hitcount: 1 bytes_req: 8 + { call_site: [ffffffffa00bf6fe] hidraw_send_report+0x7e/0x1a0 [hid] } hitcount: 1 bytes_req: 7 + { call_site: [ffffffffa00bf1ca] hidraw_report_event+0x8a/0x120 [hid] } hitcount: 1 bytes_req: 7 + + Totals: + Hits: 26098 + Entries: 64 + Dropped: 0 + + We can also add multiple fields to the 'values' parameter. For + example, we might want to see the total number of bytes allocated + alongside bytes requested, and display the result sorted by bytes + allocated in a descending order:: + + # echo 'hist:keys=call_site.sym:values=bytes_req,bytes_alloc:sort=bytes_alloc.descending' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=call_site.sym:vals=bytes_req,bytes_alloc:sort=bytes_alloc.descending:size=2048 [active] + + { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 7403 bytes_req: 4084360 bytes_alloc: 5958016 + { call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 541 bytes_req: 2213968 bytes_alloc: 2228224 + { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 7404 bytes_req: 1066176 bytes_alloc: 1421568 + { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1565 bytes_req: 557368 bytes_alloc: 1037760 + { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 9557 bytes_req: 595778 bytes_alloc: 695744 + { call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 5839 bytes_req: 430680 bytes_alloc: 470400 + { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 2388 bytes_req: 324768 bytes_alloc: 458496 + { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 3911 bytes_req: 219016 bytes_alloc: 250304 + { call_site: [ffffffff815f8d7b] sk_prot_alloc } hitcount: 235 bytes_req: 236880 bytes_alloc: 240640 + { call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 557 bytes_req: 169024 bytes_alloc: 221760 + { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 9378 bytes_req: 187548 bytes_alloc: 206312 + { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1519 bytes_req: 157976 bytes_alloc: 194432 + . + . + . + { call_site: [ffffffff8109bd3b] sched_autogroup_create_attach } hitcount: 2 bytes_req: 144 bytes_alloc: 192 + { call_site: [ffffffff81097ee8] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128 + { call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128 + { call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128 + { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128 + { call_site: [ffffffff81213e80] load_elf_binary } hitcount: 3 bytes_req: 84 bytes_alloc: 96 + { call_site: [ffffffff81079a2e] kthread_create_on_node } hitcount: 1 bytes_req: 56 bytes_alloc: 64 + { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8 + { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 bytes_alloc: 8 + { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8 + + Totals: + Hits: 66598 + Entries: 65 + Dropped: 0 + + Finally, to finish off our kmalloc example, instead of simply having + the hist trigger display symbolic call_sites, we can have the hist + trigger additionally display the complete set of kernel stack traces + that led to each call_site. To do that, we simply use the special + value 'stacktrace' for the key parameter:: + + # echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + The above trigger will use the kernel stack trace in effect when an + event is triggered as the key for the hash table. This allows the + enumeration of every kernel callpath that led up to a particular + event, along with a running total of any of the event fields for + that event. Here we tally bytes requested and bytes allocated for + every callpath in the system that led up to a kmalloc (in this case + every callpath to a kmalloc for a kernel compile):: + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + # trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active] + + { stacktrace: + __kmalloc_track_caller+0x10b/0x1a0 + kmemdup+0x20/0x50 + hidraw_report_event+0x8a/0x120 [hid] + hid_report_raw_event+0x3ea/0x440 [hid] + hid_input_report+0x112/0x190 [hid] + hid_irq_in+0xc2/0x260 [usbhid] + __usb_hcd_giveback_urb+0x72/0x120 + usb_giveback_urb_bh+0x9e/0xe0 + tasklet_hi_action+0xf8/0x100 + __do_softirq+0x114/0x2c0 + irq_exit+0xa5/0xb0 + do_IRQ+0x5a/0xf0 + ret_from_intr+0x0/0x30 + cpuidle_enter+0x17/0x20 + cpu_startup_entry+0x315/0x3e0 + rest_init+0x7c/0x80 + } hitcount: 3 bytes_req: 21 bytes_alloc: 24 + { stacktrace: + __kmalloc_track_caller+0x10b/0x1a0 + kmemdup+0x20/0x50 + hidraw_report_event+0x8a/0x120 [hid] + hid_report_raw_event+0x3ea/0x440 [hid] + hid_input_report+0x112/0x190 [hid] + hid_irq_in+0xc2/0x260 [usbhid] + __usb_hcd_giveback_urb+0x72/0x120 + usb_giveback_urb_bh+0x9e/0xe0 + tasklet_hi_action+0xf8/0x100 + __do_softirq+0x114/0x2c0 + irq_exit+0xa5/0xb0 + do_IRQ+0x5a/0xf0 + ret_from_intr+0x0/0x30 + } hitcount: 3 bytes_req: 21 bytes_alloc: 24 + { stacktrace: + kmem_cache_alloc_trace+0xeb/0x150 + aa_alloc_task_context+0x27/0x40 + apparmor_cred_prepare+0x1f/0x50 + security_prepare_creds+0x16/0x20 + prepare_creds+0xdf/0x1a0 + SyS_capset+0xb5/0x200 + system_call_fastpath+0x12/0x6a + } hitcount: 1 bytes_req: 32 bytes_alloc: 32 + . + . + . + { stacktrace: + __kmalloc+0x11b/0x1b0 + i915_gem_execbuffer2+0x6c/0x2c0 [i915] + drm_ioctl+0x349/0x670 [drm] + do_vfs_ioctl+0x2f0/0x4f0 + SyS_ioctl+0x81/0xa0 + system_call_fastpath+0x12/0x6a + } hitcount: 17726 bytes_req: 13944120 bytes_alloc: 19593808 + { stacktrace: + __kmalloc+0x11b/0x1b0 + load_elf_phdrs+0x76/0xa0 + load_elf_binary+0x102/0x1650 + search_binary_handler+0x97/0x1d0 + do_execveat_common.isra.34+0x551/0x6e0 + SyS_execve+0x3a/0x50 + return_from_execve+0x0/0x23 + } hitcount: 33348 bytes_req: 17152128 bytes_alloc: 20226048 + { stacktrace: + kmem_cache_alloc_trace+0xeb/0x150 + apparmor_file_alloc_security+0x27/0x40 + security_file_alloc+0x16/0x20 + get_empty_filp+0x93/0x1c0 + path_openat+0x31/0x5f0 + do_filp_open+0x3a/0x90 + do_sys_open+0x128/0x220 + SyS_open+0x1e/0x20 + system_call_fastpath+0x12/0x6a + } hitcount: 4766422 bytes_req: 9532844 bytes_alloc: 38131376 + { stacktrace: + __kmalloc+0x11b/0x1b0 + seq_buf_alloc+0x1b/0x50 + seq_read+0x2cc/0x370 + proc_reg_read+0x3d/0x80 + __vfs_read+0x28/0xe0 + vfs_read+0x86/0x140 + SyS_read+0x46/0xb0 + system_call_fastpath+0x12/0x6a + } hitcount: 19133 bytes_req: 78368768 bytes_alloc: 78368768 + + Totals: + Hits: 6085872 + Entries: 253 + Dropped: 0 + + If you key a hist trigger on common_pid, in order for example to + gather and display sorted totals for each process, you can use the + special .execname modifier to display the executable names for the + processes in the table rather than raw pids. The example below + keeps a per-process sum of total bytes read:: + + # echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger + + # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist + # trigger info: hist:keys=common_pid.execname:vals=count:sort=count.descending:size=2048 [active] + + { common_pid: gnome-terminal [ 3196] } hitcount: 280 count: 1093512 + { common_pid: Xorg [ 1309] } hitcount: 525 count: 256640 + { common_pid: compiz [ 2889] } hitcount: 59 count: 254400 + { common_pid: bash [ 8710] } hitcount: 3 count: 66369 + { common_pid: dbus-daemon-lau [ 8703] } hitcount: 49 count: 47739 + { common_pid: irqbalance [ 1252] } hitcount: 27 count: 27648 + { common_pid: 01ifupdown [ 8705] } hitcount: 3 count: 17216 + { common_pid: dbus-daemon [ 772] } hitcount: 10 count: 12396 + { common_pid: Socket Thread [ 8342] } hitcount: 11 count: 11264 + { common_pid: nm-dhcp-client. [ 8701] } hitcount: 6 count: 7424 + { common_pid: gmain [ 1315] } hitcount: 18 count: 6336 + . + . + . + { common_pid: postgres [ 1892] } hitcount: 2 count: 32 + { common_pid: postgres [ 1891] } hitcount: 2 count: 32 + { common_pid: gmain [ 8704] } hitcount: 2 count: 32 + { common_pid: upstart-dbus-br [ 2740] } hitcount: 21 count: 21 + { common_pid: nm-dispatcher.a [ 8696] } hitcount: 1 count: 16 + { common_pid: indicator-datet [ 2904] } hitcount: 1 count: 16 + { common_pid: gdbus [ 2998] } hitcount: 1 count: 16 + { common_pid: rtkit-daemon [ 2052] } hitcount: 1 count: 8 + { common_pid: init [ 1] } hitcount: 2 count: 2 + + Totals: + Hits: 2116 + Entries: 51 + Dropped: 0 + + Similarly, if you key a hist trigger on syscall id, for example to + gather and display a list of systemwide syscall hits, you can use + the special .syscall modifier to display the syscall names rather + than raw ids. The example below keeps a running total of syscall + counts for the system during the run:: + + # echo 'hist:key=id.syscall:val=hitcount' > \ + /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger + + # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist + # trigger info: hist:keys=id.syscall:vals=hitcount:sort=hitcount:size=2048 [active] + + { id: sys_fsync [ 74] } hitcount: 1 + { id: sys_newuname [ 63] } hitcount: 1 + { id: sys_prctl [157] } hitcount: 1 + { id: sys_statfs [137] } hitcount: 1 + { id: sys_symlink [ 88] } hitcount: 1 + { id: sys_sendmmsg [307] } hitcount: 1 + { id: sys_semctl [ 66] } hitcount: 1 + { id: sys_readlink [ 89] } hitcount: 3 + { id: sys_bind [ 49] } hitcount: 3 + { id: sys_getsockname [ 51] } hitcount: 3 + { id: sys_unlink [ 87] } hitcount: 3 + { id: sys_rename [ 82] } hitcount: 4 + { id: unknown_syscall [ 58] } hitcount: 4 + { id: sys_connect [ 42] } hitcount: 4 + { id: sys_getpid [ 39] } hitcount: 4 + . + . + . + { id: sys_rt_sigprocmask [ 14] } hitcount: 952 + { id: sys_futex [202] } hitcount: 1534 + { id: sys_write [ 1] } hitcount: 2689 + { id: sys_setitimer [ 38] } hitcount: 2797 + { id: sys_read [ 0] } hitcount: 3202 + { id: sys_select [ 23] } hitcount: 3773 + { id: sys_writev [ 20] } hitcount: 4531 + { id: sys_poll [ 7] } hitcount: 8314 + { id: sys_recvmsg [ 47] } hitcount: 13738 + { id: sys_ioctl [ 16] } hitcount: 21843 + + Totals: + Hits: 67612 + Entries: 72 + Dropped: 0 + + The syscall counts above provide a rough overall picture of system + call activity on the system; we can see for example that the most + popular system call on this system was the 'sys_ioctl' system call. + + We can use 'compound' keys to refine that number and provide some + further insight as to which processes exactly contribute to the + overall ioctl count. + + The command below keeps a hitcount for every unique combination of + system call id and pid - the end result is essentially a table + that keeps a per-pid sum of system call hits. The results are + sorted using the system call id as the primary key, and the + hitcount sum as the secondary key:: + + # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount' > \ + /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger + + # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist + # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 [active] + + { id: sys_read [ 0], common_pid: rtkit-daemon [ 1877] } hitcount: 1 + { id: sys_read [ 0], common_pid: gdbus [ 2976] } hitcount: 1 + { id: sys_read [ 0], common_pid: console-kit-dae [ 3400] } hitcount: 1 + { id: sys_read [ 0], common_pid: postgres [ 1865] } hitcount: 1 + { id: sys_read [ 0], common_pid: deja-dup-monito [ 3543] } hitcount: 2 + { id: sys_read [ 0], common_pid: NetworkManager [ 890] } hitcount: 2 + { id: sys_read [ 0], common_pid: evolution-calen [ 3048] } hitcount: 2 + { id: sys_read [ 0], common_pid: postgres [ 1864] } hitcount: 2 + { id: sys_read [ 0], common_pid: nm-applet [ 3022] } hitcount: 2 + { id: sys_read [ 0], common_pid: whoopsie [ 1212] } hitcount: 2 + . + . + . + { id: sys_ioctl [ 16], common_pid: bash [ 8479] } hitcount: 1 + { id: sys_ioctl [ 16], common_pid: bash [ 3472] } hitcount: 12 + { id: sys_ioctl [ 16], common_pid: gnome-terminal [ 3199] } hitcount: 16 + { id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 1808 + { id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 5580 + . + . + . + { id: sys_waitid [247], common_pid: upstart-dbus-br [ 2690] } hitcount: 3 + { id: sys_waitid [247], common_pid: upstart-dbus-br [ 2688] } hitcount: 16 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 975] } hitcount: 2 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 3204] } hitcount: 4 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 2888] } hitcount: 4 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 3003] } hitcount: 4 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 2873] } hitcount: 4 + { id: sys_inotify_add_watch [254], common_pid: gmain [ 3196] } hitcount: 6 + { id: sys_openat [257], common_pid: java [ 2623] } hitcount: 2 + { id: sys_eventfd2 [290], common_pid: ibus-ui-gtk3 [ 2760] } hitcount: 4 + { id: sys_eventfd2 [290], common_pid: compiz [ 2994] } hitcount: 6 + + Totals: + Hits: 31536 + Entries: 323 + Dropped: 0 + + The above list does give us a breakdown of the ioctl syscall by + pid, but it also gives us quite a bit more than that, which we + don't really care about at the moment. Since we know the syscall + id for sys_ioctl (16, displayed next to the sys_ioctl name), we + can use that to filter out all the other syscalls:: + + # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount if id == 16' > \ + /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger + + # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist + # trigger info: hist:keys=id.syscall,common_pid.execname:vals=h