summaryrefslogtreecommitdiffstats
path: root/tools
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2017-07-03 12:40:46 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2017-07-03 12:40:46 -0700
commit7447d56217e215e50317f308aee1ed293ac4f749 (patch)
tree903832ecb206ae83160992c6aec40c624821941f /tools
parent892ad5acca0b2ddb514fae63fa4686bf726d2471 (diff)
parent23acd3e1a0a377cf3730ccb753aa1fdc50378396 (diff)
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar: "Most of the changes are for tooling, the main changes in this cycle were: - Improve Intel-PT hardware tracing support, both on the kernel and on the tooling side: PTWRITE instruction support, power events for C-state tracing, etc. (Adrian Hunter) - Add support to measure SMI cost to the x86 architecture, with tooling support in 'perf stat' (Kan Liang) - Support function filtering in 'perf ftrace', plus related improvements (Namhyung Kim) - Allow adding and removing fields to the default 'perf script' columns, using + or - as field prefixes to do so (Andi Kleen) - Allow resolving the DSO name with 'perf script -F brstack{sym,off},dso' (Mark Santaniello) - Add perf tooling unwind support for PowerPC (Paolo Bonzini) - ... and various other improvements as well" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits) perf auxtrace: Add CPU filter support perf intel-pt: Do not use TSC packets for calculating CPU cycles to TSC perf intel-pt: Update documentation to include new ptwrite and power events perf intel-pt: Add example script for power events and PTWRITE perf intel-pt: Synthesize new power and "ptwrite" events perf intel-pt: Move code in intel_pt_synth_events() to simplify attr setting perf intel-pt: Factor out intel_pt_set_event_name() perf intel-pt: Tidy messages into called function intel_pt_synth_event() perf intel-pt: Tidy Intel PT evsel lookup into separate function perf intel-pt: Join needlessly wrapped lines perf intel-pt: Remove unused instructions_sample_period perf intel-pt: Factor out common code synthesizing event samples perf script: Add synthesized Intel PT power and ptwrite events perf/x86/intel: Constify the 'lbr_desc[]' array and make a function static perf script: Add 'synth' field for synthesized event payloads perf auxtrace: Add itrace option to output power events perf auxtrace: Add itrace option to output ptwrite events tools include: Add byte-swapping macros to kernel.h perf script: Add 'synth' event type for synthesized events x86/insn: perf tools: Add new ptwrite instruction ...
Diffstat (limited to 'tools')
-rw-r--r--tools/include/linux/compiler-gcc.h10
-rw-r--r--tools/include/linux/compiler.h4
-rw-r--r--tools/include/linux/kernel.h35
-rw-r--r--tools/lib/api/fs/fs.c30
-rw-r--r--tools/lib/api/fs/fs.h4
-rw-r--r--tools/objtool/arch/x86/insn/x86-opcode-map.txt2
-rw-r--r--tools/perf/Documentation/intel-pt.txt78
-rw-r--r--tools/perf/Documentation/itrace.txt8
-rw-r--r--tools/perf/Documentation/perf-ftrace.txt33
-rw-r--r--tools/perf/Documentation/perf-script.txt18
-rw-r--r--tools/perf/Documentation/perf-stat.txt14
-rw-r--r--tools/perf/Makefile.config2
-rw-r--r--tools/perf/arch/arm/util/cs-etm.c29
-rw-r--r--tools/perf/arch/powerpc/util/Build2
-rw-r--r--tools/perf/arch/powerpc/util/unwind-libdw.c73
-rw-r--r--tools/perf/arch/x86/tests/insn-x86-dat-32.c12
-rw-r--r--tools/perf/arch/x86/tests/insn-x86-dat-64.c30
-rw-r--r--tools/perf/arch/x86/tests/insn-x86-dat-src.c30
-rw-r--r--tools/perf/arch/x86/util/intel-bts.c4
-rw-r--r--tools/perf/arch/x86/util/intel-pt.c9
-rw-r--r--tools/perf/bench/numa.c2
-rw-r--r--tools/perf/builtin-c2c.c4
-rw-r--r--tools/perf/builtin-config.c67
-rw-r--r--tools/perf/builtin-diff.c5
-rw-r--r--tools/perf/builtin-ftrace.c159
-rw-r--r--tools/perf/builtin-help.c48
-rw-r--r--tools/perf/builtin-kmem.c4
-rw-r--r--tools/perf/builtin-record.c4
-rw-r--r--tools/perf/builtin-report.c8
-rw-r--r--tools/perf/builtin-sched.c2
-rw-r--r--tools/perf/builtin-script.c353
-rw-r--r--tools/perf/builtin-stat.c53
-rw-r--r--tools/perf/builtin-top.c4
-rw-r--r--tools/perf/jvmti/jvmti_agent.c2
-rw-r--r--tools/perf/jvmti/jvmti_agent.h2
-rw-r--r--tools/perf/jvmti/libjvmti.c5
-rw-r--r--tools/perf/pmu-events/jevents.c4
-rw-r--r--tools/perf/scripts/python/bin/intel-pt-events-record13
-rw-r--r--tools/perf/scripts/python/bin/intel-pt-events-report3
-rw-r--r--tools/perf/scripts/python/intel-pt-events.py128
-rw-r--r--tools/perf/tests/attr.c10
-rw-r--r--tools/perf/tests/attr.py48
-rw-r--r--tools/perf/tests/bp_signal.c3
-rw-r--r--tools/perf/tests/bp_signal_overflow.c3
-rw-r--r--tools/perf/tests/bpf-script-test-prologue.c9
-rw-r--r--tools/perf/tests/dwarf-unwind.c15
-rw-r--r--tools/perf/tests/parse-events.c13
-rw-r--r--tools/perf/ui/browsers/annotate.c54
-rw-r--r--tools/perf/ui/gtk/annotate.c3
-rw-r--r--tools/perf/util/annotate.c10
-rw-r--r--tools/perf/util/annotate.h4
-rw-r--r--tools/perf/util/auxtrace.c18
-rw-r--r--tools/perf/util/auxtrace.h6
-rw-r--r--tools/perf/util/cache.h3
-rw-r--r--tools/perf/util/config.c43
-rw-r--r--tools/perf/util/config.h4
-rw-r--r--tools/perf/util/data-convert-bt.c6
-rw-r--r--tools/perf/util/debug.h11
-rw-r--r--tools/perf/util/event.h121
-rw-r--r--tools/perf/util/evlist.h3
-rw-r--r--tools/perf/util/evsel.c42
-rw-r--r--tools/perf/util/genelf_debug.c5
-rw-r--r--tools/perf/util/header.c3
-rw-r--r--tools/perf/util/help-unknown-cmd.c2
-rw-r--r--tools/perf/util/intel-bts.c2
-rw-r--r--tools/perf/util/intel-pt-decoder/intel-pt-decoder.c304
-rw-r--r--tools/perf/util/intel-pt-decoder/intel-pt-decoder.h13
-rw-r--r--tools/perf/util/intel-pt-decoder/intel-pt-log.h4
-rw-r--r--tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c110
-rw-r--r--tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h7
-rw-r--r--tools/perf/util/intel-pt-decoder/x86-opcode-map.txt2
-rw-r--r--tools/perf/util/intel-pt.c642
-rw-r--r--tools/perf/util/pmu.h4
-rw-r--r--tools/perf/util/probe-event.h4
-rw-r--r--tools/perf/util/scripting-engines/trace-event-python.c3
-rw-r--r--tools/perf/util/session.c2
-rw-r--r--tools/perf/util/sort.c22
-rw-r--r--tools/perf/util/stat-shadow.c33
-rw-r--r--tools/perf/util/stat.c2
-rw-r--r--tools/perf/util/stat.h2
-rw-r--r--tools/perf/util/strbuf.h4
-rw-r--r--tools/perf/util/trace-event-parse.c4
-rw-r--r--tools/perf/util/usage.c62
-rw-r--r--tools/perf/util/util.c52
-rw-r--r--tools/perf/util/util.h21
85 files changed, 2428 insertions, 607 deletions
diff --git a/tools/include/linux/compiler-gcc.h b/tools/include/linux/compiler-gcc.h
index 825d44f89a29..bd39b2090ad1 100644
--- a/tools/include/linux/compiler-gcc.h
+++ b/tools/include/linux/compiler-gcc.h
@@ -19,3 +19,13 @@
/* &a[0] degrades to a pointer: a different type from an array */
#define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
+
+#define noinline __attribute__((noinline))
+
+#define __packed __attribute__((packed))
+
+#define __noreturn __attribute__((noreturn))
+
+#define __aligned(x) __attribute__((aligned(x)))
+#define __printf(a, b) __attribute__((format(printf, a, b)))
+#define __scanf(a, b) __attribute__((format(scanf, a, b)))
diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
index ef6ab908a42f..d7a5604c38d7 100644
--- a/tools/include/linux/compiler.h
+++ b/tools/include/linux/compiler.h
@@ -17,6 +17,10 @@
# define __always_inline inline __attribute__((always_inline))
#endif
+#ifndef noinline
+#define noinline
+#endif
+
/* Are two types/vars the same type (ignoring qualifiers)? */
#ifndef __same_type
# define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
diff --git a/tools/include/linux/kernel.h b/tools/include/linux/kernel.h
index 801b927499f2..77d2e94ca5df 100644
--- a/tools/include/linux/kernel.h
+++ b/tools/include/linux/kernel.h
@@ -5,6 +5,8 @@
#include <stddef.h>
#include <assert.h>
#include <linux/compiler.h>
+#include <endian.h>
+#include <byteswap.h>
#ifndef UINT_MAX
#define UINT_MAX (~0U)
@@ -68,12 +70,33 @@
#endif
#endif
-/*
- * Both need more care to handle endianness
- * (Don't use bitmap_copy_le() for now)
- */
-#define cpu_to_le64(x) (x)
-#define cpu_to_le32(x) (x)
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define cpu_to_le16 bswap_16
+#define cpu_to_le32 bswap_32
+#define cpu_to_le64 bswap_64
+#define le16_to_cpu bswap_16
+#define le32_to_cpu bswap_32
+#define le64_to_cpu bswap_64
+#define cpu_to_be16
+#define cpu_to_be32
+#define cpu_to_be64
+#define be16_to_cpu
+#define be32_to_cpu
+#define be64_to_cpu
+#else
+#define cpu_to_le16
+#define cpu_to_le32
+#define cpu_to_le64
+#define le16_to_cpu
+#define le32_to_cpu
+#define le64_to_cpu
+#define cpu_to_be16 bswap_16
+#define cpu_to_be32 bswap_32
+#define cpu_to_be64 bswap_64
+#define be16_to_cpu bswap_16
+#define be32_to_cpu bswap_32
+#define be64_to_cpu bswap_64
+#endif
int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
int scnprintf(char * buf, size_t size, const char * fmt, ...);
diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 809c7721cd24..a7ecf8f469f4 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -387,6 +387,22 @@ int filename__read_str(const char *filename, char **buf, size_t *sizep)
return err;
}
+int filename__write_int(const char *filename, int value)
+{
+ int fd = open(filename, O_WRONLY), err = -1;
+ char buf[64];
+
+ if (fd < 0)
+ return err;
+
+ sprintf(buf, "%d", value);
+ if (write(fd, buf, sizeof(buf)) == sizeof(buf))
+ err = 0;
+
+ close(fd);
+ return err;
+}
+
int procfs__read_str(const char *entry, char **buf, size_t *sizep)
{
char path[PATH_MAX];
@@ -480,3 +496,17 @@ int sysctl__read_int(const char *sysctl, int *value)
return filename__read_int(path, value);
}
+
+int sysfs__write_int(const char *entry, int value)
+{
+ char path[PATH_MAX];
+ const char *sysfs = sysfs__mountpoint();
+
+ if (!sysfs)
+ return -1;
+
+ if (snprintf(path, sizeof(path), "%s/%s", sysfs, entry) >= PATH_MAX)
+ return -1;
+
+ return filename__write_int(path, value);
+}
diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
index 956c21127d1e..45605348461e 100644
--- a/tools/lib/api/fs/fs.h
+++ b/tools/lib/api/fs/fs.h
@@ -31,6 +31,8 @@ int filename__read_int(const char *filename, int *value);
int filename__read_ull(const char *filename, unsigned long long *value);
int filename__read_str(const char *filename, char **buf, size_t *sizep);
+int filename__write_int(const char *filename, int value);
+
int procfs__read_str(const char *entry, char **buf, size_t *sizep);
int sysctl__read_int(const char *sysctl, int *value);
@@ -38,4 +40,6 @@ int sysfs__read_int(const char *entry, int *value);
int sysfs__read_ull(const char *entry, unsigned long long *value);
int sysfs__read_str(const char *entry, char **buf, size_t *sizep);
int sysfs__read_bool(const char *entry, bool *value);
+
+int sysfs__write_int(const char *entry, int value);
#endif /* __API_FS__ */
diff --git a/tools/objtool/arch/x86/insn/x86-opcode-map.txt b/tools/objtool/arch/x86/insn/x86-opcode-map.txt
index 767be7c76034..12e377184ee4 100644
--- a/tools/objtool/arch/x86/insn/x86-opcode-map.txt
+++ b/tools/objtool/arch/x86/insn/x86-opcode-map.txt
@@ -1009,7 +1009,7 @@ GrpTable: Grp15
1: fxstor | RDGSBASE Ry (F3),(11B)
2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
-4: XSAVE
+4: XSAVE | ptwrite Ey (F3),(11B)
5: XRSTOR | lfence (11B)
6: XSAVEOPT | clwb (66) | mfence (11B)
7: clflush | clflushopt (66) | sfence (11B)
diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index b0b3007d3c9c..4b6cdbf8f935 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -108,6 +108,9 @@ approach is available to export the data to a postgresql database. Refer to
script export-to-postgresql.py for more details, and to script
call-graph-from-postgresql.py for an example of using the database.
+There is also script intel-pt-events.py which provides an example of how to
+unpack the raw data for power events and PTWRITE.
+
As mentioned above, it is easy to capture too much data. One way to limit the
data captured is to use 'snapshot' mode which is explained further below.
Refer to 'new snapshot option' and 'Intel PT modes of operation' further below.
@@ -364,6 +367,42 @@ cyc_thresh Specifies how frequently CYC packets are produced - see cyc
CYC packets are not requested by default.
+pt Specifies pass-through which enables the 'branch' config term.
+
+ The default config selects 'pt' if it is available, so a user will
+ never need to specify this term.
+
+branch Enable branch tracing. Branch tracing is enabled by default so to
+ disable branch tracing use 'branch=0'.
+
+ The default config selects 'branch' if it is available.
+
+ptw Enable PTWRITE packets which are produced when a ptwrite instruction
+ is executed.
+
+ Support for this feature is indicated by:
+
+ /sys/bus/event_source/devices/intel_pt/caps/ptwrite
+
+ which contains "1" if the feature is supported and
+ "0" otherwise.
+
+fup_on_ptw Enable a FUP packet to follow the PTWRITE packet. The FUP packet
+ provides the address of the ptwrite instruction. In the absence of
+ fup_on_ptw, the decoder will use the address of the previous branch
+ if branch tracing is enabled, otherwise the address will be zero.
+ Note that fup_on_ptw will work even when branch tracing is disabled.
+
+pwr_evt Enable power events. The power events provide information about
+ changes to the CPU C-state.
+
+ Support for this feature is indicated by:
+
+ /sys/bus/event_source/devices/intel_pt/caps/power_event_trace
+
+ which contains "1" if the feature is supported and
+ "0" otherwise.
+
new snapshot option
-------------------
@@ -674,13 +713,15 @@ Having no option is the same as
which, in turn, is the same as
- --itrace=ibxe
+ --itrace=ibxwpe
The letters are:
i synthesize "instructions" events
b synthesize "branches" events
x synthesize "transactions" events
+ w synthesize "ptwrite" events
+ p synthesize "power" events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
e synthesize tracing error events
@@ -699,7 +740,40 @@ and "r" can be combined to get calls and returns.
'flags' field can be used in perf script to determine whether the event is a
tranasaction start, commit or abort.
-Error events are new. They show where the decoder lost the trace. Error events
+Note that "instructions", "branches" and "transactions" events depend on code
+flow packets which can be disabled by using the config term "branch=0". Refer
+to the config terms section above.
+
+"ptwrite" events record the payload of the ptwrite instruction and whether
+"fup_on_ptw" was used. "ptwrite" events depend on PTWRITE packets which are
+recorded only if the "ptw" config term was used. Refer to the config terms
+section above. perf script "synth" field displays "ptwrite" information like
+this: "ip: 0 payload: 0x123456789abcdef0" where "ip" is 1 if "fup_on_ptw" was
+used.
+
+"Power" events correspond to power event packets and CBR (core-to-bus ratio)
+packets. While CBR packets are always recorded when tracing is enabled, power
+event packets are recorded only if the "pwr_evt" config term was used. Refer to
+the config terms section above. The power events record information about
+C-state changes, whereas CBR is indicative of CPU frequency. perf script
+"event,synth" fields display information like this:
+ cbr: cbr: 22 freq: 2189 MHz (200%)
+ mwait: hints: 0x60 extensions: 0x1
+ pwre: hw: 0 cstate: 2 sub-cstate: 0
+ exstop: ip: 1
+ pwrx: deepest cstate: 2 last cstate: 2 wake reason: 0x4
+Where:
+ "cbr" includes the frequency and the percentage of maximum non-turbo
+ "mwait" shows mwait hints and extensions
+ "pwre" shows C-state transitions (to a C-state deeper than C0) and
+ whether initiated by hardware
+ "exstop" indicates execution stopped and whether the IP was recorded
+ exactly,
+ "pwrx" indicates return to C0
+For more details refer to the Intel 64 and IA-32 Architectures Software
+Developer Manuals.
+
+Error events show where the decoder lost the trace. Error events
are quite important. Users must know if what they are seeing is a complete
picture or not.
diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index e2a4c5e0dbe5..a3abe04c779d 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -3,13 +3,15 @@
c synthesize branches events (calls only)
r synthesize branches events (returns only)
x synthesize transactions events
+ w synthesize ptwrite events
+ p synthesize power events
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
l synthesize last branch entries (use with i or x)
s skip initial number of events
- The default is all events i.e. the same as --itrace=ibxe
+ The default is all events i.e. the same as --itrace=ibxwpe
In addition, the period (default 100000) for instructions events
can be specified in units of:
@@ -26,8 +28,8 @@
Also the number of last branch entries (default 64, max. 1024) for
instructions or transaction