|
In the function parse_nodes_opt(), the statement "return 0;" is dead
code, remove it.
Signed-off-by: Peng Fan <fanpeng@loongson.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1597401894-27549-1-git-send-email-fanpeng@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It is currently assumed that each node contains at most nr_cpus/nr_nodes
CPUs and nodes' CPU ranges do not overlap.
That assumption is generally incorrect as there are archs where a CPU
number does not depend on its node number.
This update removes the described assumption by simply calling
numa_node_to_cpus() interface and using the returned mask for binding
CPUs to nodes.
Also, variable types and names are made consistent in functions using
cpumask.
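As a rough illustration of why the contiguous-range assumption breaks, here is a minimal sketch. It does not call libnuma: the `node_mask` table is a hypothetical stand-in for the masks that numa_node_to_cpus() would return, and the 8-CPU, 2-node interleaved topology is invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS  8
#define NR_NODES 2

/*
 * Hypothetical topology: even CPUs on node 0, odd CPUs on node 1.
 * In perf this information would come from numa_node_to_cpus().
 */
static const uint64_t node_mask[NR_NODES] = { 0x55, 0xAA };

/*
 * Old assumption: node n owns the contiguous CPU range
 * [n * NR_CPUS / NR_NODES, (n + 1) * NR_CPUS / NR_NODES).
 */
static int cpu_in_node_assumed(int cpu, int node)
{
	int per_node = NR_CPUS / NR_NODES;

	return cpu >= node * per_node && cpu < (node + 1) * per_node;
}

/* Mask-based lookup: trust the per-node CPU mask instead of arithmetic. */
static int cpu_in_node_masked(int cpu, int node)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(node >= 0 && node < NR_NODES);

	return (int)((node_mask[node] >> cpu) & 1);
}
```

With this made-up topology, CPU 1 belongs to node 1 according to the mask, while the range arithmetic places it on node 0.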
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/20200813113247.GA2014@oc3871087118.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Couple numa_allocate_cpumask() and numa_free_cpumask() functions
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/20200813113041.GA1685@oc3871087118.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For memcpy, the source pages are memset to zero only when --cycles is
used. This leads to wildly different results with or without --cycles,
since all source pages are likely to be mapped to the same zero page
without explicit writes.
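The pre-touch pattern described above can be sketched roughly as follows. This is not the perf bench code: the helper names are hypothetical, and a non-zero fill byte is used on the assumption that a compiler may otherwise fold malloc() plus memset(0) into calloc(), which could leave the pages untouched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate a buffer and write to every byte so that each page gets its
 * own physical backing before the timed copy starts.
 */
static void *alloc_touched(size_t size)
{
	void *buf = malloc(size);

	if (buf)
		memset(buf, 0x5a, size);	/* explicit write to every page */
	return buf;
}

/* Copy src to dst 'loops' times; returns 0 if the final copy matches. */
static int copy_loops(size_t size, int loops)
{
	void *src, *dst;
	int ret = -1;

	assert(size > 0 && loops > 0);
	src = alloc_touched(size);
	dst = alloc_touched(size);

	if (src && dst) {
		for (int i = 0; i < loops; i++)
			memcpy(dst, src, size);
		ret = memcmp(dst, src, size) == 0 ? 0 : -1;
	}
	free(src);
	free(dst);
	return ret;
}
```

The point of the sketch is only that the source buffer is written unconditionally, so the copy reads from distinct physical pages regardless of which timing mode is selected.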
Before this fix:
$ export cmd="./perf stat -e LLC-loads -- ./perf bench \
mem memcpy -s 1024MB -l 100 -f default"
$ $cmd
2,935,826 LLC-loads
3.821677452 seconds time elapsed
$ $cmd --cycles
217,533,436 LLC-loads
8.616725985 seconds time elapsed
After this fix:
$ $cmd
214,459,686 LLC-loads
8.674301124 seconds time elapsed
$ $cmd --cycles
214,758,651 LLC-loads
8.644480006 seconds time elapsed
Fixes: 47b5757bac03c338 ("perf bench mem: Move boilerplate memory allocation to the infrastructure")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel@axis.com
Link: http://lore.kernel.org/lkml/20200810133404.30829-1-vincent.whitchurch@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There are a couple of spelling mistakes in the text. Fix these.
Signed-off-by: Colin King <colin.king@canonical.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200812064647.200132-1-colin.king@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Standard benchmark names let users know the test specifics. For
example, the name "2x1-bw-process" tells that two processes, one thread
each, are run and the RAM bandwidth is measured.
Several benchmark names do not correspond to their actual running
configuration. Fix that and also some whitespace and comment
inconsistencies.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/6b6f2084f132ee8e9203dc7c32f9deb209b87a68.1597004831.git.agordeev@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/d949f5f48e17fc816f3beecf8479f1b2480345e4.1597004831.git.agordeev@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
for_each_set_bit, or similar functions like for_each_cpu, may be hot
within the kernel. If many bits are set, one could imagine that on Intel
testing every bit with a "bt" instruction may be faster than the function
call and word-length find_next_bit logic. Add a benchmark to measure this.
This benchmark on AMD Rome and Intel Skylake-X shows "bt" is not a good
option except for very small bitmaps.
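The two strategies being compared can be sketched in user space as follows. This is an illustration only, not the kernel or perf implementation: the function names are hypothetical, and `__builtin_ctzl` (a GCC/Clang builtin) stands in for the kernel's find-first-set helper.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Strategy 1: test every bit position individually, set or not. */
static unsigned long sum_bits_test_loop(const unsigned long *map,
					unsigned int nbits)
{
	unsigned long sum = 0;

	for (unsigned int i = 0; i < nbits; i++)
		if (map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			sum += i;
	return sum;
}

/* Strategy 2: scan a word at a time and visit only the set bits. */
static unsigned long sum_bits_word_scan(const unsigned long *map,
					unsigned int nbits)
{
	unsigned long sum = 0;

	for (unsigned int base = 0; base < nbits; base += BITS_PER_LONG) {
		unsigned long word = map[base / BITS_PER_LONG];

		/* Mask off bits beyond nbits in the last, partial word. */
		if (nbits - base < BITS_PER_LONG)
			word &= (1UL << (nbits - base)) - 1;

		while (word) {
			sum += base + (unsigned int)__builtin_ctzl(word);
			word &= word - 1;	/* clear the lowest set bit */
		}
	}
	return sum;
}
```

Both functions return the sum of the indices of the set bits, so their results can be compared directly. The first does work proportional to the bitmap size, the second proportional to the number of words plus the number of set bits, which is consistent with the trend in the committer's numbers.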
Committer testing:
# perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
syscall: System call benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
internals: Perf-internals benchmarks
all: All benchmarks
# perf bench mem
# List of available benchmarks for collection 'mem':
memcpy: Benchmark for memcpy() functions
memset: Benchmark for memset() functions
find_bit: Benchmark for find_bit() functions
all: Run all memory access benchmarks
# perf bench mem find_bit
# Running 'mem/find_bit' benchmark:
100000 operations 1 bits set of 1 bits
Average for_each_set_bit took: 730.200 usec (+- 6.468 usec)
Average test_bit loop took: 366.200 usec (+- 4.652 usec)
100000 operations 1 bits set of 2 bits
Average for_each_set_bit took: 781.000 usec (+- 24.247 usec)
Average test_bit loop took: 550.200 usec (+- 4.152 usec)
100000 operations 2 bits set of 2 bits
Average for_each_set_bit took: 1113.400 usec (+- 112.340 usec)
Average test_bit loop took: 1098.500 usec (+- 182.834 usec)
100000 operations 1 bits set of 4 bits
Average for_each_set_bit took: 843.800 usec (+- 8.772 usec)
Average test_bit loop took: 948.800 usec (+- 10.278 usec)
100000 operations 2 bits set of 4 bits
Average for_each_set_bit took: 1185.800 usec (+- 114.345 usec)
Average test_bit loop took: 1473.200 usec (+- 175.498 usec)
100000 operations 4 bits set of 4 bits
Average for_each_set_bit took: 1769.667 usec (+- 233.177 usec)
Average test_bit loop took: 1864.933 usec (+- 187.470 usec)
100000 operations 1 bits set of 8 bits
Average for_each_set_bit took: 898.000 usec (+- 21.755 usec)
Average test_bit loop took: 1768.400 usec (+- 23.672 usec)
100000 operations 2 bits set of 8 bits
Average for_each_set_bit took: 1244.900 usec (+- 116.396 usec)
Average test_bit loop took: 2201.800 usec (+- 145.398 usec)
100000 operations 4 bits set of 8 bits
Average for_each_set_bit took: 1822.533 usec (+- 231.554 usec)
Average test_bit loop took: 2569.467 usec (+- 168.453 usec)
100000 operations 8 bits set of 8 bits
Average for_each_set_bit took: 2845.100 usec (+- 441.365 usec)
Average test_bit loop took: 3023.300 usec (+- 219.575 usec)
100000 operations 1 bits set of 16 bits
Average for_each_set_bit took: 923.400 usec (+- 17.560 usec)
Average test_bit loop took: 3240.000 usec (+- 16.492 usec)
100000 operations 2 bits set of 16 bits
Average for_each_set_bit took: 1264.300 usec (+- 114.034 usec)
Average test_bit loop took: 3714.400 usec (+- 158.898 usec)
100000 operations 4 bits set of 16 bits
Average for_each_set_bit took: 1817.867 usec (+- 222.199 usec)
Average test_bit loop took: 4015.333 usec (+- 154.162 usec)
100000 operations 8 bits set of 16 bits
Average for_each_set_bit took: 2826.350 usec (+- 433.457 usec)
Average test_bit loop took: 4460.350 usec (+- 210.762 usec)
100000 operations 16 bits set of 16 bits
Average for_each_set_bit took: 4615.600 usec (+- 809.350 usec)
Average test_bit loop took: 5129.960 usec (+- 320.821 usec)
100000 operations 1 bits set of 32 bits
Average for_each_set_bit took: 904.400 usec (+- 14.250 usec)
Average test_bit loop took: 6194.000 usec (+- 29.254 usec)
100000 operations 2 bits set of 32 bits
Average for_each_set_bit took: 1252.700 usec (+- 116.432 usec)
Average test_bit loop took: 6652.400 usec (+- 154.352 usec)
100000 operations 4 bits set of 32 bits
Average for_each_set_bit took: 1824.200 usec (+- 229.133 usec)
Average test_bit loop took: 6961.733 usec (+- 154.682 usec)
100000 operations 8 bits set of 32 bits
Average for_each_set_bit took: 2823.950 usec (+- 432.296 usec)
Average test_bit loop took: 7351.900 usec (+- 193.626 usec)
100000 operations 16 bits set of 32 bits
Average for_each_set_bit took: 4552.560 usec (+- 785.141 usec)
Average test_bit loop took: 7998.360 usec (+- 305.629 usec)
100000 operations 32 bits set of 32 bits
Average for_each_set_bit took: 7557.067 usec (+- 1407.702 usec)
Average test_bit loop took: 9072.400 usec (+- 513.209 usec)
100000 operations 1 bits set of 64 bits
Average for_each_set_bit took: 896.800 usec (+- 14.389 usec)
Average test_bit loop took: 11927.200 usec (+- 68.862 usec)
100000 operations 2 bits set of 64 bits
Average for_each_set_bit took: 1230.400 usec (+- 111.731 usec)
Average test_bit loop took: 12478.600 usec (+- 189.382 usec)
100000 operations 4 bits set of 64 bits
Average for_each_set_bit took: 1844.733 usec (+- 244.826 usec)
Average test_bit loop took: 12911.467 usec (+- 206.246 usec)
100000 operations 8 bits set of 64 bits
Average for_each_set_bit took: 2779.300 usec (+- 413.612 usec)
Average test_bit loop took: 13372.650 usec (+- 239.623 usec)
100000 operations 16 bits set of 64 bits
Average for_each_set_bit took: 4423.920 usec (+- 748.240 usec)
Average test_bit loop took: 13995.800 usec (+- 318.427 usec)
100000 operations 32 bits set of 64 bits
Average for_each_set_bit took: 7580.600 usec (+- 1462.407 usec)
Average test_bit loop took: 15063.067 usec (+- 516.477 usec)
100000 operations 64 bits set of 64 bits
Average for_each_set_bit took: 13391.514 usec (+- 2765.371 usec)
Average test_bit loop took: 16974.914 usec (+- 916.936 usec)
100000 operations 1 bits set of 128 bits
Average for_each_set_bit took: 1153.800 usec (+- 124.245 usec)
Average test_bit loop took: 26959.000 usec (+- 714.047 usec)
100000 operations 2 bits set of 128 bits
Average for_each_set_bit took: 1445.200 usec (+- 113.587 usec)
Average test_bit loop took: 25798.800 usec (+- 512.908 usec)
100000 operations 4 bits set of 128 bits
Average for_each_set_bit took: 1990.933 usec (+- 219.362 usec)
Average test_bit loop took: 25589.400 usec (+- 348.288 usec)
100000 operations 8 bits set of 128 bits
Average for_each_set_bit took: 2963.000 usec (+- 419.487 usec)
Average test_bit loop took: 25690.050 usec (+- 262.025 usec)
100000 operations 16 bits set of 128 bits
Average for_each_set_bit took: 4585.200 usec (+- 741.734 usec)
Average test_bit loop took: 26125.040 usec (+- 274.127 usec)
100000 operations 32 bits set of 128 bits
Average for_each_set_bit took: 7626.200 usec (+- 1404.950 usec)
Average test_bit loop took: 27038.867 usec (+- 442.554 usec)
100000 operations 64 bits set of 128 bits
Average for_each_set_bit took: 13343.371 usec (+- 2686.460 usec)
Average test_bit loop took: 28936.543 usec (+- 883.257 usec)
100000 operations 128 bits set of 128 bits
Average for_each_set_bit took: 23442.950 usec (+- 4880.541 usec)
Average test_bit loop took: 32484.125 usec (+- 1691.931 usec)
100000 operations 1 bits set of 256 bits
Average for_each_set_bit took: 1183.000 usec (+- 32.073 usec)
Average test_bit loop took: 50114.600 usec (+- 198.880 usec)
100000 operations 2 bits set of 256 bits
Average for_each_set_bit took: 1550.000 usec (+- 124.550 usec)
Average test_bit loop took: 50334.200 usec (+- 128.425 usec)
100000 operations 4 bits set of 256 bits
Average for_each_set_bit took: 2164.333 usec (+- 246.359 usec)
Average test_bit loop took: 49959.867 usec (+- 188.035 usec)
100000 operations 8 bits set of 256 bits
Average for_each_set_bit took: 3211.200 usec (+- 454.829 usec)
Average test_bit loop took: 50140.850 usec (+- 176.046 usec)
100000 operations 16 bits set of 256 bits
Average for_each_set_bit took: 5181.640 usec (+- 882.726 usec)
Average test_bit loop took: 51003.160 usec (+- 419.601 usec)
100000 operations 32 bits set of 256 bits
Average for_each_set_bit took: 8369.333 usec (+- 1513.150 usec)
Average test_bit loop took: 52096.700 usec (+- 573.022 usec)
100000 operations 64 bits set of 256 bits
Average for_each_set_bit took: 13866.857 usec (+- 2649.393 usec)
Average test_bit loop took: 53989.600 usec (+- 938.808 usec)
100000 operations 128 bits set of 256 bits
Average for_each_set_bit took: 23588.350 usec (+- 4724.222 usec)
Average test_bit loop took: 57300.625 usec (+- 1625.962 usec)
100000 operations 256 bits set of 256 bits
Average for_each_set_bit took: 42752.200 usec (+- 9202.084 usec)
Average test_bit loop took: 64426.933 usec (+- 3402.326 usec)
100000 operations 1 bits set of 512 bits
Average for_each_set_bit took: 1632.000 usec (+- 229.954 usec)
Average test_bit loop took: 98090.000 usec (+- 1120.435 usec)
100000 operations 2 bits set of 512 bits
Average for_each_set_bit took: 1937.700 usec (+- 148.902 usec)
Average test_bit loop took: 100364.100 usec (+- 1433.219 usec)
100000 operations 4 bits set of 512 bits
Average for_each_set_bit took: 2528.000 usec (+- 243.654 usec)
Average test_bit loop took: 99932.067 usec (+- 955.868 usec)
100000 operations 8 bits set of 512 bits
Average for_each_set_bit took: 3734.100 usec (+- 512.359 usec)
Average test_bit loop took: 98944.750 usec (+- 812.070 usec)
100000 operations 16 bits set of 512 bits
Average for_each_set_bit took: 5551.400 usec (+- 846.605 usec)
Average test_bit loop took: 98691.600 usec (+- 654.753 usec)
100000 operations 32 bits set of 512 bits
Average for_each_set_bit took: 8594.500 usec (+- 1446.072 usec)
Average test_bit loop took: 99176.867 usec (+- 579.990 usec)
100000 operations 64 bits set of 512 bits
Average for_each_set_bit took: 13840.743 usec (+- 2527.055 usec)
Average test_bit loop took: 100758.743 usec (+- 833.865 usec)
100000 operations 128 bits set of 512 bits
Average for_each_set_bit took: 23185.925 usec (+- 4532.910 usec)
Average test_bit loop took: 103786.700 usec (+- 1475.276 usec)
100000 operations 256 bits set of 512 bits
Average for_each_set_bit took: 40322.400 usec (+- 8341.802 usec)
Average test_bit loop took: 109433.378 usec (+- 2742.615 usec)
100000 operations 512 bits set of 512 bits
Average for_each_set_bit took: 71804.540 usec (+- 15436.546 usec)
Average test_bit loop took: 120255.440 usec (+- 5252.777 usec)
100000 operations 1 bits set of 1024 bits
Average for_each_set_bit took: 1859.600 usec (+- 27.969 usec)
Average test_bit loop took: 187676.000 usec (+- 1337.770 usec)
100000 operations 2 bits set of 1024 bits
Average for_each_set_bit took: 2273.600 usec (+- 139.420 usec)
Average test_bit loop took: 188176.000 usec (+- 684.357 usec)
100000 operations 4 bits set of 1024 bits
Average for_each_set_bit took: 2940.400 usec (+- 268.213 usec)
Average test_bit loop took: 189172.600 usec (+- 593.295 usec)
100000 operations 8 bits set of 1024 bits
Average for_each_set_bit took: 4224.200 usec (+- 547.933 usec)
Average test_bit loop took: 190257.250 usec (+- 621.021 usec)
100000 operations 16 bits set of 1024 bits
Average for_each_set_bit took: 6090.560 usec (+- 877.975 usec)
Average test_bit loop took: 190143.880 usec (+- 503.753 usec)
100000 operations 32 bits set of 1024 bits
Average for_each_set_bit took: 9178.800 usec (+- 1475.136 usec)
Average test_bit loop took: 190757.100 usec (+- 494.757 usec)
100000 operations 64 bits set of 1024 bits
Average for_each_set_bit took: 14441.457 usec (+- 2545.497 usec)
Average test_bit loop took: 192299.486 usec (+- 795.251 usec)
100000 operations 128 bits set of 1024 bits
Average for_each_set_bit took: 23623.825 usec (+- 4481.182 usec)
Average test_bit loop took: 194885.550 usec (+- 1300.817 usec)
100000 operations 256 bits set of 1024 bits
Average for_each_set_bit took: 40194.956 usec (+- 8109.056 usec)
Average test_bit loop took: 200259.311 usec (+- 2566.085 usec)
100000 operations 512 bits set of 1024 bits
Average for_each_set_bit took: 70983.560 usec (+- 15074.982 usec)
Average test_bit loop took: 210527.460 usec (+- 4968.980 usec)
100000 operations 1024 bits set of 1024 bits
Average for_each_set_bit took: 136530.345 usec (+- 31584.400 usec)
Average test_bit loop took: 233329.691 usec (+- 10814.036 usec)
100000 operations 1 bits set of 2048 bits
Average for_each_set_bit took: 3077.600 usec (+- 76.376 usec)
Average test_bit loop took: 402154.400 usec (+- 518.571 usec)
100000 operations 2 bits set of 2048 bits
Average for_each_set_bit took: 3508.600 usec (+- 148.350 usec)
Average test_bit loop took: 403814.500 usec (+- 1133.027 usec)
100000 operations 4 bits set of 2048 bits
Average for_each_set_bit took: 4219.333 usec (+- 285.844 usec)
Average test_bit loop took: 404312.533 usec (+- 985.751 usec)
100000 operations 8 bits set of 2048 bits
Average for_each_set_bit took: 5670.550 usec (+- 615.238 usec)
Average test_bit loop took: 405321.800 usec (+- 1038.487 usec)
100000 operations 16 bits set of 2048 bits
Average for_each_set_bit took: 7785.080 usec (+- 992.522 usec)
Average test_bit loop took: 406746.160 usec (+- 1015.478 usec)
100000 operations 32 bits set of 2048 bits
Average for_each_set_bit took: 11163.800 usec (+- 1627.320 usec)
Average test_bit loop took: 406124.267 usec (+- 898.785 usec)
100000 operations 64 bits set of 2048 bits
Average for_each_set_bit took: 16964.629 usec (+- 2806.130 usec)
Average test_bit loop took: 406618.514 usec (+- 798.356 usec)
100000 operations 128 bits set of 2048 bits
Average for_each_set_bit took: 27219.625 usec (+- 4988.458 usec)
Average test_bit loop took: 410149.325 usec (+- 1705.641 usec)
100000 operations 256 bits set of 2048 bits
Average for_each_set_bit took: 45138.578 usec (+- 8831.021 usec)
Average test_bit loop took: 415462.467 usec (+- 2725.418 usec)
100000 operations 512 bits set of 2048 bits
Average for_each_set_bit took: 77450.540 usec (+- 15962.238 usec)
Average test_bit loop took: 426089.180 usec (+- 5171.788 usec)
100000 operations 1024 bits set of 2048 bits
Average for_each_set_bit took: 138023.636 usec (+- 29826.959 usec)
Average test_bit loop took: 446346.636 usec (+- 9904.417 usec)
100000 operations 2048 bits set of 2048 bits
Average for_each_set_bit took: 251072.600 usec (+- 55947.692 usec)
Average test_bit loop took: 484855.983 usec (+- 18970.431 usec)
#
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200729220034.1337168-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The usefulness of having a standard way of testing syscall performance
has come up from time to time[0]. Furthermore, some of our testing
machinery (such as 'mmtests') already makes use of a simplified version
of the microbenchmark. This patch mainly takes the same idea to measure
syscall throughput compatible with 'perf-bench' via getppid(2), yet
without any of the additional template stuff from Ingo's version (based
on numa.c). The code is identical to what mmtests uses.
[0] https://lore.kernel.org/lkml/20160201074156.GA27156@gmail.com/
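A minimal sketch of the idea, timing a loop of getppid(2) calls and deriving a throughput figure, could look like this. The function names are hypothetical and the use of clock_gettime(CLOCK_MONOTONIC) is an assumption made for the example; it is not claimed to match the timing calls used by mmtests or perf bench.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Call getppid() 'loops' times and return the elapsed time in nanoseconds. */
static long long time_getppid_nsec(long loops)
{
	struct timespec start, stop;

	assert(loops > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < loops; i++)
		(void)getppid();
	clock_gettime(CLOCK_MONOTONIC, &stop);

	return (long long)(stop.tv_sec - start.tv_sec) * 1000000000LL +
	       (stop.tv_nsec - start.tv_nsec);
}

/* Convert a loop count and elapsed nanoseconds into operations per second. */
static double syscall_ops_per_sec(long loops, long long nsec)
{
	return nsec > 0 ? (double)loops * 1e9 / (double)nsec : 0.0;
}
```

getppid() is a convenient probe because it does very little work in the kernel, so the measured time is dominated by the syscall entry and exit path.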
Committer notes:
Add missing stdlib.h and unistd.h to get the prototypes for exit() and
getppid().
Committer testing:
$ perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
syscall: System call benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
internals: Perf-internals benchmarks
all: All benchmarks
$
$ perf bench syscall
# List of available benchmarks for collection 'syscall':
basic: Benchmark for basic getppid(2) calls
all: Run all syscall benchmarks
$ perf bench syscall basic
# Running 'syscall/basic' benchmark:
# Executed 10000000 getppid() calls
Total time: 3.679 [sec]
0.367957 usecs/op
2717708 ops/sec
$ perf bench syscall all
# Running syscall/basic benchmark...
# Executed 10000000 getppid() calls
Total time: 3.644 [sec]
0.364456 usecs/op
2743815 ops/sec
$
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20190308181747.l36zqz2avtivrr3c@linux-r8p5
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array
member[1][2], introduced in C99:
struct foo {
        int stuff;
        struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get rid of those sorts of issues completely.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
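To make the allocation pattern concrete, here is a small sketch that reuses the `struct foo` / `struct boo` names from the message above. The members of `struct boo` and the `foo_alloc()` helper are hypothetical, added only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;
	struct boo array[];	/* flexible array member: must come last */
};

/* Allocate a struct foo with room for 'n' trailing struct boo elements. */
static struct foo *foo_alloc(size_t n)
{
	struct foo *f;

	/* Guard the size computation against overflow. */
	assert(n <= (SIZE_MAX - sizeof(*f)) / sizeof(f->array[0]));

	/* sizeof(*f) does not include the flexible array itself. */
	f = malloc(sizeof(*f) + n * sizeof(f->array[0]));
	if (!f)
		return NULL;

	f->stuff = (int)n;
	for (size_t i = 0; i < n; i++)
		f->array[i].value = (int)i;
	return f;
}
```

The size expression is the same one that would be written for a zero-length array, which is why converting the declaration leaves existing dynamic allocations unchanged.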
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gustavo A. R. Silva <gustavo@embeddedor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200515172926.GA31976@embeddedor
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To be consistent with other such auto-detected features.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anand K Mistry <amistry@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a benchmark for kallsyms parsing. Example output:
Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 103.971 ms (+- 0.121 ms)
Committer testing:
Test Machine: AMD Ryzen 5 3600X 6-Core Processor
[root@five ~]# perf bench internals kallsyms-parse
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 79.692 ms (+- 0.101 ms)
[root@five ~]# perf stat -r5 perf bench internals kallsyms-parse
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 80.563 ms (+- 0.079 ms)
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 81.046 ms (+- 0.155 ms)
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 80.874 ms (+- 0.104 ms)
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 81.173 ms (+- 0.133 ms)
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 81.169 ms (+- 0.074 ms)
Performance counter stats for 'perf bench internals kallsyms-parse' (5 runs):
8,093.54 msec task-clock # 0.999 CPUs utilized ( +- 0.14% )
3,165 context-switches # 0.391 K/sec ( +- 0.18% )
10 cpu-migrations # 0.001 K/sec ( +- 23.13% )
744 page-faults # 0.092 K/sec ( +- 0.21% )
34,551,564,954 cycles # 4.269 GHz ( +- 0.05% ) (83.33%)
1,160,584,308 stalled-cycles-frontend # 3.36% frontend cycles idle ( +- 1.60% ) (83.33%)
14,974,323,985 stalled-cycles-backend # 43.34% backend cycles idle ( +- 0.24% ) (83.33%)
58,712,905,705 instructions # 1.70 insn per cycle
# 0.26 stalled cycles per insn ( +- 0.01% ) (83.34%)
14,136,433,778 branches # 1746.632 M/sec ( +- 0.01% ) (83.33%)
141,943,217 branch-misses # 1.00% of all branches ( +- 0.04% ) (83.33%)
8.1040 +- 0.0115 seconds time elapsed ( +- 0.14% )
[root@five ~]#
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200501221315.54715-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
By default this isn't run as it reads /proc and may not have access.
For consistency, modify the single threaded benchmark to compute an
average time per event.
Committer testing:
$ grep -m1 "model name" /proc/cpuinfo
model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
$ grep "model name" /proc/cpuinfo | wc -l
8
$
$ perf bench internals synthesize -h
# Running 'internals/synthesize' benchmark:
Usage: perf bench internals synthesize <options>
-I, --multi-iterations <n>
Number of iterations used to compute multi-threaded average
-i, --single-iterations <n>
Number of iterations used to compute single-threaded average
-M, --max-threads <n>
Maximum number of threads in multithreaded bench
-m, --min-threads <n>
Minimum number of threads in multithreaded bench
-s, --st Run single threaded benchmark
-t, --mt Run multi-threaded benchmark
$
$ perf bench internals synthesize -t
# Running 'internals/synthesize' benchmark:
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 1
Average synthesis took: 65449.000 usec (+- 586.442 usec)
Average num. events: 9405.400 (+- 0.306)
Average time per event 6.959 usec
Number of synthesis threads: 2
Average synthesis took: 37838.300 usec (+- 130.259 usec)
Average num. events: 9501.800 (+- 20.469)
Average time per event 3.982 usec
Number of synthesis threads: 3
Average synthesis took: 48551.400 usec (+- 225.686 usec)
Average num. events: 9544.000 (+- 0.000)
Average time per event 5.087 usec
Number of synthesis threads: 4
Average synthesis took: 29632.500 usec (+- 50.808 usec)
Average num. events: 9544.000 (+- 0.000)
Average time per event 3.105 usec
Number of synthesis threads: 5
Average synthesis took: 33920.400 usec (+- 284.509 usec)
Average num. events: 9544.000 (+- 0.000)
Average time per event 3.554 usec
Number of synthesis threads: 6
Average synthesis took: 27604.100 usec (+- 72.344 usec)
Average num. events: 9548.000 (+- 0.000)
Average time per event 2.891 usec
Number of synthesis threads: 7
Average synthesis took: 25406.300 usec (+- 933.371 usec)
Average num. events: 9545.500 (+- 0.167)
Average time per event 2.662 usec
Number of synthesis threads: 8
Average synthesis took: 24110.400 usec (+- 73.229 usec)
Average num. events: 9551.000 (+- 0.000)
Average time per event 2.524 usec
$
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Fix div-by-zero if runtime is zero:
$ perf bench futex hash --runtime=0
# Running 'futex/hash' benchmark:
Run summary [PID 12090]: 4 threads, each operating on 1024 [private] futexes for 0 secs.
Floating point exception (core dumped)
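The kind of guard that avoids this crash can be sketched as follows. This is an assumption about the general shape of such a fix, not the actual patch: the function name and signature are hypothetical.

```c
#include <assert.h>

/*
 * Average per-thread throughput in ops/sec. A zero runtime yields 0
 * instead of performing an integer division by zero.
 */
static unsigned long avg_ops_per_sec(const unsigned long *ops,
				     unsigned int nthreads,
				     unsigned int runtime_sec)
{
	unsigned long total = 0;

	assert(ops && nthreads > 0);
	for (unsigned int i = 0; i < nthreads; i++)
		total += runtime_sec ? ops[i] / runtime_sec : 0;

	return total / nthreads;
}
```

On Linux, integer division by zero typically raises SIGFPE, which the shell reports as "Floating point exception" even though no floating-point arithmetic is involved.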
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200417132330.119407-4-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Event synthesis may occur at the start or end (tail) of a perf command.
In system-wide mode it can scan every process in /proc, which may add
seconds of latency before event recording. Add a new benchmark that
times how long event synthesis takes with and without data synthesis.
An example execution looks like:
$ perf bench internals synthesize
# Running 'internals/synthesize' benchmark:
Average synthesis took: 168.253800 usec
Average data synthesis took: 208.104700 usec
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200402154357.107873-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Avoid garbage in sigaction structs used in sigaction() syscalls.
Valgrind is complaining about it.
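A minimal sketch of a fully initialized struct sigaction is shown below. The handler and helper names are hypothetical; the point is only that every field is set before the sigaction() call, so no uninitialized stack bytes reach the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal;

static void on_signal(int sig)
{
	(void)sig;
	got_signal = 1;
}

/* Install on_signal() for 'signo' using a zeroed struct sigaction. */
static int install_handler(int signo)
{
	struct sigaction act;

	assert(signo > 0);
	memset(&act, 0, sizeof(act));	/* no garbage in sa_flags and friends */
	sigemptyset(&act.sa_mask);
	act.sa_handler = on_signal;

	return sigaction(signo, &act, NULL);
}
```

Without the memset(), fields such as sa_flags would hold whatever happened to be on the stack, which is what tools like Valgrind report as use of uninitialized memory.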
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200305083714.9381-4-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Since commit 3b2323c2c1c4 ("perf bench futex: Use cpumaps") the default
number of threads the benchmark uses changed from the number of online
CPUs to zero:
$ perf bench futex wake
# Running 'futex/wake' benchmark:
Run summary [PID 15930]: blocking on 0 threads (at [private] futex 0x558b8ee4bfac), waking up 1 at a time.
[Run 1]: Wokeup 0 of 0 threads in 0.0000 ms
[...]
[Run 10]: Wokeup 0 of 0 threads in 0.0000 ms
Wokeup 0 of 0 threads in 0.0004 ms (+-40.82%)
Restore the old behavior by grabbing the number of online CPUs via
cpu->nr:
$ perf bench futex wake
# Running 'futex/wake' benchmark:
Run summary [PID 18356]: blocking on 8 threads (at [private] futex 0xb3e62c), waking up 1 at a time.
[Run 1]: Wokeup 8 of 8 threads in 0.0260 ms
[...]
[Run 10]: Wokeup 8 of 8 threads in 0.0270 ms
Wokeup 8 of 8 threads in 0.0419 ms (+-24.35%)
Fixes: 3b2323c2c1c4 ("perf bench futex: Use cpumaps")
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200305083714.9381-3-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
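
The defaulting behavior described in the entry above can be sketched as
follows. This is only an illustration: the commit reads the count from
cpu->nr, whereas this sketch substitutes sysconf(_SC_NPROCESSORS_ONLN)
as a stand-in, and the function name default_nthreads() is hypothetical.

```c
#include <unistd.h>

/*
 * Return the thread count to use: the requested value if non-zero,
 * otherwise the number of online CPUs (stand-in for cpu->nr).
 */
static unsigned int default_nthreads(unsigned int requested)
{
	long online;

	if (requested)
		return requested;

	online = sysconf(_SC_NPROCESSORS_ONLN);
	/* Fall back to a single thread if the query fails. */
	return online > 0 ? (unsigned int)online : 1;
}
```

With this shape, leaving the thread option unset can no longer yield a
run that blocks on 0 threads.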
|
|
Noticed with gcc 10 (fedora rawhide) that those variables were not being
declared as static, so we end up with:
ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/bench/perf-in.o] Error 1
Prefix those with bench__ and add them to bench/bench.h, so that we can
share them among the tools that need to access those variables from
signal handlers.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20200303155811.GD13702@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
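
The link errors in the entry above come from gcc 10 defaulting to
-fno-common, so the same non-static variable defined in several objects
no longer merges. A minimal single-file sketch of the
declaration/definition split follows. The names bench__start,
bench__end and bench__runtime follow the prefix named in the commit;
their struct timeval type and the handler name are assumptions made for
illustration.

```c
#include <stddef.h>
#include <sys/time.h>

/* Would live in a shared header such as bench/bench.h: declarations only. */
extern struct timeval bench__start, bench__end, bench__runtime;

/* Exactly one .c file provides the definitions. */
struct timeval bench__start, bench__end, bench__runtime;

/*
 * Hypothetical handler shared by the benchmarks: record the end time and
 * compute the elapsed runtime. gettimeofday() is not on the POSIX
 * async-signal-safe list; it is used here only to mirror the pattern.
 */
static void toggle_done(int sig)
{
	(void)sig;
	gettimeofday(&bench__end, NULL);
	bench__runtime.tv_sec = bench__end.tv_sec - bench__start.tv_sec;
	bench__runtime.tv_usec = bench__end.tv_usec - bench__start.tv_usec;
	if (bench__runtime.tv_usec < 0) {
		bench__runtime.tv_sec--;
		bench__runtime.tv_usec += 1000000;
	}
}
```

Every other source file includes only the extern declarations, so the
linker sees a single definition of each symbol.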
|
|
Only a 'struct perf_cpu_map' forward declaration is necessary, fix the
places that need the header but were getting it indirectly, by luck,
from env.h.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-3sj3n534zghxhk7ygzeaqlx9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Check that it is not needed and remove, fixing up some fallout for
places where it was only serving to get something else.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-9h6dg6lsqe2usyqjh5rrues4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now that builtin.h isn't included by any other header, we can check
where it is really needed, i.e. we can remove it and be sure that it
isn't being obtained indirectly.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-mn7jheex85iw9qo6tlv26hb2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
With the movement of lots of stuff out of perf.h to other headers we
ended up not needing it in lots of places, remove it from those places.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-c718m0sxxwp73lp9d8vpihb4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
And remove unneeded include directives from perf-sys.h to prune the
header dependency tree.
Fixup the fallout in places where definitions were being used without
the needed include directives that were being satisfied because they
were in perf-sys.h.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-7b1zvugiwak4ibfa3j6ott7f@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To get closer to upstream and check if we need to sync more UAPI
headers, pick up fixes for libbpf that prevent perf's container tests
from completing successfully, etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Michael reported that perf bench numa fails when binding to cpu0 with
the '-0' option.
# perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd
# Running 'numa/mem' benchmark:
# Running main, "perf bench numa numa-mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd"
binding to node 0, mask: 0000000000000001 => -1
perf: bench/numa.c:356: bind_to_memnode: Assertion `!(ret)' failed.
Aborted (core dumped)
This happens when cpu0 is not part of node0, which the benchmark
assumes; that assumption does not hold on some powerpc servers.
Use the correct node for the cpu0 binding.
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20190801142642.28004-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Moving the following functions from tools/perf:
cpu_map__new()
cpu_map__read()
to libperf with the following names:
perf_cpu_map__new()
perf_cpu_map__read()
Committer notes:
Fixed up this one:
tools/perf/arch/arm/util/cs-etm.c
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-44-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Rename struct cpu_map to struct perf_cpu_map, so it could be part of
libperf.
Committer notes:
Added fixes for arm64, provided by Jiri.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In places where the equivalent was already being done, i.e.:
free(a);
a = NULL;
And in places where struct members are being freed so that if we have
some erroneous reference to its struct, then accesses to freed members
will result in segfaults, which we can detect faster than use after free
to areas that may still have something seemingly valid.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-jatyoofo5boc1bsvoig6bb6i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
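
The free-then-NULL pattern from the entry above can be sketched as a
small helper. The macro name free_and_null() and struct record are
hypothetical and chosen only for illustration; they are not the names
used in the tree.

```c
#include <stdlib.h>
#include <string.h>

/* Free the pointed-to pointer and reset it, so stale uses fault early. */
#define free_and_null(ptr) do { free(*(ptr)); *(ptr) = NULL; } while (0)

/* Hypothetical struct with one heap-allocated member. */
struct record {
	char *name;
};

/* Copy name into a fresh allocation. Returns 0 on success, -1 on error. */
static int record__init(struct record *rec, const char *name)
{
	size_t len = strlen(name) + 1;

	rec->name = malloc(len);
	if (!rec->name)
		return -1;
	memcpy(rec->name, name, len);
	return 0;
}

/* Release the member; safe to call twice because free(NULL) is a no-op. */
static void record__exit(struct record *rec)
{
	free_and_null(&rec->name);
}
```

After record__exit(), an erroneous rec->name dereference hits a NULL
pointer and segfaults immediately instead of reading freed memory that
may still look valid.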
|
|
Eroding a bit more the tools/perf/util/util.h hodgepodge header.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
While cross building perf to the ARC architecture on a fedora 30 host,
we were failing with:
CC /tmp/build/perf/bench/numa.o
bench/numa.c: In function ‘worker_thread’:
bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’?
getrusage(RUSAGE_THREAD, &rusage);
^~~~~~~~~~~~~
SIGEV_THREAD
bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in
[perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1
arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
[perfbuilder@60d5802468f6 perf]$
Trying to reproduce a report by Vineet, I noticed that, with just
cross-built zlib and numactl libraries, I ended up with the above
failure.
So, since RUSAGE_THREAD is available as a define in the system headers,
check if it is defined in the 'perf bench numa' sources and define it if
not.
Now it builds and I have to figure out if the problem reported by Vineet
only takes place if we have libelf or some other library available.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
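
The guard described in the entry above can be sketched as follows. The
fallback value 1 is the Linux value of RUSAGE_THREAD; the wrapper name
thread_rusage() is hypothetical.

```c
#include <sys/resource.h>

/*
 * Some libcs do not expose RUSAGE_THREAD (or only do so under
 * _GNU_SOURCE). Define it with the Linux value when it is missing.
 */
#ifndef RUSAGE_THREAD
#define RUSAGE_THREAD 1
#endif

/* Fill ru with resource usage of the calling thread. Returns 0 on success. */
static int thread_rusage(struct rusage *ru)
{
	return getrusage(RUSAGE_THREAD, ru);
}
```

Because the check is an #ifndef on a preprocessor define, toolchains
that already provide RUSAGE_THREAD are unaffected.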
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo:
BPF:
Song Liu:
- Add support for annotating BPF programs, using the PERF_RECORD_BPF_EVENT
and PERF_RECORD_KSYMBOL recently added to the kernel and plugging
binutils's libopcodes disassembly of BPF programs with the existing
annotation interfaces in 'perf annotate', 'perf report' and 'perf top'
various output formats (--stdio, --stdio2, --tui).
perf list:
Andi Kleen:
- Filter metrics when using substring search.
perf record:
Andi Kleen:
- Allow limiting the number of reported perf.data files.
- Clarify help for --switch-output.
perf report:
Andi Kleen:
- Indicate JITed code better.
- Show all sort keys in help output.
perf script:
Andi Kleen:
- Support relative time.
perf stat:
Andi Kleen:
- Improve scaling.
General:
Changbin Du:
- Fix some mostly error path memory and reference count leaks found
using gcc's ASan and UBSan.
Vendor events:
Mamatha Inamdar:
- Remove P8 HW events which are not supported.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Optimization level '-Og' offers a reasonable level of optimization while
maintaining fast compilation and a good debugging experience. This patch
tries to make it work.
$ make DEBUG=1 EXTRA_CFLAGS='-Og'
bench/epoll-ctl.c: In function ‘do_threads’:
bench/epoll-ctl.c:274:9: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
return ret;
^~~
...
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190316080556.3075-4-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
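
One common way to address a -Werror=maybe-uninitialized report like the
one in the entry above is to give the variable a defined initial value.
The sketch below is an assumption about the shape of such a fix, not a
copy of the patch; start_one() is a hypothetical stand-in for the
per-thread setup.

```c
/* Hypothetical per-thread setup; returns 0 on success, -1 on error. */
static int start_one(int idx)
{
	return idx < 0 ? -1 : 0;
}

/*
 * Start nthreads workers. ret is initialized so that the path where the
 * loop body never runs (nthreads == 0) still returns a defined value.
 */
static int do_threads(int nthreads)
{
	int ret = 0;
	int i;

	for (i = 0; i < nthreads; i++) {
		ret = start_one(i);
		if (ret)
			break;
	}

	return ret;
}
```

At -Og the compiler analyzes the zero-iteration path differently than at
-O2, which is why the warning only shows up with that optimization
level.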
|
|
This replaces all open encodings in tools with NUMA_NO_NODE. Also
linux/numa.h is now needed for the perf build.
[sfr@canb.auug.org.au: fix for replace open encodings for NUMA_NO_NODE]
Link: http://lkml.kernel.org/r/20190108131141.730e9c4f@canb.auug.org.au
Link: http://lkml.kernel.org/r/1545127933-10711-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: David Hildenbrand <david@redhat.com>
Cc: Doug Ledford <dledford@redhat.com> [drivers/infiniband]
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe]
Cc: Jens Axboe <axboe@kernel.dk> [mtip32xx]
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Vinod Koul <vkoul@kernel.org> [dmaengine.c]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
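
The replacement of open-coded values described in the entry above can be
sketched as follows. NUMA_NO_NODE is defined as (-1) in linux/numa.h;
the local fallback definition and the function pick_node() are
illustrative assumptions so the snippet builds outside the kernel tree.

```c
/* Mirrors the linux/numa.h definition for builds without that header. */
#ifndef NUMA_NO_NODE
#define NUMA_NO_NODE (-1)
#endif

/*
 * Hypothetical helper: use the requested node unless the caller passed
 * NUMA_NO_NODE, in which case use the fallback. Before the change this
 * comparison would have been written against a bare -1.
 */
static int pick_node(int requested, int fallback)
{
	return requested == NUMA_NO_NODE ? fallback : requested;
}
```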
|