Notes on using perf

A coarse-grained view

    $ perf stat ./foo

       Performance counter stats for './foo':

                   37.89 msec task-clock                #    0.903 CPUs utilized
                      88      context-switches          #    0.002 M/sec
                       0      cpu-migrations            #    0.000 K/sec
                   2,079      page-faults               #    0.055 M/sec
             175,227,536      cycles                    #    4.624 GHz
             176,083,469      instructions              #    1.00  insn per cycle
              33,944,099      branches                  #  895.755 M/sec
                 118,643      branch-misses             #    0.35% of all branches

             0.041966679 seconds time elapsed

             0.033481000 seconds user
             0.004783000 seconds sys

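task-clock: CPU time the program actually spent running on a core, as opposed to wall-clock time
context-switches: How many times the scheduler took the program off a core to run something else (moves between cores are counted separately as cpu-migrations)
page-faults: Number of times the program touched a page of memory (code or data) that was not currently mapped into physical memory, so the kernel had to step in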
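
The right-hand column of the perf stat output is derived from the raw counters on the left. As a rough sketch of that arithmetic, the snippet below recomputes the derived values from the numbers in the run above. The variable names are my own, and the formulas are the obvious ratios rather than anything taken from perf's source, so small differences in the last digit are expected because task-clock is printed rounded.

```python
# Raw counters copied from the perf stat output above.
task_clock_ms = 37.89
elapsed_s = 0.041966679
cycles = 175_227_536
instructions = 176_083_469
branches = 33_944_099
branch_misses = 118_643

task_clock_s = task_clock_ms / 1000.0

# CPU time divided by wall-clock time.
cpus_utilized = task_clock_s / elapsed_s
# Cycles per second of CPU time, expressed in GHz.
clock_ghz = cycles / task_clock_s / 1e9
# Instructions retired per cycle.
ipc = instructions / cycles
# Branches per second of CPU time, in millions.
branch_rate_m = branches / task_clock_s / 1e6
# Share of branches that were mispredicted.
branch_miss_pct = 100.0 * branch_misses / branches

print(f"{cpus_utilized:.3f} CPUs utilized")
print(f"{clock_ghz:.3f} GHz")
print(f"{ipc:.2f} insn per cycle")
print(f"{branch_rate_m:.3f} M branches/sec")
print(f"{branch_miss_pct:.2f}% of all branches missed")
```

An IPC near 1.0 and a branch miss rate well under 1% are the kind of quick sanity checks this view is good for before digging deeper.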

To see where the time goes, record a profile and browse it

    $ perf record ./foo
    $ perf report

The report is more useful with a call graph; Chandler Carruth's talk from CppCon 2015 demonstrates this well.

Flags for compilation

    -fno-omit-frame-pointer

And then running

    $ perf record -g ./foo
    $ perf report -g 'graph,0.5,caller' # caller-first graph, hide entries below 0.5%

Keeping the frame pointer is what lets perf "walk the stack" and reconstruct the call chain for each sample.
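
To make "walking the stack" concrete, here is a toy model of what a frame-pointer unwinder does. Everything in it is made up for illustration: the addresses, the function names, and the `memory` and `symbols` tables. It assumes an x86-64 style layout in which each frame stores the caller's frame pointer at `fp` and the return address at `fp + 8`. It is a sketch of the idea, not how perf is implemented.

```python
# Toy model of frame-pointer stack walking; addresses and names are made up.
# Assumed layout: memory[fp] holds the caller's saved frame pointer and
# memory[fp + 8] holds the return address into the caller.
memory = {
    0x7000: 0x7040, 0x7008: 0x401200,  # frame of leaf(), returns into middle()
    0x7040: 0x7080, 0x7048: 0x401100,  # frame of middle(), returns into main()
    0x7080: 0x0,    0x7088: 0x401000,  # frame of main(), returns into _start
}

# Hypothetical symbol table: return address -> name of the enclosing function.
symbols = {0x401200: "middle", 0x401100: "main", 0x401000: "_start"}


def walk_stack(fp, sampled_function):
    """Follow the chain of saved frame pointers and collect the call chain."""
    chain = [sampled_function]
    while fp != 0:
        return_address = memory[fp + 8]
        chain.append(symbols[return_address])
        fp = memory[fp]  # step to the caller's frame
    return chain


call_chain = walk_stack(0x7000, "leaf")
print(" <- ".join(call_chain))
```

If the compiler omits the frame pointer, the saved-pointer chain in this model does not exist, which is why the flag above matters for this style of unwinding.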

Another helpful tip: in the perf report view, press 'a' on a selected symbol to view its annotated assembly.