Use the Select Counters dialog box to filter the sample records that Statistical Profiling Extension (SPE) counters generate. Sample records that do not meet the filtering criteria are discarded and are not written to the profiling buffer.
Usually, Streamline collects information from hardware counters, which provide aggregate counts, so you can only attribute counter values to regions of the application. The PC is sampled in software using a timer interrupt, which limits the sampling rate. SPE is an optional extension to the Arm®v8.2-A architecture that samples the PC periodically in hardware, as part of the processor pipeline. This form of sampling has a low probe effect, so you can set a much higher sampling rate. Because it is built into the pipeline, SPE can also collect extra information about each sampled instruction, which allows a much more detailed analysis of the executed code.
Streamline supports visualizing the following SPE data:
Issue and total instruction execution latency counts, which help identify execution stalls, together with load and store latency counts for memory accesses. Use this data to identify high-latency accesses and poor cache use.
Event information about each sampled instruction, including whether:
It accessed, hit, or missed a cache level.
It was a mispredicted or not-taken branch.
An exclusive load or store failed.
Use this data to identify branch prediction problems, poor cache use, and lock contention.
Information about the level of the memory hierarchy that a load or store accessed.
By default, all operations are sampled, but you can filter the samples using the SPE settings in the Select Counters dialog box.
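One way to picture this filtering is as a predicate applied to each sample record before it is written to the profiling buffer. The following Python sketch is a simplified model only, not Streamline or SPE code. The SampleRecord type, its fields, and the keep_record function are hypothetical names chosen for illustration, and the sketch assumes that the minimum latency is compared against the total latency of the sampled operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical, simplified view of one SPE sample record."""
    op_type: str                # "load", "store", or "branch"
    total_latency: int          # total execution latency, in cycles
    mispredicted: bool = False  # only meaningful for branches


def keep_record(record, op_types=None, min_latency=0, require_mispredicted=False):
    """Return True if the record passes every configured filter.

    With the default arguments no filter is active, so every record is kept.
    """
    if op_types is not None and record.op_type not in op_types:
        return False
    if record.total_latency < min_latency:
        return False
    if require_mispredicted and not record.mispredicted:
        return False
    return True


# Made-up records, for illustration only.
samples = [
    SampleRecord("load", 4),
    SampleRecord("load", 180),
    SampleRecord("branch", 12, mispredicted=True),
    SampleRecord("store", 9),
]

# Keep only loads that took at least 100 cycles.
slow_loads = [s for s in samples if keep_record(s, op_types={"load"}, min_latency=100)]

# Keep only operations flagged as mispredicted.
mispredicted = [s for s in samples if keep_record(s, require_mispredicted=True)]
```

In this model, records that fail the predicate are simply dropped, which mirrors how filtered-out sample records never reach the profiling buffer.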
You must use hardware that supports the Statistical Profiling Extension.
You must have a Linux kernel with:
The arm_spe_pmu module enabled.
Support for SPE in the device tree or UEFI.
Depending on the hardware and kernel configuration, you might need to disable KPTI. To do this, use the kernel command-line argument kpti=off when booting the device. You must disable KPTI on Armv8.2 processors with SPE, such as the Neoverse™ N1.
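The kernel prerequisites above can be checked on the target before you start a capture. The following Python sketch is one possible check under stated assumptions, not a Streamline feature. It assumes that a working SPE driver exposes a PMU device whose name starts with arm_spe (commonly arm_spe_0) under /sys/bus/event_source/devices, and that the boot arguments can be read from /proc/cmdline. The function names are hypothetical.

```python
from pathlib import Path


def spe_pmu_devices(event_source_dir="/sys/bus/event_source/devices"):
    """Return the names of SPE PMU devices exposed by the kernel, if any.

    Assumption: SPE PMU device names start with "arm_spe".
    """
    root = Path(event_source_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.name.startswith("arm_spe"))


def kpti_disabled_on_cmdline(cmdline_path="/proc/cmdline"):
    """Return True if kpti=off appears among the kernel boot arguments."""
    try:
        args = Path(cmdline_path).read_text().split()
    except OSError:
        return False
    return "kpti=off" in args


if __name__ == "__main__":
    devices = spe_pmu_devices()
    print("SPE PMU devices:", ", ".join(devices) if devices else "none found")
    print("kpti=off on kernel command line:", kpti_disabled_on_cmdline())
```

An empty device list suggests that the arm_spe_pmu module is not loaded, that the device tree or UEFI does not describe SPE, or that KPTI is preventing the driver from probing. The sketch cannot tell these cases apart.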
In the Start view, select the Use advanced mode checkbox.
Click Select Counters.
In the Events to collect list, select a counter in the Statistical Profiling Extension category.
Configure the SPE counter.
For example:
To identify operations that are slow because they access memory instead of the cache, set a minimum latency.
To find branches that trigger mispredictions, select the Mispredicted event filter.
Click Save to save your settings and close the Select Counters dialog box.
When the analysis is complete, Streamline displays charts for the SPE counters in the Timeline view. It also adds SPE data to the Call Paths, Functions, and Code views.
Examine the data at the thread, function, source line, or instruction level in the Call Paths, Functions, and Code views.
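Viewing the data at function level amounts to rolling up per-instruction samples. The following Python sketch illustrates that roll-up on made-up data. It does not read Streamline captures, and the function names, latency values, and the summarize_by_function helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (function name, total latency in cycles).
samples = [
    ("parse_input", 12),
    ("lookup_table", 210),
    ("lookup_table", 190),
    ("parse_input", 8),
    ("update_state", 30),
]


def summarize_by_function(samples):
    """Return {function: (sample_count, mean_latency)} for the given samples."""
    totals = defaultdict(lambda: [0, 0])  # function -> [count, latency sum]
    for function, latency in samples:
        totals[function][0] += 1
        totals[function][1] += latency
    return {name: (count, total / count) for name, (count, total) in totals.items()}


summary = summarize_by_function(samples)

# List functions from highest to lowest mean latency.
for name, (count, mean) in sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{name}: {count} samples, mean latency {mean:.1f} cycles")
```

Grouping on a source line or instruction address instead of a function name would give the finer-grained levels in the same way.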