Streamline helps you optimize software for devices that use Arm® processors.
Evaluate where the software in your system spends most of its time by capturing a performance profile of your application running on a target device. Use interactive charts and comprehensive data visualizations to quickly determine whether your performance bottleneck lies in CPU processing or GPU rendering.
For CPU bottlenecks, use the native profiling functionality to locate specific problem areas in your application code. Investigate how processes, threads, and functions behave, from high-level views down to line-by-line source code analysis. The basic profile is built by regularly sampling the Program Counter (PC) of the running threads, which identifies the hotspots in the running application. Hardware performance counters that are provided by the target processors can supplement this analysis. These counters let hotspot analysis include hardware events such as cache misses and branch mispredictions.
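The idea behind PC sampling can be illustrated with a minimal sketch. It is not how Streamline is implemented; the symbol table, the addresses, and the function names (`symbol_for`, `hotspots`) are all hypothetical and exist only for illustration. Each sampled PC value is attributed to the function whose address range contains it, and the functions with the largest share of samples are the likely hotspots.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1200, "decode_frame"),
    (0x1800, "blend_pixels"),
]
STARTS = [start for start, _ in SYMBOLS]


def symbol_for(pc):
    """Return the function with the closest start address at or below pc."""
    index = bisect_right(STARTS, pc) - 1
    return SYMBOLS[index][1] if index >= 0 else "<unknown>"


def hotspots(pc_samples):
    """Return (function name, share of samples) pairs, hottest first."""
    counts = Counter(symbol_for(pc) for pc in pc_samples)
    total = sum(counts.values())
    return [(name, count / total) for name, count in counts.most_common()]


# Made-up PC samples, as if taken at a regular interval.
samples = [0x1804, 0x1810, 0x1204, 0x1850, 0x1008, 0x1820, 0x1230, 0x18F0]

for name, share in hotspots(samples):
    print(f"{name:14s} {share:6.1%}")
```

Because the samples are taken at regular intervals, the share of samples that land in a function approximates the share of run time spent there, provided enough samples are collected.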
For GPU bottlenecks, use performance data from the Arm® Mali™ GPU driver and hardware performance counters to explore the rendering workload efficiency. Visualize the workload breakdown, pipeline loading, and execution characteristics to quickly identify where to apply rendering optimizations.
With Streamline, you can:
Find hotspots in your code to target for software optimization.
Identify the processor that is the major bottleneck in the performance of your application.
Use CPU performance counters to provide insights into L1 and L2 cache efficiency, enabling cache-aware profiling.
Identify the cause of heavy rendering loads that lead to poor GPU performance, and use GPU performance counters to identify workload inefficiencies.
Reduce device power consumption and improve energy efficiency by optimizing workloads using performance counters from the CPU, GPU, and memory system.
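As a rough illustration of how raw counter values become cache-efficiency insights, the following sketch derives miss ratios and misses per thousand instructions from counter totals. The counter names and values are invented for the example and are not the event names that Streamline or any particular Arm CPU exposes.

```python
def miss_ratio(misses, accesses):
    """Fraction of accesses that missed; 0.0 when there were no accesses."""
    return misses / accesses if accesses else 0.0


def misses_per_kilo_instruction(misses, instructions):
    """Misses per 1000 retired instructions; 0.0 when nothing was retired."""
    return 1000.0 * misses / instructions if instructions else 0.0


# Hypothetical counter totals for one capture window.
counters = {
    "instructions": 2_000_000,
    "l1d_access": 600_000,
    "l1d_miss": 30_000,
    "l2_access": 30_000,
    "l2_miss": 6_000,
}

l1_ratio = miss_ratio(counters["l1d_miss"], counters["l1d_access"])
l2_ratio = miss_ratio(counters["l2_miss"], counters["l2_access"])
l1_mpki = misses_per_kilo_instruction(counters["l1d_miss"], counters["instructions"])

print(f"L1 miss ratio: {l1_ratio:.1%}")
print(f"L2 miss ratio: {l2_ratio:.1%}")
print(f"L1 misses per 1000 instructions: {l1_mpki:.1f}")
```

Ratios like these are easier to compare across captures than raw counts, because they do not depend on how long the capture ran.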
Streamline supports data capture on Android devices. Streamline collects CPU performance data and Arm Mali GPU, or Arm Immortalis™ GPU, performance data so that you can profile your debuggable game or application without device modification. Streamline also supports non-debuggable application profiling on a rooted device. To configure Streamline to collect the right data, use the built-in templates to select the most appropriate set of counters for your device. Alternatively, use a template as a starting point for a customized data visualization. See Profile your Android application.
To learn how to set up your Android target for profiling with Streamline, see the Arm Streamline Target Setup Guide for Android.
In addition to single-application profiling on non-rooted devices, Streamline supports system-wide Android profiling when running on development devices with root access. System profiling enables manufacturers to simultaneously monitor all applications and services running on their device, allowing identification of problematic processes or scheduling behaviors.
Streamline supports system-wide profiling of applications running on Linux-based embedded devices. Analyze the behavior of the system hardware by selecting the required Arm CPU or Mali GPU hardware performance counters for your scenario. This analysis can be supplemented by connecting power measurement probes, such as the Arm Energy Probe or National Instruments DAQ, to provide accurate measurement of system energy use. To provide more context to the analysis, you can use software annotations in Arm software libraries, such as the Mali GPU OpenCL device driver. See Profile your Linux application.
To learn how to set up your Linux target for profiling with Streamline, see the Arm Streamline Target Setup Guide for Linux.
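To show how power probe readings relate to system energy use, the sketch below integrates a series of power samples over time with the trapezoidal rule. The sample values and the function name `energy_joules` are assumptions made for illustration; they do not reflect the data format of the Arm Energy Probe or of Streamline.

```python
def energy_joules(samples):
    """Integrate (time in seconds, power in watts) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical power trace: (timestamp in seconds, power in watts).
power_trace = [(0.0, 1.0), (0.5, 2.0), (1.0, 2.0), (1.5, 1.0)]

energy = energy_joules(power_trace)
duration = power_trace[-1][0] - power_trace[0][0]

print(f"Energy: {energy:.2f} J")
print(f"Average power: {energy / duration:.2f} W")
```

Comparing the energy of the same workload before and after an optimization is usually more meaningful than comparing peak power, because a change can raise instantaneous power while still finishing sooner and using less energy overall.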
Streamline can profile bare-metal software running on Arm processors that emits data over an Arm® CoreSight™ ITM or STM data channel. It can also profile data captured to an on-device memory buffer. Support is included for PC sampling, performance counter sampling, and application-generated annotations.
The target agent for bare-metal profiling is provided as a small source library that is integrated directly into the bare-metal application that is being profiled. This library, called Barman, is auto-generated based on the data channel configuration that is required. It provides optional hooks that allow for lightweight RTOS integration, such as annotation of thread or task context switches. This integration can supplement the basic performance information in the data visualizations. See Profile your bare-metal application.
To learn how to set up your bare-metal target for profiling with Streamline, see the Arm Streamline Target Setup Guide for Bare-metal Applications.
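To illustrate why context-switch annotations add useful context, the sketch below turns a list of task switch events into the running time of each task. It does not use the Barman API; the event format, the task names, and the function `time_per_task` are hypothetical.

```python
from collections import defaultdict


def time_per_task(switches, end_time):
    """Sum running time per task from (timestamp, task name) switch events.

    Each event marks the moment a task starts running; the task runs until
    the next event, or until end_time for the last event.
    """
    totals = defaultdict(float)
    following = switches[1:] + [(end_time, None)]
    for (start, task), (stop, _) in zip(switches, following):
        totals[task] += stop - start
    return dict(totals)


# Hypothetical context-switch events: (timestamp in milliseconds, task name).
events = [
    (0.0, "idle"),
    (2.0, "sensor"),
    (3.0, "idle"),
    (7.0, "control"),
    (8.0, "idle"),
]

for task, elapsed in time_per_task(events, end_time=10.0).items():
    print(f"{task:8s} {elapsed:5.1f} ms")
```

With this kind of per-task breakdown, PC samples and counter values can be read against the task that was running at the time, rather than against the application as a whole.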