Streamline Cache Test example - Arm Development Suite
This example illustrates the use of Streamline to reveal cache efficiency by way of simple instrumented C code.
Purpose and scope
This document describes how to build, run and analyze the Streamline cache test example provided with Arm DS IDE.
Terminology
This document refers to 'host' and 'target' systems. The 'host' system is the Linux or Windows desktop computer that you use for most of your work.
The 'target' system is some Arm-based hardware (or model of such hardware) on which the example Arm Linux distribution is running.
This example is intended to be run and analyzed on an Arm Linux hardware target that supports Streamline, though it can also be run only (not analyzed) on a software model such as the FVP model.
Hardware and software requirements
A host workstation (Linux or Windows) is required to build the example, communicate with the target, and run Arm DS IDE Debugger.
An Arm Linux hardware target, configured for Streamline (with gator driver and daemon).
A serial terminal emulator such as the Terminal view in Arm DS IDE, TeraTerm for Windows, or minicom for Linux, connected via a serial cable to your platform. This may be needed when running the example on real target hardware, to monitor the Arm Linux boot process and provide a terminal interface to Arm Linux. It is not needed when using a software model, because the model provides its own terminal. To use the Terminal view in Arm DS IDE, open the Show View dialog box, expand the Terminal group, select Terminal, and click OK. To configure the terminal settings, click Settings in the Terminal view, select the required connection type (for example, Serial), enter the required settings (for example, 38400 baud, 8 bits, no parity, 1 stop bit), and click OK.
A secure copy program such as scp, to allow files to be copied from host to target. Windows versions of this Linux command are available, such as pscp, which is provided with PuTTY.
Alternatively, use the Arm DS IDE Debugger Debug Configurations dialog or Remote System Explorer to transfer files from host to target.
Exploring the Example
This project includes a simple C file cache-test.c, Streamline annotation C source and header files streamline_annotate.c/.h, Makefile, debug launcher Streamline_cache_test-gdbserver.launch, pre-built stripped and unstripped versions of the executable cache-test, and a ready-made Streamline capture.
Open cache-test.c to view the test code. Its main() function first allocates some arrays, then fills them with data. main() then calls xy_loop() to perform a sum in row-major ordering, then calls yx_loop() to perform a sum in column-major ordering. When the cache_test application is run (see below), it prints output similar to the following in the App Console of Arm DS IDE Debugger:
Starting loop addition with 5000 iterations...
row-major ordering: took 157.22 ms
col-major ordering: took 1373.94 ms (8.7x slower)
The Streamline capture reveals that, in this case, the row-major ordering makes much more efficient use of the cache than the column-major ordering.
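The difference between the two orderings can be illustrated with a minimal sketch. This is not the code from cache-test.c: the function names sum_row_major, sum_col_major and time_sum_ms, and the flat-array layout, are assumptions chosen for illustration. Both sum functions add up the same elements, and only the order of the memory accesses differs.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Row-major sum: the inner loop visits neighbouring elements in memory. */
long sum_row_major(const int *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    long sum = 0;
    for (size_t y = 0; y < rows; y++)
        for (size_t x = 0; x < cols; x++)
            sum += a[y * cols + x];
    return sum;
}

/* Column-major sum: the inner loop jumps a whole row (cols elements)
 * between consecutive accesses, so it reuses far less cached data. */
long sum_col_major(const int *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    long sum = 0;
    for (size_t x = 0; x < cols; x++)
        for (size_t y = 0; y < rows; y++)
            sum += a[y * cols + x];
    return sum;
}

/* Hypothetical helper: CPU time in milliseconds for one call of either
 * sum function. The computed sum is returned through *result. */
double time_sum_ms(long (*sum_fn)(const int *, size_t, size_t),
                   const int *a, size_t rows, size_t cols, long *result)
{
    assert(sum_fn != NULL && result != NULL);
    clock_t start = clock();
    *result = sum_fn(a, rows, cols);
    return 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_sum_ms() with each function on an array much larger than the cache would be expected to show the column-major version running slower, although the ratio depends on the target's cache sizes and memory system.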
The ready-made Streamline capture for this example can be viewed in Streamline using the Import Streamline Sample Captures wizard.
To view it, first launch the Streamline application, then open Import Streamline Sample Captures and select the capture for this example from the list of sample captures.
The capture is added to the Streamline Data view. Double-click on the capture to generate and view an Analysis Report.
To explore the Analysis Report for the example:
Click on the Timeline tab to show the timeline. Notice there is a burst of cache activity between 0.6s and 2.1s. Expand the cache display to reveal on which CPU this occurs.
Click on the [cache_test] process to view the text annotations ("Array allocation", "row-major ordering", "col-major ordering"). Notice that the burst of cache activity corresponds to the column-major ordering phase.
Click on the Functions tab, and notice that much more time is spent in yx_loop() than in xy_loop(). Click on yx_loop() to open a view of its code, the disassembly, and the percentage of samples in that area. Notice the LDR instructions, especially the last one, account for most of the time in this function, because of the cache misses that are occurring.
Click on the Log tab to list every message generated by the ANNOTATE statements in the code along with information related to the message.
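The cache misses seen in yx_loop() can be reasoned about with a simplified model. The sketch below counts how many cache lines a run of equally spaced loads touches. It assumes the array starts on a cache-line boundary, and it ignores cache capacity, associativity and prefetching. The function name count_lines_touched and the 64-byte line size used in the comments are assumptions for illustration, not values taken from the example.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many cache lines are entered by `accesses` loads that are
 * `stride_bytes` apart, for a cache line of `line_bytes` bytes.
 * Simplified model: the first load starts on a line boundary and a line
 * is counted each time a load lands in a different line than the
 * previous load. */
size_t count_lines_touched(size_t accesses, size_t stride_bytes,
                           size_t line_bytes)
{
    assert(line_bytes > 0);
    size_t touched = 0;
    size_t prev_line = 0;
    for (size_t i = 0; i < accesses; i++) {
        size_t line = (i * stride_bytes) / line_bytes;
        if (i == 0 || line != prev_line)
            touched++;
        prev_line = line;
    }
    return touched;
}

/* Example, assuming 4-byte ints and 64-byte lines:
 *   count_lines_touched(1024, 4, 64)    -> 64   (row-major, stride of one int)
 *   count_lines_touched(1024, 4096, 64) -> 1024 (column-major, row of 1024 ints)
 */
```

In this model a row-major walk brings in a new line once every 16 loads, while a column-major walk over rows of 1024 ints brings in a new line on every load, which is consistent with the load instructions dominating the time in yx_loop().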
The master sources for the Streamline annotation C source and header files streamline_annotate.c/.h are provided in install_directory\sw\streamline\gator\annotate\.
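A minimal sketch of how text annotations like those in the capture might be emitted is shown below. It assumes that streamline_annotate.h provides the ANNOTATE_SETUP, ANNOTATE(str) and ANNOTATE_END() macros; check the supplied header for the exact names and behavior. The USE_STREAMLINE_ANNOTATE switch, the stand-in macros, and the functions begin_phase() and run_annotated_phases() are hypothetical and exist only so that the sketch builds without the gator sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifdef USE_STREAMLINE_ANNOTATE          /* hypothetical build switch */
#include "streamline_annotate.h"        /* supplied with the example */
#else
/* Stand-ins (assumption): print the annotation instead of sending it
 * to the gator daemon. */
#define ANNOTATE_SETUP   ((void)0)
#define ANNOTATE(str)    printf("[annotate] %s\n", (str))
#define ANNOTATE_END()   printf("[annotate] end\n")
#endif

static int phases_marked;

/* Mark the start of a named phase of the program. */
static void begin_phase(const char *label)
{
    assert(label != NULL);
    ANNOTATE(label);
    phases_marked++;
}

/* Emit one annotation per phase, in the order used by the example,
 * and return how many phases were marked. */
int run_annotated_phases(void)
{
    phases_marked = 0;
    ANNOTATE_SETUP;                     /* once, before any annotation */

    begin_phase("Array allocation");
    /* ... allocate and fill the arrays here ... */

    begin_phase("row-major ordering");
    /* ... row-major sum here ... */

    begin_phase("col-major ordering");
    /* ... column-major sum here ... */

    ANNOTATE_END();                     /* close the last annotation */
    return phases_marked;
}
```

When built against the real header and run during a capture, annotations of this kind are what appear against the process in the Timeline and in the Log tab.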
Building this example
A pre-built executable is provided for AArch64 Linux systems.
This example is intended to be built with aarch64-none-linux-gnu GCC. If you wish to modify and rebuild the example, you must have a suitable GCC installed.
Building on the command-line
To build on the command-line with the supplied make utility:
On Windows, open an Arm DS IDE Command Prompt from the Start menu, run the select_toolchain utility, and select GCC 4.8.3 [arm-linux-gnueabihf] from the list.
On Linux, run the suite_exec utility with the --toolchain option to select the compiler and start a shell configured for the suite environment, for example:
~/Arm_ds/bin/suite_exec --toolchain "GCC 4.8.3 [arm-linux-gnueabihf]" bash
Then navigate to ...\Streamline_cache_test and type:
make
The usual make rules (clean, all, and rebuild) are provided in the Makefile.
Building with Eclipse
In the Project Explorer view, select the project you want to build.
Build the selected project.
The supplied Streamline_cache_test Eclipse (makefile builder) project is used to build this example.
Running/Debugging the example on a hardware target
This example can be run and debugged on a hardware target by using the supplied Streamline_cache_test-gdbserver debug configuration.
If you have not done so already, boot Arm Linux on your target and log in as root.
Create a Linux/ssh Remote System Explorer connection for your target.
Open the Debug Configurations dialog, select Streamline_cache_test-gdbserver under the Arm DS IDE Debugger configuration type, and start the debug session.
This is pre-configured to download the stripped cache_test executable to the target, start gdbserver on the target, load the debug information from the debug/unstripped version of the image into Arm DS IDE Debugger, then start executing cache_test, stopping at main().
Loading the example onto the hardware target manually
Instead of using the supplied Streamline_cache_test-gdbserver debug configuration, you can manually download the stripped executable cache_test onto the target before running or debugging the example.
To load this file onto the target, you can either:
- use Remote System Explorer to drag and drop this file from the host to the target file system, then set execute permissions on the copied executable on the target using Remote System Explorer
- perform the manual copy steps as follows:
  - On the target, obtain the IP address of the target with:
    ifconfig
    to give, for example, 10.1.204.172
  - To load the example onto the target, navigate to the ...\Streamline_cache_test\stripped directory on the host, then copy the stripped executable from the host to your home directory on the target. If logged in as root, copy to a writable directory with, for example, this command on a Linux host:
    scp cache_test root@10.1.204.172:/writeable
    Windows users might need to use pscp from PuTTY instead of scp.
  - Set execute permissions on the copied executable with chmod +x cache_test on the target console.
Running the example manually
Instead of using the supplied Streamline_cache_test-gdbserver debug configuration, cache_test can be run directly from the target's command-line.
First, navigate to the directory on the target where cache_test is located, then execute the following command on the target:
./cache_test
Preparing to debug the example with gdbserver manually
Instead of using the supplied Streamline_cache_test-gdbserver debug configuration, you can prepare to debug cache_test with gdbserver manually.
First, navigate to the directory on the target where cache_test is located, then execute the following command on the target:
gdbserver :5000 ./cache_test &
Capturing data and annotations
Assuming the stripped cache_test executable has already been loaded to the target, it can now be analyzed with Streamline.
- First prepare the Streamline capture configuration in the Streamline Data view:
  Click on Capture & analysis options and enter a name for the capture session.
  In the Address field, enter the host name or IP address of the target.
  Select an Output path in which to save the data.
  Add the name of the program image to capture. For this example, Streamline must load the debug information from the debug/unstripped version of the image at ${workspace_loc}\Streamline_cache_test\cache_test.
  Click Save to return to Streamline Data.
- Press Start capture. If the capture configuration is correct, a new analysis file appears in Streamline Data, with a Stop button within it.
- Start the example from the target's command line with: ./cache_test
- After the example finishes, press Stop in Streamline Data. The collected data is then processed for viewing.
- When the analysis completes, Streamline automatically opens the Analysis Report.
To analyze the captured data again with other settings, click on the Options gear-wheel icon on the right-hand side of the Streamline_cache_test Capture Data.
Known issues and troubleshooting
- The Linux target reports: # Cannot exec ./cache_test: Permission denied.
  This occurs if you have not set execute permissions on the application. Use, for example, chmod +x cache_test.
- The Linux target reports: Connection Failed: Failed to delete file /writable/(app): Permission denied.
  This occurs if you do not have write permission for the download folder on the target. Modify your Debug Configuration and set Target download directory to a writable folder.
- Breakpoints are not being hit in an application or shared library:
  Ensure the application and any shared libraries on your target match the ones on the host. The code/data layout must be identical, though the application/shared library on your target does not need to contain debug symbols; these can be stripped to reduce image size.
  Try copying the application/shared library across from host to target again.
Copyright© 2010-2022 Arm Limited (or its affiliates). All rights reserved.