The Arm System Characterization Tool (ASCT) is a standalone command-line utility for running low-level benchmarks, diagnostic scripts, and system tests to analyze and debug performance on Arm-based platforms.
ASCT provides a standardized environment for evaluating key hardware characteristics, such as memory latency and bandwidth, and is especially suited for platform bring-up, system tuning, and architectural comparison tasks. It helps developers and system architects gain early and repeatable insights into performance-critical subsystems.
Current capabilities include:
Planned features include:
Download the latest asct-<version>.tar.gz release from the Artifactory Releases page.
Install ASCT using uv:
UV_TOOL_BIN_DIR=/usr/local/bin sudo -E $(which uv) tool install /path/to/asct-X.Y.Z.tar.gz
[!IMPORTANT] Install ASCT into /usr/local/bin instead of the default ~/.local/bin so you can run it with sudo. You can install and use the tool without sudo, but some functionality might be limited.
Run ASCT:
sudo asct run
See Getting started or run asct help to learn how to use ASCT.
Remove ASCT using uv:
sudo -E $(which uv) tool uninstall asct
ASCT contains a number of separate commands with the following common pattern:
asct [command] [arguments]
| Command | Description |
|---|---|
| run | Runs a list of benchmarks based on a provided list of keywords |
| system-info | Display system information only and quit. ASCT also includes this information by default with every benchmark run. |
| list | Get a list of available benchmarks |
| version | Display the version of ASCT |
| help | Display the help and available options for any command |
For more information on a specific command, run:
asct [command] --help
To get started, run the default set of benchmarks:
sudo asct run
[!IMPORTANT] Some benchmarks require sudo or root privileges to configure huge pages and access certain system information. ASCT can run without sudo, but some benchmarks might be unavailable or limited in functionality.
The run command allows you to run one or more
benchmarks.
sudo asct run
By default, using the run command with no arguments
executes a default set of benchmarks and displays the results in the
terminal with stdout. Each time you run the
run command, ASCT collects system information (just like
system-info), and displays or stores it with the benchmark
results.
Depending on the benchmark, ASCT might also generate additional
output, such as graphs or raw data dumps, and save it in a directory within
the current working directory. By default, this directory is named
data.<YYYYMMDD_HHMMSS_microseconds>, but you can
customize it using the --output-dir flag.
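As an illustration of the default directory naming, the following Python sketch builds a name of the form data.<YYYYMMDD_HHMMSS_microseconds>. The function name `default_output_dir` and the use of six zero-padded digits for the microseconds field are assumptions made for this example; ASCT's actual implementation might differ.

```python
from datetime import datetime


def default_output_dir(now=None):
    """Build a directory name shaped like data.<YYYYMMDD_HHMMSS_microseconds>.

    Hypothetical helper: the exact padding ASCT uses is an assumption.
    """
    now = now or datetime.now()
    # %f expands to the microseconds as six zero-padded digits.
    return "data." + now.strftime("%Y%m%d_%H%M%S_%f")


if __name__ == "__main__":
    print(default_output_dir())
```

Because the name includes microseconds, two consecutive runs in the same working directory are very unlikely to collide.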
To run all available benchmarks:
sudo asct run all
To run specific benchmarks, pass a list of benchmark names as
arguments to the run command.
For example, to run the latency-sweep and
idle-latency benchmarks:
sudo asct run latency-sweep idle-latency
Alternatively, each benchmark also has associated keywords that you can use to select multiple benchmarks that match those keywords.
For example, to run all benchmarks tagged with the
memory keyword:
sudo asct run memory
You can exclude benchmarks by negating their names or keywords. To do
this, prepend the name or keyword with a caret (^).
For example, to run all benchmarks but exclude all of those tagged with the bandwidth keyword:
sudo asct run all ^bandwidth
[!NOTE] If a benchmark is a dependency for running another benchmark, negating or excluding it will have no effect. For example, if you use ^latency-sweep to exclude the latency-sweep benchmark, but another benchmark depends on latency-sweep, the latency-sweep benchmark will still run.
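The selection rules above (names, keywords, caret negation, and dependencies that override an exclusion) can be sketched in Python as follows. This is not ASCT's source code: the function `select_benchmarks`, the `peak-bandwidth` entry, and the keyword and dependency tables are hypothetical and exist only to illustrate the behavior described.

```python
def select_benchmarks(catalog, dependencies, args):
    """Resolve run arguments such as ["all", "^bandwidth"] to benchmark names.

    catalog:      dict mapping benchmark name -> set of keywords
    dependencies: dict mapping benchmark name -> set of required benchmarks
    """
    def matches(token):
        if token == "all":
            return set(catalog)
        return {name for name, keywords in catalog.items()
                if token == name or token in keywords}

    included, excluded = set(), set()
    for arg in args:
        if arg.startswith("^"):
            excluded |= matches(arg[1:])   # negated name or keyword
        else:
            included |= matches(arg)

    selected = included - excluded

    # Dependencies of any remaining benchmark run even if they were excluded.
    pending = list(selected)
    while pending:
        name = pending.pop()
        for dep in dependencies.get(name, ()):
            if dep not in selected:
                selected.add(dep)
                pending.append(dep)
    return selected


# Hypothetical catalog for illustration only.
CATALOG = {
    "latency-sweep": {"memory", "latency"},
    "idle-latency": {"memory", "latency"},
    "peak-bandwidth": {"memory", "bandwidth"},
}
DEPENDENCIES = {
    "idle-latency": {"latency-sweep"},
    "peak-bandwidth": {"latency-sweep"},
}

if __name__ == "__main__":
    print(sorted(select_benchmarks(CATALOG, DEPENDENCIES, ["all", "^bandwidth"])))
    print(sorted(select_benchmarks(CATALOG, DEPENDENCIES, ["idle-latency", "^latency-sweep"])))
```

In the second call, latency-sweep is excluded explicitly but still appears in the result because idle-latency depends on it.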
To display the list of available benchmarks and their keywords, run the following command:
asct list
The run command also supports the following arguments:
--format, -f [stdout, csv, json]
Specify the output format: the terminal with stdout (default), individual CSV files (benchmark-name.csv), or a single combined JSON file (report.json).
For example:
asct run idle-latency --format=json
--log-level, -L [debug, info, warning, error, critical]
Set the log level. The level is info by default, but you can configure it.
--log-file LOG_FILE
Specify a file for log messages, filtered by the --log-level parameter. ASCT writes logs to this file and also prints them to standard error (stderr).
--output-dir, -o OUTPUT_DIR
By default, ASCT saves output in data.<YYYYMMDD_HHMMSS_microseconds> within the current working directory. Use this flag to override the default and specify a custom output directory, using either an absolute or a relative path.
--force
If the directory specified by --output-dir already exists, ASCT displays an error and quits to avoid overwriting data. Use the --force flag to overwrite the output directory if it exists.
--quiet, -q
Suppress all output to stdout and stderr, including critical errors and log messages. Use --log-file to capture and view logs.
--no-progress-bar
Display system information only. This information is also collected and saved by default during any benchmark run, but you can run this command on its own to quickly view the system configuration.
sudo asct system-info
[!NOTE] Some system information requires sudo or root privileges. You can run the system-info command without sudo, but some details might be unavailable.
You can use the same arguments as run described above to
configure the output format, output directory, and verbosity of the
tool.
For example:
sudo asct system-info --format=json --output-dir=data --force --log-level=error --log-file=data/asct.log --quiet
Run the following command to get the list of available benchmarks and their associated keywords:
asct list
Use this benchmark to measure memory latency across data sizes from 128 bytes to 1 GiB. The results include the average access latency for each size.
The results show how memory access latency shifts across cache levels and dynamic random-access memory (DRAM), revealing transitions in the cache hierarchy.
Randomized linked lists prevent hardware prefetching from influencing the latency measurements.
The benchmark uses 1 GiB huge pages (also known as large pages on some systems) to reduce the impact of page table walks and translation lookaside buffer (TLB) lookups on latency measurements.
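To illustrate the pointer-chasing idea, the following Python sketch builds a randomized linked list as a single cycle and walks it so that each access depends on the previous one. It is a conceptual model only: the function names are hypothetical, Sattolo's algorithm is one possible way to build the cycle (the source does not say which method ASCT uses), and interpreter overhead means the timings do not reflect real memory latency.

```python
import random
import time


def build_random_cycle(n, seed=0):
    """Return a list nxt where nxt[i] is the element visited after i.

    Uses Sattolo's algorithm, which yields a permutation made of a single
    cycle, so a walk from any element visits all n elements before repeating.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)          # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain: each load address comes from the previous load."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


def average_step_ns(nxt, steps):
    """Average time per dependent access (dominated by Python overhead)."""
    t0 = time.perf_counter_ns()
    chase(nxt, steps)
    return (time.perf_counter_ns() - t0) / steps


if __name__ == "__main__":
    chain = build_random_cycle(1 << 16)
    print(f"{average_step_ns(chain, 1_000_000):.1f} ns per step")
```

Because the next index is unknown until the current load completes, a stride-based prefetcher has no regular pattern to follow, which is the property the benchmark relies on.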
The benchmark calculates:
Lower and upper bounds for each cache level
Average latencies
The optimal data size for L1, L2, last-level cache (LLC), and DRAM. Other memory benchmarks like Idle Latency and Loaded Latency reuse these sizes to improve precision.
Latencies at different levels of cache
--------------------------------------
Level Lower bound Upper bound Optimum data size Latency [ns]
L1 128 64K 32.0625K 1.5
L2 64K 512K 288K 5.2
LLC 1M 8M 4.5M 48.1
DRAM 32M 1G 528M 107.6
The benchmark also saves a graph as latency-sweep.png in the output directory.
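When post-processing the table above, it can help to convert size strings such as 32.0625K into bytes. The sketch below assumes that K, M, and G denote binary multiples (1024, 1024², 1024³), which is consistent with the 128 byte to 1 GiB range stated earlier but is still an assumption; `size_to_bytes` is a hypothetical helper, not part of ASCT.

```python
# Assumed binary multipliers for the suffixes shown in the sample output.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(text):
    """Convert a size string such as '288K' or '4.5M' to a byte count."""
    text = text.strip()
    if not text:
        raise ValueError("empty size string")
    suffix = text[-1].upper() if text[-1].isalpha() else ""
    number = text[:-1] if suffix else text
    if suffix not in _UNITS:
        raise ValueError(f"unknown size suffix: {suffix!r}")
    return int(float(number) * _UNITS[suffix])


if __name__ == "__main__":
    for size in ("128", "32.0625K", "288K", "4.5M", "528M", "1G"):
        print(size, size_to_bytes(size))
```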
Use this benchmark to measure memory access latency from the last core on each node to its local and remote memory in a non-uniform memory access (NUMA) system. In NUMA systems, memory access times vary by location.
To characterize the system accurately, ensure it is idle. Close all applications and background processes except the test itself. ASCT imposes only the minimal load needed for measurement, but it cannot control other background activity that might affect test results.
The benchmark produces a matrix of size n by n, with n equal to the number of NUMA nodes.
Latencies of random memory access at idle (in nanoseconds)
----------------------------------------------------------
Node 0 Node 1
Node 0 113.3 266.9
Node 1 266.5 115.7
Note: ASCT derives the data size used to target DRAM
from the latency-sweep benchmark. If not selected manually,
ASCT automatically includes latency-sweep as a
dependency.
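One way to summarize the latency matrix is the ratio of remote to local latency for each node. The following Python sketch uses the sample values shown above and assumes that rows are the node running the measurement and columns are the node holding the memory; that orientation and the function name are assumptions for this example.

```python
def remote_to_local_ratios(matrix):
    """Return {node: worst remote latency / local latency} for an n x n matrix.

    Assumes matrix[i][j] is the latency from a core on node i to memory on node j.
    """
    ratios = {}
    for i, row in enumerate(matrix):
        local = row[i]
        remote = [value for j, value in enumerate(row) if j != i]
        # A single-node system has no remote memory, so the ratio is 1.0.
        ratios[i] = max(remote) / local if remote else 1.0
    return ratios


# Values copied from the sample output above (nanoseconds).
idle_latency_ns = [
    [113.3, 266.9],
    [266.5, 115.7],
]

if __name__ == "__main__":
    for node, ratio in remote_to_local_ratios(idle_latency_ns).items():
        print(f"Node {node}: remote access is {ratio:.2f}x local")
```

For the sample data, remote access costs roughly 2.3 times as much as local access on both nodes.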
Use this benchmark to measure the memory latency of the last core on the first NUMA node to its local memory, while other cores generate increasing memory traffic.
To vary memory pressure, ASCT interleaves memory reads with different numbers of no-operation (NOP) instructions.
Loaded latency with background memory activity
------------------------------------------------
Injected NOPs Loaded latency [ns] Bandwidth [GB/s]
3000 115.5 10.7
900 115.8 35.2
500 117.6 61.4
180 122.2 175.3
100 129.2 306.5
80 134.1 385.0
70 138.3 441.8
50 158.8 603.9
40 193.9 753.7
30 250.7 912.7
20 266.5 918.3
10 281.0 918.3
0 305.4 920.7
Note: ASCT derives the data size used to target DRAM
from the latency-sweep benchmark. If not selected manually,
ASCT automatically includes latency-sweep as a
dependency.
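A common way to read a loaded-latency curve is to find the bandwidth at which latency starts to climb sharply. The sketch below applies a simple threshold to the sample rows above: it reports the first point, in order of increasing bandwidth, where latency exceeds 1.5 times the lowest measured latency. The 1.5 factor and the function name are arbitrary choices for illustration, not something ASCT defines.

```python
# (injected NOPs, loaded latency in ns, bandwidth in GB/s) from the sample output.
SAMPLE = [
    (3000, 115.5, 10.7), (900, 115.8, 35.2), (500, 117.6, 61.4),
    (180, 122.2, 175.3), (100, 129.2, 306.5), (80, 134.1, 385.0),
    (70, 138.3, 441.8), (50, 158.8, 603.9), (40, 193.9, 753.7),
    (30, 250.7, 912.7), (20, 266.5, 918.3), (10, 281.0, 918.3),
    (0, 305.4, 920.7),
]


def saturation_point(rows, factor=1.5):
    """Return the first row whose latency exceeds factor x the lowest latency.

    Rows are scanned in order of increasing bandwidth. Returns None when
    no row crosses the threshold.
    """
    baseline = min(latency for _, latency, _ in rows)
    for nops, latency, bandwidth in sorted(rows, key=lambda row: row[2]):
        if latency > factor * baseline:
            return nops, latency, bandwidth
    return None


if __name__ == "__main__":
    print(saturation_point(SAMPLE))
```

With the sample data, the threshold is first crossed at 40 injected NOPs, at about 754 GB/s.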
Use this benchmark to measure the latency of cache line transfers between pairs of CPU cores across the system.
The benchmark uses a ping-pong microbenchmark that alternates shared memory access and modification between 2 threads pinned to different cores.
The shared memory region is a randomized linked list. The benchmark pointer-chases this list so that each memory access depends on the result of the previous one. This approach prevents hardware prefetchers from interfering with results.
The benchmark binds the memory region to a specific NUMA node, so you can measure both local core-to-core latency on the same node and remote core-to-core latency across nodes. This distinction helps you understand the impact of NUMA topology on inter-core communication.
The benchmark calculates:
Node-to-node median latency matrices that summarize inter-node and intra-node communication costs.
Highest-latency core pairs that highlight outliers and potential bottlenecks.
Asymmetry in latency between different directions, for example A to B versus B to A.
The benchmark presents results as:
Tables that show node-to-node and core-to-core latencies.
Heatmaps that show latency patterns across all core pairs.
CSV and JSON files that support further analysis.
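The directional asymmetry mentioned above can be computed from per-pair latencies as in the following Python sketch. The input layout (a dictionary keyed by ordered core pairs), the function name, and the example values are hypothetical; they do not describe the schema of ASCT's CSV or JSON output.

```python
def asymmetric_pairs(latency, threshold_ns):
    """List core pairs whose A->B and B->A latencies differ by >= threshold_ns.

    latency: dict mapping (cpu_a, cpu_b) -> latency in ns, one entry per direction.
    Returns (cpu_a, cpu_b, difference) tuples, largest difference first.
    """
    results = []
    for (a, b), forward in latency.items():
        # Visit each unordered pair once and require both directions.
        if a < b and (b, a) in latency:
            diff = abs(forward - latency[(b, a)])
            if diff >= threshold_ns:
                results.append((a, b, diff))
    return sorted(results, key=lambda item: item[2], reverse=True)


# Made-up measurements for illustration only.
EXAMPLE = {
    (0, 1): 30.0, (1, 0): 31.0,
    (0, 2): 150.0, (2, 0): 160.5,
}

if __name__ == "__main__":
    print(asymmetric_pairs(EXAMPLE, threshold_ns=5.0))
```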
Core-to-Core Latency Summary (ns): Data Address @ Local Numa Node
=================================================================
Node-to-Node Median Latency Matrix (ns):
----------------------------------------
Node0 Node1
Node0 31.91 152.86
Node1 153.10 32.00
Latency Statistics (ns):
------------------------
Min : 23.81
Max : 161.65
Mean : 92.87
Median : 140.66
Top Latency Core Pairs with Median Latency
------------------------------------------
CPUA CPUB Latency (ns)
---------------------------
186 91 161.65
187 83 161.37
95 186 161.29
91 190 161.12
178 90 161.02
186 83 160.50
82 157 160.44
83 176 160.42
90 187 160.42
90 159 160.41
Node-to-Node Latency Statistics (ns):
-------------------------------------
Node0 → Node0:
Min: 24.68 ns
Max: 42.24 ns
Mean: 32.16 ns
Median: 31.91 ns
Node0 → Node1:
Min: 112.43 ns
Max: 161.29 ns
Mean: 152.67 ns
Median: 152.86 ns
Node1 → Node0:
Min: 141.72 ns
Max: 161.65 ns
Mean: 153.14 ns
Median: 153.10 ns
Node1 → Node1:
Min: 23.81 ns
Max: 41.44 ns
Mean: 32.26 ns
Median: 32.00 ns
Use this benchmark to measure latency between 2 CPU cores by forcing cache line transfers between them.
The benchmark pins 2 threads on the target cores.
Each thread alternates between accessing and modifying the shared data structure. During its turn, a thread performs a complete pointer chase on the structure.
The benchmark uses cache invalidation and data dependency chains to estimate latency.
When one thread writes to the shared data structure, the corresponding cache line in the other core is evicted.
Bandwidth at different levels of cache
--------------------------------------
Data size used Level Bandwidth [GB/s]
32.0625K L1 126.5
288K L2 74.5
4.5M LLC 35.1
528M DRAM 15.8
The benchmark also saves a graph as bandwidth.png in the output directory.
Note: ASCT derives the data size used to target each
cache level from the results of the latency-sweep
benchmark. If not selected manually, ASCT automatically includes
latency-sweep as a dependency.
This benchmark measures the maximum aggregate memory bandwidth achieved by all cores in one NUMA node when accessing either their local NUMA memory, or memory from a remote NUMA node.
The benchmark produces an n × n matrix, where n is the number of NUMA nodes.
Cross-NUMA bandwidths for the system (in GB/s)
----------------------------------------------
Node 0 Node 1
Node 0 459.1 78.3
Node 1 78.8 459.2
Note: ASCT derives the data size used to target DRAM
from the latency-sweep benchmark. If not selected manually,
ASCT automatically includes latency-sweep as a
dependency.
This benchmark makes full use of all cores on all NUMA nodes to measure the maximum achievable memory bandwidth of the system.
To examine how different memory access patterns affect maximum usage, the benchmark tests multiple traffic types, for example, all reads or mixed read-write ratios. Each pattern yields a corresponding peak bandwidth value.
If available, compare these results with the theoretical peak bandwidth reported in the system information output.
Peak memory bandwidth
---------------------
Traffic type Peak BW [GB/s]
All Reads 918.9
3:1 Reads-Writes 868.8
2:1 Reads-Writes 859.5
1:1 Reads-Writes 842.7
2:1 Rd-Wr (Non-Temporal) 652.1
Note: ASCT derives the data size used to target DRAM
from the latency-sweep benchmark. If not selected manually,
ASCT automatically includes latency-sweep as a
dependency.
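To compare traffic types, you can express each peak as a fraction of the best result. The Python sketch below does this for the sample values above; the function name is hypothetical. The same calculation works against a theoretical peak if you substitute that value for the measured maximum.

```python
# Peak bandwidth in GB/s, copied from the sample output above.
PEAK_BW = {
    "All Reads": 918.9,
    "3:1 Reads-Writes": 868.8,
    "2:1 Reads-Writes": 859.5,
    "1:1 Reads-Writes": 842.7,
    "2:1 Rd-Wr (Non-Temporal)": 652.1,
}


def relative_to_best(peaks):
    """Return each traffic type's bandwidth as a fraction of the highest one."""
    best = max(peaks.values())
    return {name: value / best for name, value in peaks.items()}


if __name__ == "__main__":
    for name, fraction in relative_to_best(PEAK_BW).items():
        print(f"{name:<26} {fraction:6.1%}")
```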