Arm System Characterization Tool

The Arm System Characterization Tool (ASCT) is a standalone command-line utility for running low-level benchmarks, diagnostic scripts, and system tests to analyze and debug performance on Arm-based platforms.

ASCT provides a standardized environment for evaluating key hardware characteristics, such as memory latency and bandwidth, and is especially suited for platform bring-up, system tuning, and architectural comparison tasks. It helps developers and system architects gain early and repeatable insights into performance-critical subsystems.

Current capabilities include:

Planned features include:

Documentation

Prerequisites

Install ASCT

  1. Download the latest asct-<version>.tar.gz release from the Artifactory Releases page.

  2. Install ASCT using uv:

    UV_TOOL_BIN_DIR=/usr/local/bin sudo -E $(which uv) tool install /path/to/asct-X.Y.Z.tar.gz

[!IMPORTANT] Install ASCT into /usr/local/bin instead of the default ~/.local/bin so you can run it with sudo. You can install and use the tool without sudo, but some functionality might be limited.

  3. Run ASCT:

    sudo asct run

  4. See Getting started, or run asct help, to learn how to use ASCT.

Uninstall ASCT

Remove ASCT using uv:

sudo -E $(which uv) tool uninstall asct

Getting started

ASCT provides several commands that follow a common pattern:

asct [command] [arguments]
Command      Description
run          Run the benchmarks selected by a list of benchmark names or keywords
system-info  Display system information only, then exit. ASCT also collects this information by default with every benchmark run.
list         List the available benchmarks and their keywords
version      Display the ASCT version
help         Display help and the available options for any command

For more information on a specific command, run:

asct [command] --help

To get started, run the default set of benchmarks:

sudo asct run

[!IMPORTANT] Some benchmarks require sudo or root privileges to configure huge pages and access certain system information. ASCT can run without sudo, but some benchmarks might be unavailable or limited in functionality.

run command

The run command allows you to run one or more benchmarks.

Default behavior

sudo asct run

By default, running the run command with no arguments executes a default set of benchmarks and prints the results to the terminal (stdout). Each run also collects system information (the same information that system-info displays) and displays or stores it with the benchmark results.

Depending on the benchmark, ASCT might also generate additional output, such as graphs or raw data dumps, and save it in a directory within the current working directory. By default, this directory is named data.<YYYYMMDD_HHMMSS_microseconds>, but you can change it with the --output-dir flag.
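Because the default directory name embeds a timestamp, you can locate the output of the most recent run by sorting the names. The sketch below assumes that the microseconds field is zero-padded to a fixed width, so lexical order matches chronological order. The function name is hypothetical, and it does not find directories named with --output-dir.

```shell
# Hypothetical helper: print the newest ASCT output directory below a parent
# directory (default: the current directory).
# Assumes the default name data.<YYYYMMDD_HHMMSS_microseconds> with
# fixed-width, zero-padded fields, so sorting the names sorts them by time.
latest_asct_output() {
  parent=${1:-.}
  for d in "$parent"/data.*; do
    # If nothing matches, the pattern stays literal; [ -d ] filters it out.
    [ -d "$d" ] && printf '%s\n' "$d"
  done | LC_ALL=C sort | tail -n 1
}
```

For example, ls "$(latest_asct_output)" lists the files from the most recent run.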

Select benchmarks

To run all available benchmarks:

sudo asct run all

To run specific benchmarks, pass a list of benchmark names as arguments to the run command.

For example, to run the latency-sweep and idle-latency benchmarks:

sudo asct run latency-sweep idle-latency

Alternatively, you can select benchmarks by keyword: each benchmark has associated keywords, and passing a keyword selects every benchmark tagged with it.

For example, to run all benchmarks tagged with the memory keyword:

sudo asct run memory

You can exclude benchmarks by negating their names or keywords. To do this, prepend the name or keyword with a caret (^).

For example, to run all benchmarks except those tagged with the bandwidth keyword:

sudo asct run all ^bandwidth

[!NOTE] Excluding a benchmark has no effect if another selected benchmark depends on it. For example, if you use ^latency-sweep to exclude the latency-sweep benchmark, but another selected benchmark depends on latency-sweep, latency-sweep still runs.
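A small wrapper can build the exclusion arguments for you. This is a sketch: the function name and the ASCT variable are hypothetical conveniences. The carets are passed as separate quoted arguments because some shells (for example, zsh with the EXTENDED_GLOB option) treat an unquoted ^ as a glob operator.

```shell
# Hypothetical wrapper around "asct run all" that excludes benchmarks.
# ASCT is left unquoted on purpose so "sudo asct" splits into two words;
# set ASCT=echo to print the command instead of running it.
ASCT=${ASCT:-sudo asct}

asct_run_all_except() {
  # Rewrite every argument as ^<name>. The for-list is expanded once up
  # front, so appending the new form and shifting off the old one
  # converts each argument exactly once, preserving order.
  for kw in "$@"; do
    set -- "$@" "^$kw"
    shift
  done
  $ASCT run all "$@"
}

# Same as: sudo asct run all '^bandwidth' '^latency-sweep'
# asct_run_all_except bandwidth latency-sweep
```

The dependency rule in the note still applies: an excluded benchmark runs anyway if a selected benchmark depends on it.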

To display the list of available benchmarks and their keywords, run the following command:

asct list

Output options

Run also supports the following arguments:

system-info command

Display system information only. This information is also collected and saved by default during any benchmark run, but you can run this command on its own to quickly view the system configuration.

sudo asct system-info

[!NOTE] Some system information requires sudo or root privileges. You can run the system-info command without sudo, but some details might be unavailable.

The system-info command accepts the same arguments as the run command (described above) to configure the output format, output directory, and verbosity of the tool.

For example:

sudo asct system-info --format=json --output-dir=data --force --log-level=error --log-file=data/asct.log --quiet

list command

Run the following command to get the list of available benchmarks and their associated keywords:

asct list

Memory characterization

Latency

Latency sweep

    Latencies at different levels of cache
    --------------------------------------
    Level Lower bound Upper bound Optimum data size Latency [ns]
    L1           128         64K         32.0625K          1.5
    L2           64K        512K             288K          5.2
    LLC           1M          8M             4.5M         48.1
    DRAM         32M          1G             528M        107.6
Figure: Latency sweep line chart of average memory latency against data size. Latency stays low (~1.3 ns) up to 32 KiB, then increases in steps at 128 KiB, 1 MiB, and 16 MiB, reaching ~120 ns at 64 MiB and above.
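In this sample, each Optimum data size is the arithmetic midpoint of its level's lower and upper bounds; for L1, (128 B + 64 KiB) / 2 = 32832 B = 32.0625 KiB. This is an observation about the sample numbers, not a documented ASCT rule. The hypothetical helper below recomputes that column from the bounds, using binary (1024-based) units, so you can compare it with your own output.

```shell
# Hypothetical helper: print the midpoint of two sizes such as 128, 64K,
# 8M or 1G (binary units), formatted with the largest unit that keeps the
# value >= 1, as in the latency sweep table.
level_midpoint() {
  awk -v lo="$1" -v hi="$2" '
    function bytes(s,   u, n) {
      n = s + 0                          # numeric prefix, e.g. "4.5M" -> 4.5
      u = substr(s, length(s))           # unit suffix, if any
      if (u == "K") n *= 1024
      else if (u == "M") n *= 1024 ^ 2
      else if (u == "G") n *= 1024 ^ 3
      return n
    }
    function human(n) {
      if (n >= 1024 ^ 3) return sprintf("%gG", n / 1024 ^ 3)
      if (n >= 1024 ^ 2) return sprintf("%gM", n / 1024 ^ 2)
      if (n >= 1024)     return sprintf("%gK", n / 1024)
      return sprintf("%g", n)
    }
    BEGIN { print human((bytes(lo) + bytes(hi)) / 2) }'
}
```

With the sample bounds, level_midpoint 128 64K prints 32.0625K and level_midpoint 32M 1G prints 528M, matching the table.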

Idle latency

  Latencies of random memory access at idle (in nanoseconds)
  ----------------------------------------------------------
          Node 0 Node 1
  Node 0  113.3  266.9
  Node 1  266.5  115.7

Note: ASCT derives the data size used to target DRAM from the latency-sweep benchmark. If you do not select latency-sweep yourself, ASCT adds it automatically as a dependency.
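A common way to summarize a NUMA matrix is the ratio of remote to local values. In the idle-latency sample, remote access from Node 0 costs about 266.9 / 113.3 ≈ 2.36 times as much as local access. The sketch below (hypothetical function name) parses rows of the form Node <i> <values...> from the console output on stdin and prints, for each node, the mean off-diagonal value divided by the diagonal value.

```shell
# Hypothetical helper: read an ASCT NUMA matrix on stdin (rows such as
# "Node 0  113.3  266.9") and print, for each source node, the mean of the
# off-diagonal (remote) values divided by the diagonal (local) value.
numa_ratio() {
  awk '$1 == "Node" && $3 ~ /^[0-9.]+$/ {
    node = $2
    local_v = $(node + 3)          # values start in field 3
    sum = 0; n = 0
    for (f = 3; f <= NF; f++)
      if (f != node + 3) { sum += $f; n++ }
    if (n > 0 && local_v > 0)
      printf "Node %s: %.2f\n", node, (sum / n) / local_v
  }'
}

# printf '%s\n' '  Node 0  113.3  266.9' '  Node 1  266.5  115.7' | numa_ratio
```

For the idle-latency sample this prints Node 0: 2.36 and Node 1: 2.30. Applied to the cross-NUMA bandwidth matrix, the same function reports remote bandwidth as a fraction of local bandwidth (about 0.17 in that sample).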

Loaded latency

    Loaded latency with background memory activity
    ------------------------------------------------
     Injected NOPs  Loaded latency [ns]  Bandwidth [GB/s]
              3000                115.5              10.7
               900                115.8              35.2
               500                117.6              61.4
               180                122.2             175.3
               100                129.2             306.5
                80                134.1             385.0
                70                138.3             441.8
                50                158.8             603.9
                40                193.9             753.7
                30                250.7             912.7
                20                266.5             918.3
                10                281.0             918.3
                 0                305.4             920.7

Note: ASCT derives the data size used to target DRAM from the latency-sweep benchmark. If you do not select latency-sweep yourself, ASCT adds it automatically as a dependency.
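One way to read the loaded-latency table is to ask how much bandwidth the system delivers before latency grows past a chosen budget. The sketch below uses a hypothetical function name and an arbitrary 1.2x default budget, and treats the first data row (the lightest load) as the baseline. With the sample data it reports 441.8 GB/s, because the 70-NOP row (138.3 ns) is the last one within 1.2 × 115.5 ns = 138.6 ns.

```shell
# Hypothetical helper: read the loaded-latency table on stdin and print the
# highest bandwidth whose latency stays within FACTOR (default 1.2) times
# the latency of the first data row, which has the lightest load.
bw_within_latency_budget() {
  awk -v factor="${1:-1.2}" '
    # Data rows have exactly three fields and start with an integer NOP count.
    NF == 3 && $1 ~ /^[0-9]+$/ {
      if (!seen++) base = $2
      if ($2 <= factor * base && $3 + 0 > best + 0) best = $3
    }
    END { if (seen) print best }'
}
```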

Core-to-core latency

    Core-to-Core Latency Summary (ns): Data Address @ Local Numa Node
    =================================================================
    Node-to-Node Median Latency Matrix (ns):
    ----------------------------------------
            Node0   Node1
    Node0   31.91  152.86
    Node1  153.10   32.00

    Latency Statistics (ns):
    ------------------------
    Min       : 23.81
    Max       : 161.65
    Mean      : 92.87
    Median    : 140.66

    Top Latency Core Pairs with Median Latency
    ------------------------------------------
    CPUA   CPUB    Latency (ns)
    ---------------------------
    186       91         161.65
    187       83         161.37
     95      186         161.29
     91      190         161.12
    178       90         161.02
    186       83         160.50
     82      157         160.44
     83      176         160.42
     90      187         160.42
     90      159         160.41

    Node-to-Node Latency Statistics (ns):
    -------------------------------------

    Node0 → Node0:
    Min:    24.68 ns
    Max:    42.24 ns
    Mean:   32.16 ns
    Median: 31.91 ns

    Node0 → Node1:
    Min:    112.43 ns
    Max:    161.29 ns
    Mean:   152.67 ns
    Median: 152.86 ns

    Node1 → Node0:
    Min:    141.72 ns
    Max:    161.65 ns
    Mean:   153.14 ns
    Median: 153.10 ns

    Node1 → Node1:
    Min:    23.81 ns
    Max:    41.44 ns
    Mean:   32.26 ns
    Median: 32.00 ns
Figure: Square core-to-core latency heatmap. Green diagonal quadrants show low latency within clusters, and orange off-diagonal quadrants show higher latency between clusters. A vertical color scale runs from green (low latency) to red (high latency).
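The gap between the sample Mean (92.87 ns) and Median (140.66 ns) is consistent with a bimodal distribution: pairs on the same node cluster around 32 ns, and cross-node pairs around 153 ns. If you have per-pair latencies (for example, from raw data that ASCT might write to the output directory), a sketch like the one below recomputes these summary statistics from the last column of its input. The input format and function name are assumptions, and ASCT's own aggregation (for example, whether both directions of a pair are counted) may differ.

```shell
# Hypothetical helper: print min, max, mean and median of the numbers in
# the last column of stdin (for example "CPUA CPUB latency_ns" lines).
latency_stats() {
  awk 'NF { print $NF }' | LC_ALL=C sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      # Median: the middle value, or the mean of the two middle values.
      if (NR % 2) med = v[(NR + 1) / 2]
      else        med = (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "Min: %.2f\nMax: %.2f\nMean: %.2f\nMedian: %.2f\n",
             v[1], v[NR], sum / NR, med
    }'
}
```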

Ping-pong microbenchmark

Figure: Inter-Core Ping-Pong Synchronization. Two threads on separate CPUs communicate: Thread A on CPU_1 waits for a spin signal from Thread B on CPU_2, follows a pointer chain with a write operation, and then signals Thread B to proceed. Threads A and B alternately modify and read cache lines, so a write by one thread invalidates the cached copy on the other CPU.

NUMA and memory binding

Bandwidth

Bandwidth sweep

    Bandwidth at different levels of cache
    --------------------------------------
    Data size used Level Bandwidth [GB/s]
         32.0625K    L1            126.5
             288K    L2             74.5
             4.5M   LLC             35.1
             528M  DRAM             15.8
Figure: Bandwidth sweep chart of memory bandwidth (in GiB/s) against data size. Bandwidth stays near its peak (~149 GiB/s) for small sizes, drops sharply around 128 KiB, and then declines gradually as data size increases.

Note: ASCT derives the data size used to target each cache level from the results of the latency-sweep benchmark. If you do not select latency-sweep yourself, ASCT adds it automatically as a dependency.

Cross-NUMA bandwidth

    Cross-NUMA bandwidths for the system (in GB/s)
    ----------------------------------------------
           Node 0 Node 1
    Node 0  459.1   78.3
    Node 1   78.8  459.2

Note: ASCT derives the data size used to target DRAM from the latency-sweep benchmark. If you do not select latency-sweep yourself, ASCT adds it automatically as a dependency.

Peak bandwidth

    Peak memory bandwidth
    ---------------------
                Traffic type  Peak BW [GB/s]
                   All Reads           918.9
            3:1 Reads-Writes           868.8
            2:1 Reads-Writes           859.5
            1:1 Reads-Writes           842.7
    2:1 Rd-Wr (Non-Temporal)           652.1

Note: ASCT derives the data size used to target DRAM from the latency-sweep benchmark. If you do not select latency-sweep yourself, ASCT adds it automatically as a dependency.

Characterize storage using ASCT

This section describes the storage benchmarks in the Arm System Characterization Tool (ASCT). These tests evaluate performance characteristics of block devices and filesystems using fio under controlled sweep parameters.

What ASCT storage benchmarking measures

ASCT evaluates storage performance by sweeping parameters and measuring their impact on input/output (I/O) performance:

- Sustained bandwidth
- Input/Output Operations Per Second (IOPS) and latency
- Host CPU utilization during I/O

CPU utilization is reported as a breakdown of time spent in user space, kernel code, I/O wait, and hardware and software interrupt handling. This information helps identify whether performance is limited by the application, the kernel, or the storage device.

Sweeps included

ASCT automates several types of parameter sweeps, each designed to isolate one dimension of I/O behavior:

- Request Size Sweep (4 KiB → 128 KiB): reveals how throughput scales with request size and where the controller or device saturates.
- I/O Queue Depth Sweep (QD=1 → QD=128, powers of two): captures parallelism gains and concurrent access behavior.
- Concurrent Process Count Sweep (1 → 16 concurrent processes, powers of two): shows scaling across CPU cores.
- Access Pattern Sweep: compares sequential and random I/O, as well as read-only, write-only, and mixed workloads (default: 70% reads), to show how performance varies by workload.

How ASCT runs parameter sweeps

ASCT uses fio to perform controlled parameter sweeps. For each sweep, ASCT changes one parameter at a time while keeping all other parameters fixed. It then repeats the sweep for each configured parameter set.

Sweep type      Example values
Request size    4K, 8K, 16K, 32K, 64K, 128K
Queue depth     1, 2, 4, 8, 16, 32, 64, 128
Jobs            1, 2, 4, 8, 16
Access pattern  seq read, seq write, rand read, rand write, seq mixed, rand mixed
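To illustrate the one-parameter-at-a-time method, the sketch below sweeps fio's iodepth over the powers of two in the Queue depth row while holding everything else fixed. It is not the fio job that ASCT generates: the target path, block size, access pattern, engine, and runtime are illustrative assumptions, and the FIO and TARGET variables and the qd_sweep function name are hypothetical. It uses the asynchronous libaio engine because fio's iodepth setting has no effect with synchronous engines.

```shell
# Illustrative one-parameter sweep: only iodepth changes between runs.
# Not the job ASCT generates; TARGET, block size, pattern, engine and
# runtime are assumptions. Set FIO=echo to print the commands instead.
FIO=${FIO:-fio}
TARGET=${TARGET:-/tmp/mytest.dat}

qd_sweep() {
  qd=1
  while [ "$qd" -le 128 ]; do          # 1, 2, 4, ..., 128
    # libaio is asynchronous; iodepth has no effect with sync engines.
    $FIO --name="qd$qd" --filename="$TARGET" --rw=randread --bs=4k \
      --ioengine=libaio --direct=1 --iodepth="$qd" --numjobs=1 \
      --runtime=30 --time_based --output-format=json \
      --output="qd$qd.json"
    qd=$((qd * 2))
  done
}
```

Each run writes a JSON report (qd1.json, qd2.json, ...) to the current directory, which keeps the results of every sweep point separate for later comparison.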

For each sweep point, ASCT records the following key metrics:

- From fio: bandwidth, I/O rate, average latency, and latency distribution
- From mpstat: CPU utilization breakdown (user/system/iowait/hard irq/soft irq)

Run storage benchmarking

Use the run command with storage keywords and optional --user-config parameters to select a device or create a temporary file.

# Run a specific sweep (in order of Request Size Sweep, I/O Depth Sweep, Concurrent Process Count Sweep, Access Pattern Sweep)
sudo asct run storage-request-size-sweep ...
sudo asct run storage-io-depth-sweep ...
sudo asct run storage-process-count-sweep ...
sudo asct run storage-access-pattern-sweep ...

Alternatively, use the short-form aliases:

# Run a specific sweep (short form in order of Request Size Sweep, I/O Depth Sweep, Concurrent Process Count Sweep, Access Pattern Sweep)
sudo asct run srss ...
sudo asct run sids ...
sudo asct run spcs ...
sudo asct run saps ...
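To run all four sweeps against the same target, a loop over the short aliases is one option. This sketch assumes that every sweep accepts an <alias>.file_names override, as srss does in the examples that follow; check the ASCT help to confirm. The function name and the ASCT variable are hypothetical. As with any storage sweep, the tests might overwrite the contents of the target.

```shell
# Hypothetical wrapper: run the four storage sweeps, in the documented
# order, against one target. Assumes every sweep accepts an
# <alias>.file_names override, as srss does; check the ASCT help.
# Set ASCT=echo to print the commands instead of running them.
ASCT=${ASCT:-sudo asct}

run_all_storage_sweeps() {
  target=$1
  for s in srss sids spcs saps; do
    $ASCT run "$s" --user-config "$s.file_names=$target" || return
  done
}

# run_all_storage_sweeps /tmp/mytest.dat
```

The loop stops at the first failing sweep and returns its exit status.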

ASCT requires a target device or file. CAUTION: The tests might overwrite the contents of the target file or device.

# Use an existing target file (create /tmp/mytest.dat before running the test)
sudo asct run srss --user-config srss.file_names=/tmp/mytest.dat
# Use a target device (make sure the device /dev/nvme07 exists; its contents might be overwritten)
sudo asct run srss --user-config srss.file_names=/dev/nvme07
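The target file must exist before the test runs. A minimal sketch for creating one on Linux with GNU dd follows; the function name and default size are hypothetical. It writes real data rather than creating a sparse file (for example, with truncate), because reads from unallocated regions of a sparse file can be served without touching the device. It also refuses to write over a block device. Some devices may handle all-zero data specially; reading from /dev/urandom instead avoids that, at the cost of a slower setup.

```shell
# Hypothetical helper: create (or overwrite) a target file filled with
# real data, using GNU dd. Size is in MiB (default 1024).
prepare_target_file() {
  path=$1
  size_mib=${2:-1024}
  # Only regular files: never write over a block device by mistake.
  if [ -b "$path" ]; then
    echo "refusing to overwrite block device: $path" >&2
    return 1
  fi
  # conv=fsync flushes the data to the device before dd exits.
  dd if=/dev/zero of="$path" bs=1M count="$size_mib" conv=fsync 2>/dev/null
}

# prepare_target_file /tmp/mytest.dat 4096
# sudo asct run srss --user-config srss.file_names=/tmp/mytest.dat
```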

ASCT exposes several user overrides. For example, you can have ASCT create a temporary file for file benchmarking so that you do not need to provide a file or device name, change the read/write mix, or customize the I/O depth steps:

# Auto-create temporary file for file I/O
sudo asct run srss --user-config srss.create_temp_file=1
# Change the read/write mix of the access pattern sweep to 20% reads (default: 70%)
sudo asct run saps --user-config saps.rwmixread=20
# Use linear I/O depth steps from 4 to 8 for finer detail
sudo asct run sids --user-config sids.iodepth_sweep_steps=4,5,6,7,8
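The sids.iodepth_sweep_steps override takes a comma-separated list. A hypothetical helper can generate evenly spaced steps instead of typing them out; it uses awk so that it does not depend on a particular seq implementation.

```shell
# Hypothetical helper: print evenly spaced I/O depth steps as a
# comma-separated list. Arguments: start, end, increment (default 1).
iodepth_steps() {
  awk -v a="$1" -v b="$2" -v s="${3:-1}" 'BEGIN {
    if (s <= 0 || a > b) exit 1        # reject empty or endless ranges
    out = a
    for (i = a + s; i <= b; i += s) out = out "," i
    print out
  }'
}

# sudo asct run sids --user-config sids.iodepth_sweep_steps="$(iodepth_steps 4 8)"
```

For example, iodepth_steps 4 8 prints 4,5,6,7,8, and iodepth_steps 8 32 8 prints 8,16,24,32.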

See the ASCT help for the full list of user overrides.

Outputs generated

Storage benchmarking in ASCT produces output similar to memory benchmarking: for each sweep, a console summary plus plots with more detail.

Per-sweep sample outputs


Request Size Sweep

This sweep varies the I/O request size while keeping the other parameters constant. It highlights how throughput scales with larger transfers and shows when the device reaches saturation.

Console summary

    BlockSize  Read BW (MB/s)  Write BW (MB/s)  Total BW (MB/s)  Read Thruput (kops)  Write Thruput (kops)  Thruput (kops)  Read Lat. (us)  Write Lat. (us)  Lat. (us)  CPU usr (%)  CPU sys (%)  CPU iowait (%)
    4K                   11.9              0.0             11.9                  3.0                   0.0             3.0          1311.7              0.0     1311.7          0.1          0.1            99.8
    8K                   23.8              0.0             23.8                  3.0                   0.0             3.0          1311.3              0.0     1311.3          0.1          0.2            99.6
    16K                  47.7              0.0             47.7                  3.0                   0.0             3.0          1311.3              0.0     1311.3          0.1          0.2            99.5
    32K                  95.4              0.0             95.4                  3.1                   0.0             3.1          1310.1              0.0     1310.1          0.1          0.3            99.6
    64K                 127.1              0.0            127.1                  2.0                   0.0             2.0          1965.9              0.0     1965.9          0.1          0.2            99.7
    128K                127.2              0.0            127.2                  1.0                   0.0             1.0          3932.0              0.0     3932.0          0.0          0.2            99.8
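Bandwidth, throughput, and block size are tied together: throughput = bandwidth / block size. The sample rows are consistent with MB meaning 2^20 bytes and kops meaning 1000 operations per second. For example, the 32K row gives 95.4 × 1024 / 32 ≈ 3053 operations/s ≈ 3.1 kops, as shown, whereas 10^6-byte megabytes would give about 2.9 kops. The hypothetical helper below performs that conversion so you can cross-check a row.

```shell
# Hypothetical helper: convert bandwidth (MB/s, taken as MiB/s) and a block
# size such as 4K or 128K (KiB) into throughput in kops (1000 ops/s).
bw_to_kops() {
  awk -v bw="$1" -v bs="$2" 'BEGIN {
    kib = bs + 0                       # "32K" -> 32
    if (kib <= 0) exit 1
    printf "%.1f\n", bw * 1024 / kib / 1000
  }'
}
```

In the sample, bandwidth levels off at about 127 MB/s from 64K upward while latency grows (1965.9 us at 64K, 3932.0 us at 128K), which is the saturation point this sweep is designed to reveal.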

Plots for details

Bandwidth, I/O Rate, CPU Utilization, and Read Latency CDF

I/O Depth Sweep

This sweep varies the I/O queue depth (the number of outstanding requests) while keeping the request size and process count constant.
It measures how effectively the device exploits parallelism.

Console summary

    IODepth  Read BW (MB/s)  Write BW (MB/s)  Total BW (MB/s)  Read Thruput (kops)  Write Thruput (kops)  Thruput (kops)  Read Lat. (us)  Write Lat. (us)  Lat. (us)  CPU usr (%)  CPU sys (%)  CPU iowait (%)
    1                  11.9              0.0             11.9                  3.0                   0.0             3.0          1311.8              0.0     1311.8          0.1          0.1            99.7
    2                  11.9              0.0             11.9                  3.0                   0.0             3.0          1311.3              0.0     1311.3          0.1          0.1            99.7
    4                  11.9              0.0             11.9                  3.0                   0.0             3.0          1311.3              0.0     1311.3          0.4          0.2            98.4
    8                  11.9              0.0             11.9                  3.0                   0.0             3.0          1311.4              0.0     1311.4          0.5          0.2            98.5
    16                 11.9              0.0             11.9                  3.0                   0.0             3.0          1311.4              0.0     1311.4          0.1          0.1            99.8
    32                 11.9              0.0             11.9                  3.0                   0.0             3.0          1311.3              0.0     1311.3          0.1          0.1            99.8
    64                 11.9              0.0             11.9                  3.0                   0.0             3.0          1311.3              0.0     1311.3          0.1          0.1            99.7
    128                11.9              0.0             11.9                  3.0                   0.0             3.0          1311.4              0.0     1311.4          0.7          0.2            98.7

Plots for details

Bandwidth, I/O Rate, CPU Utilization, and Read Latency CDF

Concurrent Process Count Sweep

This sweep varies the number of concurrent processes issuing I/O.
It evaluates scaling across CPU cores and submission queues, and highlights the point at which performance no longer improves.

Console summary

    ProcessCount  Read BW (MB/s)  Write BW (MB/s)  Total BW (MB/s)  Read Thruput (kops)  Write Thruput (kops)  Thruput (kops)  Read Lat. (us)  Write Lat. (us)  Lat. (us)  CPU usr (%)  CPU sys (%)  CPU iowait (%)
    1                        7.0              0.0              7.0                  1.8                   0.0             1.8           552.7              0.0      552.7          0.1          0.1            24.9
    2                       11.9              0.0             11.9                  3.0                   0.0             3.0           655.5              0.0      655.5          0.1          0.2            49.7
    4                       11.9              0.0             11.9                  3.0                   0.0             3.0          1311.3              0.0     1311.3          0.1          0.2            99.7
    8                       11.9              0.0             11.9                  3.0                   0.0             3.0          2622.3              0.0     2622.3          0.1          0.3            99.6
    16                      11.9              0.0             11.9                  3.0                   0.0             3.0          5245.5              0.0     5245.5          0.1          0.5            99.3
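Little's law relates the numbers in each row: average outstanding I/Os = throughput × latency. Applied to the sample, it gives roughly 1.0, 2.0, 3.9, 7.9, and 15.7 outstanding I/Os for 1, 2, 4, 8, and 16 processes. This is consistent with each process keeping about one I/O in flight while throughput stays near 3 kops, so additional processes add queueing delay rather than throughput. The kops column is rounded, so expect a few percent of error. The function name below is hypothetical.

```shell
# Hypothetical helper applying Little's law:
#   outstanding I/Os = throughput (ops/s) x latency (s)
# Arguments: throughput in kops (1000 ops/s) and latency in microseconds.
outstanding_ios() {
  awk -v kops="$1" -v lat_us="$2" 'BEGIN {
    printf "%.1f\n", kops * 1000 * lat_us / 1e6
  }'
}
```

For example, outstanding_ios 3.0 2622.3 prints 7.9 for the 8-process row. The same calculation on the I/O depth table gives about 3.9 at every depth.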

Plots for details

Bandwidth, I/O Rate, CPU Utilization, and Read Latency CDF

Access Pattern Sweep

This sweep evaluates different workload profiles, including sequential and random access patterns, and the mix of read and write operations. It highlights the workload sensitivity of the device.

Console summary

    AccessPattern  Read BW (MB/s)  Write BW (MB/s)  Total BW (MB/s)  Read Thruput (kops)  Write Thruput (kops)  Thruput (kops)  Read Lat. (us)  Write Lat. (us)  Lat. (us)  CPU usr (%)  CPU sys (%)  CPU iowait (%)
    read                     11.9              0.0             11.9                  3.0                   0.0             3.0          1311.8              0.0     1311.8          0.1          0.1            99.8
    write                     0.0             11.9             11.9                  0.0                   3.0             3.0             0.0           1311.4     1311.4          0.0          0.4            99.4
    randread                 11.9              0.0             11.9                  3.0                   0.0             3.0          1311.3              0.0     1311.3          0.1          0.2            99.7
    randwrite                 0.0             11.9             11.9                  0.0                   3.0             3.0             0.0           1311.3     1311.3          0.0          0.4            99.6
    rw                        8.3              3.6             11.9                  2.1                   0.9             3.0          1235.1           1488.3     1311.4          0.1          0.3            99.7
    randrw                    8.3              3.6             11.9                  2.1                   0.9             3.0          1237.9           1481.8     1311.4          0.1          0.3            99.4
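The rw and randrw rows use the default 70% read mix, and the bandwidth columns let you check the achieved ratio: 8.3 / (8.3 + 3.6) ≈ 69.7%. A minimal sketch of that calculation follows (hypothetical function name); the same arithmetic works with the throughput columns, 2.1 / (2.1 + 0.9) = 70%.

```shell
# Hypothetical helper: observed read share (%) of a mixed workload,
# computed from its read and write bandwidth (or throughput) columns.
read_share() {
  awk -v r="$1" -v w="$2" 'BEGIN {
    if (r + w <= 0) exit 1
    printf "%.1f\n", 100 * r / (r + w)
  }'
}
```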

Plots for details

Bandwidth, I/O Rate, CPU Utilization, Read Latency CDF, and Write Latency CDF

Additional notes