Chapter 7: System-Level Benchmarks

Part II: Tools

"The best benchmark is the actual workload." — Anonymous sysadmin

The "New Server Is Slower" Ticket

"This new server has problems. It's slower than the old one."

That was the trouble ticket I received. A freshly racked server—faster CPU, more memory, newer SSD—but users complained it "felt slower."

"Felt" is a hard thing to debug.

I ran CPU benchmarks—new server was 40% faster. Memory benchmarks—30% faster. Disk I/O—3× faster. Every individual test showed the new server was better.

But users insisted: "It's just slower."

Finally, I found the problem: network latency. The new server sat in a different rack, which added about 2 ms of latency to every round trip to the database. This application was database-heavy: each request accessed the database dozens of times, so the extra latency accumulated into a delay users could feel.
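A rough calculation shows why 2 ms is visible. Assume, for illustration only, that a request makes 40 database round trips one after another (the ticket only tells us "dozens"). Sequential latencies add, and no amount of CPU speed hides them:

$$
40 \times 2\,\text{ms} = 80\,\text{ms of extra latency per request}
$$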

This is why we need system-level benchmarks—measuring CPU, memory, and disk separately isn't enough. We need to measure how the entire system works together.

Micro-benchmarks vs System-level Benchmarks

Let's clarify the difference between these two types:

Type	Measures	Examples
Micro-benchmark	Single component	CoreMark (CPU), STREAM (memory)
System-level	Entire system	UnixBench, Sysbench, Phoronix

The problem with micro-benchmarks:

CPU score:     100 points
Memory score:  100 points
Disk score:    100 points
─────────────────────────
System score:  ???

(Not 300—might be 50)

System performance isn't a simple sum of component performance. The bottleneck could be anywhere—CPU, memory, disk, network, or even the OS kernel.
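One way to see why the component scores don't add up is the bottleneck bound from operational analysis. If each request needs $D_i$ seconds of service at component $i$, and each component handles one request at a time, sustained throughput $X$ cannot exceed the rate of the busiest component. The service demands below are illustrative, not measurements:

$$
X \le \frac{1}{\max_i D_i}, \qquad
D_{\text{cpu}} = 1\,\text{ms},\; D_{\text{disk}} = 4\,\text{ms},\; D_{\text{net}} = 2\,\text{ms}
\;\Rightarrow\; X \le \frac{1}{4\,\text{ms}} = 250\ \text{requests/s}
$$

Halving the CPU demand to 0.5 ms leaves the bound at 250 requests/s; only the disk sets it.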

UnixBench: The Classic System Benchmark

UnixBench is one of the oldest system benchmarks, first released in 1984. Though dated, the fundamental concepts it measures remain important.

UnixBench Test Items

Dhrystone            CPU integer operations
Whetstone            CPU floating-point operations
Execl Throughput     Program execution (execl replaces the process image)
File Copy            Disk I/O
Pipe Throughput      IPC performance
Pipe-based Switching Context switch
Process Creation     fork() performance
Shell Scripts        Shell script execution
System Call          syscall overhead
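Several of these items (Execl Throughput, Process Creation, Shell Scripts) depend on how quickly the kernel can create and start processes. As a rough, hypothetical analogue, not UnixBench's own code, the bash function below counts how many external programs the shell can launch per second. `spawn_rate` is an illustrative name, and the sketch assumes GNU `date` with `%N` (nanosecond) support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough fork()+exec() rate of this shell.
# Not UnixBench's measurement; assumes GNU date (+%N for nanoseconds).
spawn_rate() {
  local n=${1:-1000} bin start end i
  # `true` is a bash builtin, so look up the external binary explicitly
  bin=$(type -P true)
  start=$(date +%s%N)
  for ((i = 0; i < n; i++)); do
    "$bin"                       # each call forks a child and execs the binary
  done
  end=$(date +%s%N)
  # launches per second, using integer arithmetic
  echo $(( n * 1000000000 / (end - start) ))
}

spawn_rate 1000
```

Comparing this figure between two machines gives only a crude feel for process-creation overhead; UnixBench's Process Creation and Execl Throughput tests measure fork() and execl() separately and more carefully.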

Running UnixBench

# Download and compile
git clone https://github.com/kdlucas/byte-unixbench.git
cd byte-unixbench/UnixBench
make

# Run the full suite: one copy of each test, then (on multi-CPU machines) one copy per CPU
./Run

Typical Output

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: myserver
   OS: GNU/Linux -- 5.15.0-generic -- #1 SMP
   Machine: x86_64 (x86_64)
   CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz

------------------------------------------------------------------------
Benchmark Run: Wed Dec 18 2024 10:00:00

1 parallel copy of tests:

Dhrystone 2 using register variables    45000000.0 lps   (10.0 s, 7 samples)
Double-Precision Whetstone               8500.0 MWIPS (10.0 s, 7 samples)
Execl Throughput                         5000.0 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks  900000.0 KBps  (30.0 s, 2 samples)
...

System Benchmarks Index Values:

                                          BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0   45000000.0   3856.0
Double-Precision Whetstone                    55.0       8500.0   1545.5
...
                                                       ========
System Benchmarks Index Score:                           2150.3
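The index table follows a simple rule: each test's index is its result divided by the baseline, times 10, and the overall score is the geometric mean of the per-test indexes. The awk-based sketch below reproduces that arithmetic from "baseline result" pairs on stdin; `unixbench_index` is a hypothetical helper name, not part of UnixBench.

```shell
#!/usr/bin/env bash
# Hypothetical helper: UnixBench-style index arithmetic.
#   index = result / baseline * 10
#   score = geometric mean of all indexes
# Input: one "baseline result" pair per line on stdin.
unixbench_index() {
  awk '
    NF >= 2 && $1 > 0 && $2 > 0 {
      idx = $2 / $1 * 10
      printf "%.1f\n", idx
      logsum += log(idx)          # geometric mean computed as exp(mean of logs)
      n++
    }
    END {
      if (n > 0) printf "score %.1f\n", exp(logsum / n)
    }
  '
}

# The Dhrystone and Whetstone rows from the sample output:
# prints one index per row, then the combined score
printf '116700.0 45000000.0\n55.0 8500.0\n' | unixbench_index
```

The geometric mean also explains the "may hide individual weaknesses" limitation: with n tests, halving one test's index lowers the score by a factor of only 2^(1/n). For a hypothetical 12-test score, that is a drop of about 5.6%.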

UnixBench Limitations

Too old — Many tests designed in the 1980-90s
Doesn't represent modern workloads — No web server or database tests
Controversial index calculation — Geometric mean may hide individual weaknesses
Single-machine only — Doesn't test network performance

Sysbench: A More Modern Choice

Sysbench is a more modern benchmark tool, particularly suited for testing database servers.

Sysbench Test Types

# CPU test
sysbench cpu --cpu-max-prime=20000 run

# Memory test
sysbench memory --memory-block-size=1K --memory-total-size=10G run

# Disk I/O test (make --file-total-size larger than RAM so the page cache can't absorb the I/O)
sysbench fileio --file-total-size=10G prepare
sysbench fileio --file-total-size=10G --file-test-mode=rndrw run
sysbench fileio --file-total-size=10G cleanup

# MySQL test
sysbench oltp_read_write --mysql-host=localhost --mysql-user=root \
    --mysql-db=test --tables=10 --table-size=100000 prepare
sysbench oltp_read_write --mysql-host=localhost --mysql-user=root \
    --mysql-db=test --tables=10 --table-size=100000 --threads=16 run

Sysbench CPU Test Analysis

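sysbench cpu --cpu-max-prime=20000 --threads=4 run

Each of the 4 threads repeatedly checks the numbers up to 20000 for primality by trial division; one complete pass counts as one "event". By default the test runs for 10 seconds. The headline figure in the output is "events per second" (higher is better), followed by per-event latency statistics in milliseconds: min, avg, max, and 95th percentile.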
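To make "one event" concrete, here is a bash sketch of the prime search. It assumes the work mirrors sysbench's C implementation (trial division of each candidate by 2 up to its square root); the exact loop bounds in sysbench may differ slightly, and `count_primes` is a hypothetical name. Bash is far too slow for real benchmarking, so this only illustrates what each event does.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one sysbench "cpu" event:
# for each candidate c from 3 up to max_prime, trial-divide by 2..sqrt(c)
# and count the candidates that have no divisor.
count_primes() {
  local max_prime=$1 c l n=0
  for ((c = 3; c < max_prime; c++)); do
    # l * l <= c is the integer form of l <= sqrt(c)
    for ((l = 2; l * l <= c; l++)); do
      (( c % l == 0 )) && break
    done
    # The inner loop ran to completion only if no divisor was found
    if (( l * l > c )); then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

count_primes 2000
```

Raising `--cpu-max-prime` makes each event longer, so events per second are only comparable between runs that use the same value.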

Sysbench Advantages

Database-ready — Built-in MySQL/PostgreSQL support
Scriptable — Can write custom Lua scripts
Modern metrics — Provides latency percentiles
Active maintenance — Continuously updated
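Because sysbench reports both throughput and latency percentiles as plain text, its output is easy to feed into scripts. The sketch below pulls the two headline numbers out of a `sysbench cpu` report. It assumes the report layout of recent sysbench 1.x releases ("events per second:" and "95th percentile:" lines); OLTP runs report transactions per second on a separate line instead, and `sysbench_summary` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract throughput and p95 latency from
# `sysbench cpu ... run` output read on stdin.
sysbench_summary() {
  awk '
    /events per second:/ { eps = $NF }
    /95th percentile:/   { p95 = $NF }
    END {
      if (eps == "" || p95 == "") exit 1     # expected lines missing: fail
      printf "eps=%s p95_ms=%s\n", eps, p95
    }
  '
}

# Typical use: pipe a run straight into the parser
#   sysbench cpu --cpu-max-prime=20000 --threads=4 run | sysbench_summary
```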

Phoronix Test Suite: The Most Comprehensive Option

Phoronix Test Suite (PTS) is currently the most comprehensive open-source benchmark suite, containing hundreds of tests.

Installing Phoronix Test Suite

# Ubuntu/Debian
sudo apt install phoronix-test-suite

# Or download from official site
wget https://phoronix-test-suite.com/releases/phoronix-test-suite-10.8.4.tar.gz
tar xvf phoronix-test-suite-10.8.4.tar.gz
cd phoronix-test-suite
sudo ./install-sh

Common Commands

# List all available tests
phoronix-test-suite list-available-tests

# Install a test
phoronix-test-suite install pts/compress-7zip

# Run a single test
phoronix-test-suite run pts/compress-7zip

# Run a test suite
phoronix-test-suite run pts/disk

# Compare two results
phoronix-test-suite merge-results result1 result2

Common Test Suites

Suite	Contents
pts/disk	Disk I/O (fio, iozone, bonnie++)
pts/cpu	CPU (compress, encode, compile)
pts/memory	Memory bandwidth and latency
pts/network	Network throughput
pts/compilation	Kernel compile, GCC compile

Example Run

$ phoronix-test-suite run pts/compress-7zip

    Phoronix Test Suite v10.8.4

    7-Zip Compression 16.02

    Test: Compression Rating

    Processor: Intel Core i7-10700 @ 4.80GHz (8 Cores / 16 Threads)
    Memory: 32GB DDR4-3200
    OS: Ubuntu 22.04

    Compression Rating:
        58234 MIPS

    Decompression Rating:
        71823 MIPS

OpenBenchmarking.org Integration

Phoronix's unique feature is integration with OpenBenchmarking.org, where you can:

Upload results — Share to the cloud
Compare results — Compare with other users' results
Track history — Observe performance trends over time

# Upload results
phoronix-test-suite upload-result my-result

# Compare results to baseline
phoronix-test-suite compare-results-to-baseline my-result baseline-id

Choosing the Right System Benchmark

Different scenarios call for different tools:

Scenario	Recommended Tool	Reason
Quick system health check	UnixBench	Simple, fast, covers basics
Database server	Sysbench	Dedicated OLTP tests
Comprehensive analysis	Phoronix	Hundreds of tests, customizable
CI/CD automation	Sysbench + custom scripts	Scriptable, easy to integrate
Hardware purchase decisions	Phoronix + public comparisons	Lots of public data

Practical Advice

1. Define "Performance" First

Before running benchmarks, ask yourself:

For this system, what is "performance"?
Do users care about throughput or latency?
Which component is most likely the bottleneck?
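In closed-loop benchmarks such as sysbench, where each thread sends its next request as soon as the previous one returns, throughput and latency are tied together. Assuming no think time, Little's law relates throughput $X$ to the thread count $N$ and mean latency $R$; the numbers below are illustrative:

$$
X = \frac{N}{R} = \frac{16}{0.002\ \text{s}} = 8000\ \text{requests/s}
$$

With the same 16 threads, a rise in mean latency to 4 ms caps throughput at 4000 requests/s, so in this kind of test a latency regression shows up directly as lost throughput.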

2. Tests Should Simulate Real Usage

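# Bad: test CPU alone
sysbench cpu run

# Better: simulate a real database workload
# (requires a matching "prepare" step and the same --mysql-* options as above)
sysbench oltp_read_write --mysql-host=localhost --mysql-user=root \
    --mysql-db=test --tables=10 --table-size=1000000 \
    --threads=32 --time=300 run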

3. Run Multiple Times, Report Statistics

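# Run 5 times, keep only the events/sec figure, then take the median
: > results.txt
for i in {1..5}; do
    sysbench cpu run | awk '/events per second:/ {print $NF}' >> results.txt
done
# With 5 samples, the median is the 3rd value after a numeric sort
sort -n results.txt | sed -n '3p'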

4. Record Complete Environment

# Record environment before benchmark
echo "=== System Info ===" > env.txt
uname -a >> env.txt
cat /proc/cpuinfo | grep "model name" | head -1 >> env.txt
free -h >> env.txt
df -h >> env.txt
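A few more settings often skew results and are worth appending to the same file: when the run happened, the number of logical CPUs, the CPU frequency governor, the transparent huge page mode, and the benchmark tool's version. The sketch below assumes Linux; the /sys paths may be absent in VMs or containers, in which case it records "unavailable". `record_extra_env` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append settings that commonly skew benchmark
# results to an environment file (Linux-specific /sys paths).
record_extra_env() {
  local out=${1:-env.txt}
  {
    echo "=== Extra Info ==="
    date -u +"%Y-%m-%dT%H:%M:%SZ"          # time of the run, in UTC
    echo "logical CPUs: $(nproc)"
    printf 'CPU governor: '
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null || echo "unavailable"
    printf 'Transparent huge pages: '
    cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || echo "unavailable"
    if command -v sysbench >/dev/null 2>&1; then
      sysbench --version
    else
      echo "sysbench: not installed"
    fi
  } >> "$out"
}
```

Call it as `record_extra_env env.txt` right after the commands above, so each result file carries its own context.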

Back to That Ticket

Now you know why individual CPU/memory/disk benchmarks didn't find the problem. The real bottleneck was network latency, which wasn't in the tests I ran.

If I had run Sysbench's OLTP test from the application server against the database, measuring "the complete path from application to database," I would most likely have found the problem.
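An even quicker first check is to measure the network round-trip time from the application server to the database host and multiply it by the number of sequential queries per request. The sketch below parses the summary line printed by `ping -q` (both the Linux "rtt ..." and BSD/macOS "round-trip ..." forms). The host name is a placeholder, the query count of 40 is illustrative, and ICMP round-trip time only approximates a database round trip: it excludes server-side query time, and some networks block ping. `avg_rtt_ms` and `added_ms` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the average RTT (ms) from ping's summary line,
# e.g. "rtt min/avg/max/mdev = 1.950/2.013/2.210/0.061 ms", read on stdin.
avg_rtt_ms() {
  awk -F'/' '/^(rtt|round-trip)/ { print $5; found = 1 } END { exit !found }'
}

# Hypothetical helper: extra latency (ms) when QUERIES round trips
# happen one after another: added_ms RTT_MS QUERIES
added_ms() {
  awk -v rtt="$1" -v q="$2" 'BEGIN { printf "%.1f\n", rtt * q }'
}

# Usage (db.example.internal is a placeholder host name):
#   rtt=$(ping -c 20 -q db.example.internal | avg_rtt_ms)
#   added_ms "$rtt" 40
```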

Lesson: Choose benchmarks that represent your real workload.

Summary

System-level benchmarks measure overall system performance, not individual components:

Tool Selection

UnixBench: Classic but dated, good for quick checks
Sysbench: Modern, scriptable, good for database workloads
Phoronix: Most comprehensive, good for deep analysis

Best Practices

Define what "performance" means first
Choose tests that represent real workloads
Run multiple times, report statistics
Record complete environment information

Common Pitfalls

Testing components individually, ignoring system integration
Using tests that don't represent real workloads
Looking at only one metric, ignoring latency distribution

Performance and Benchmarking