Chapter 7: System-Level Benchmarks

Part II: Tools


"The best benchmark is the actual workload." — Anonymous sysadmin

The "New Server Is Slower" Ticket

"This new server has problems. It's slower than the old one."

That was the trouble ticket I received. A freshly racked server—faster CPU, more memory, newer SSD—but users complained it "felt slower."

"Felt" is a hard thing to debug.

I ran CPU benchmarks—new server was 40% faster. Memory benchmarks—30% faster. Disk I/O—3× faster. Every individual test showed the new server was better.

But users insisted: "It's just slower."

Finally I found the problem: network latency. The new server was in a different rack, adding 2ms latency to the database. For this database-heavy application, each request accessed the database dozens of times. The accumulated latency was noticeable.
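
The arithmetic behind that diagnosis is easy to sketch. The helper below is hypothetical, and the figure of 30 queries per request is an assumption standing in for the "dozens" in the story; only the 2 ms comes from the ticket.

```shell
# added_latency_ms QUERIES EXTRA_MS
# Extra time per request when every query pays EXTRA_MS more network latency.
# Assumes the queries run one after another; queries issued in parallel
# would overlap and cost less.
added_latency_ms() {
    echo $(( $1 * $2 ))
}

# Assumed: 30 sequential queries per request, 2 ms extra per query.
echo "extra latency per request: $(added_latency_ms 30 2) ms"
```

Under those assumptions every request becomes 60 ms slower, which is enough for users to notice even though no CPU, memory, or disk benchmark would show it.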

This is why we need system-level benchmarks—measuring CPU, memory, and disk separately isn't enough. We need to measure how the entire system works together.

Micro-benchmarks vs System-level Benchmarks

Let's clarify the difference between these two types:

Type             Measures          Examples
Micro-benchmark  Single component  CoreMark (CPU), STREAM (memory)
System-level     Entire system     UnixBench, Sysbench, Phoronix

The problem with micro-benchmarks:

CPU score:     100 points
Memory score:  100 points
Disk score:    100 points
─────────────────────────
System score:  ???

(Not 300—might be 50)

System performance isn't a simple sum of component performance. The bottleneck could be anywhere—CPU, memory, disk, network, or even the OS kernel.

UnixBench: The Classic System Benchmark

UnixBench is one of the oldest system benchmarks still in use: it began in 1983 at Monash University and was later expanded and popularized by BYTE magazine. Though dated, the fundamental operations it measures remain important.

UnixBench Test Items

Dhrystone                     CPU integer operations
Whetstone                     CPU floating-point operations
Execl Throughput              Program execution via execl()
File Copy                     Disk I/O
Pipe Throughput               IPC performance
Pipe-based Context Switching  Context switch overhead
Process Creation              fork() performance
Shell Scripts                 Shell script execution
System Call Overhead          syscall overhead

Running UnixBench

# Download and compile
git clone https://github.com/kdlucas/byte-unixbench.git
cd byte-unixbench/UnixBench
make

# Run the full suite: first with one copy of each test, then with one copy per CPU
./Run

Typical Output

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: myserver
   OS: GNU/Linux -- 5.15.0-generic -- #1 SMP
   Machine: x86_64 (x86_64)
   CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz

------------------------------------------------------------------------
Benchmark Run: Wed Dec 18 2024 10:00:00

1 parallel copy of tests:

Dhrystone 2 using register variables    45000000.0 lps   (10.0 s, 7 samples)
Double-Precision Whetstone               8500.0 MWIPS (10.0 s, 7 samples)
Execl Throughput                         5000.0 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks  900000.0 KBps  (30.0 s, 2 samples)
...

System Benchmarks Index Values:

                                          BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0   45000000.0   3856.0
Double-Precision Whetstone                    55.0       8500.0   1545.5
...
                                                       ========
System Benchmarks Index Score:                           2150.3
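
The INDEX column and the final score can be reproduced by hand. The sketch below assumes the usual UnixBench convention: each test's index is its result divided by the baseline and multiplied by 10, and the overall score is the geometric mean of the per-test indices. The function names are illustrative and not part of UnixBench.

```shell
# unixbench_index BASELINE RESULT
# Per-test index: the result relative to the baseline machine, scaled by 10.
unixbench_index() {
    awk -v base="$1" -v res="$2" 'BEGIN { printf "%.1f\n", res / base * 10 }'
}

# geomean: read one index per line on stdin and print the geometric mean,
# computed as exp(mean(log(x))) to avoid overflow from multiplying large values.
geomean() {
    awk '{ sum += log($1); n++ } END { if (n) printf "%.1f\n", exp(sum / n) }'
}

unixbench_index 55.0 8500.0                    # Whetstone row: prints 1545.5
printf '%s\n' 2000 2000 2000 200 | geomean     # made-up indices: prints 1124.7
```

The second call uses invented indices to show how the aggregate behaves: one test that is ten times slower than the rest still leaves an overall score above 1100, so the per-test rows deserve as much attention as the final number.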

UnixBench Limitations

  1. Too old — Many tests designed in the 1980-90s
  2. Doesn't represent modern workloads — No web server, database tests
  3. Controversial index calculation — Geometric mean may hide individual weaknesses
  4. Single-machine only — Doesn't test network performance

Sysbench: A More Modern Choice

Sysbench is a more modern benchmark tool, particularly suited for testing database servers.

Sysbench Test Types

# CPU test
sysbench cpu --cpu-max-prime=20000 run

# Memory test
sysbench memory --memory-block-size=1K --memory-total-size=10G run

# Disk I/O test
sysbench fileio --file-total-size=10G prepare
sysbench fileio --file-total-size=10G --file-test-mode=rndrw run
sysbench fileio --file-total-size=10G cleanup

# MySQL test
sysbench oltp_read_write --mysql-host=localhost --mysql-user=root \
    --mysql-db=test --tables=10 --table-size=100000 prepare
sysbench oltp_read_write --mysql-host=localhost --mysql-user=root \
    --mysql-db=test --tables=10 --table-size=100000 --threads=16 run

Sysbench CPU Test Analysis

sysbench cpu --cpu-max-prime=20000 --threads=4 run
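
The figure to compare between machines in this test is events per second. As a sketch, assuming the report contains a line of the form "events per second: <number>" (as sysbench 1.0.x prints; other versions may differ), a small hypothetical helper can extract it:

```shell
# events_per_second: read a sysbench cpu report on stdin and print the
# events-per-second value. Assumes a line like "events per second:  1234.56".
events_per_second() {
    awk -F: '/events per second/ { gsub(/[ \t]+/, "", $2); print $2 }'
}

# Usage (requires sysbench to be installed):
#   sysbench cpu --cpu-max-prime=20000 --threads=4 run | events_per_second
```

Keep --cpu-max-prime and --threads identical across the machines being compared; otherwise the numbers are not comparable.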

Sysbench Advantages

  1. Database-ready — Built-in MySQL/PostgreSQL support
  2. Scriptable — Can write custom Lua scripts
  3. Modern metrics — Provides latency percentiles
  4. Active maintenance — Continuously updated
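
Latency percentiles matter because an average can hide a slow tail. The sketch below computes a nearest-rank percentile from a list of latencies, one per line. It only illustrates the idea and is not how sysbench computes its own figures; sysbench reports a percentile itself, selectable with its --percentile option. The sample latencies are made up.

```shell
# percentile P: read numeric samples (one per line) on stdin and print the
# P-th percentile using the nearest-rank method: rank = ceil(P * N / 100).
percentile() {
    sort -n | awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            rank = p * NR / 100
            if (rank > int(rank)) rank = int(rank) + 1   # round up
            if (rank < 1) rank = 1
            print v[rank]
        }'
}

# Ten made-up latencies in ms: nine fast requests and one slow outlier.
printf '%s\n' 2 2 2 2 2 3 3 3 3 50 | percentile 50    # prints 2
printf '%s\n' 2 2 2 2 2 3 3 3 3 50 | percentile 95    # prints 50
```

The mean of these samples is 7.2 ms, a value that describes none of the requests; the median and the 95th percentile together describe both the typical case and the tail.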

Phoronix Test Suite: The Most Comprehensive Option

Phoronix Test Suite (PTS) is one of the most comprehensive open-source benchmark suites, with hundreds of test profiles available.

Installing Phoronix Test Suite

# Ubuntu/Debian
sudo apt install phoronix-test-suite

# Or download from official site
wget https://phoronix-test-suite.com/releases/phoronix-test-suite-10.8.4.tar.gz
tar xvf phoronix-test-suite-10.8.4.tar.gz
cd phoronix-test-suite
sudo ./install-sh

Common Commands

# List all available tests
phoronix-test-suite list-available-tests

# Install a test
phoronix-test-suite install pts/compress-7zip

# Run a single test
phoronix-test-suite run pts/compress-7zip

# Run a test suite
phoronix-test-suite run pts/disk

# Merge two saved results into one file for side-by-side comparison
phoronix-test-suite merge-results result1 result2

Common Test Suites

Suite            Contents
pts/disk         Disk I/O (fio, iozone, bonnie++)
pts/cpu          CPU (compress, encode, compile)
pts/memory       Memory bandwidth and latency
pts/network      Network throughput
pts/compilation  Kernel compile, GCC compile

Example Run

$ phoronix-test-suite run pts/compress-7zip

    Phoronix Test Suite v10.8.4

    7-Zip Compression 16.02

    Test: Compression Rating

    Processor: Intel Core i7-10700 @ 4.80GHz (8 Cores / 16 Threads)
    Memory: 32GB DDR4-3200
    OS: Ubuntu 22.04

    Compression Rating:
        58234 MIPS

    Decompression Rating:
        71823 MIPS

OpenBenchmarking.org Integration

Phoronix's unique feature is integration with OpenBenchmarking.org, where you can:

  1. Upload results — Share to the cloud
  2. Compare results — Compare with other users' results
  3. Track history — Observe performance trends over time

# Upload results
phoronix-test-suite upload-result my-result

# Compare a result against a baseline (the baseline result is given first)
phoronix-test-suite compare-results-to-baseline baseline-result my-result

Choosing the Right System Benchmark

Different scenarios call for different tools:

Scenario                     Recommended Tool               Reason
Quick system health check    UnixBench                      Simple, fast, covers basics
Database server              Sysbench                       Dedicated OLTP tests
Comprehensive analysis       Phoronix                       Hundreds of tests, customizable
CI/CD automation             Sysbench + custom scripts      Scriptable, easy to integrate
Hardware purchase decisions  Phoronix + public comparisons  Lots of public data

Practical Advice

1. Define "Performance" First

Before running benchmarks, ask yourself:

  • For this system, what is "performance"?
  • Do users care about throughput or latency?
  • Which component is most likely the bottleneck?

2. Tests Should Simulate Real Usage

# Weak: tests the CPU alone
sysbench cpu run

# Better: simulate a real database workload
# (add the --mysql-* connection options shown earlier, and run "prepare" first)
sysbench oltp_read_write --tables=10 --table-size=1000000 \
    --threads=32 --time=300 run

3. Run Multiple Times, Report Statistics

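# Run 5 times, keep only the events-per-second figure, then take the median
: > results.txt
for i in {1..5}; do
    sysbench cpu run | awk '/events per second/ {print $NF}' >> results.txt
done
sort -n results.txt | awk '{v[NR]=$1} END {print "median:", v[(NR+1)/2]}'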
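
A single number discards the run-to-run variation, which is often the most interesting part. As a minimal sketch, assuming one numeric result per line on standard input (the function name summarize is made up for this example), the following helper reports the count, extremes, median, and mean:

```shell
# summarize: read one numeric sample per line on stdin and print
# count, minimum, median, maximum, and mean on a single line.
summarize() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "no samples"; exit 1 }
            if (NR % 2) median = v[(NR + 1) / 2]                  # odd count
            else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2  # even count
            printf "n=%d min=%s median=%s max=%s mean=%.2f\n", \
                   NR, v[1], median, v[NR], sum / NR
        }'
}

# Made-up samples; in practice, feed it one extracted figure per benchmark run.
printf '%s\n' 9 1 3 5 4 | summarize
```

A wide gap between min and max is a signal to look for interference (background jobs, thermal throttling, frequency scaling) before trusting any of the figures.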

4. Record Complete Environment

# Record environment before benchmark
echo "=== System Info ===" > env.txt
uname -a >> env.txt
grep -m1 "model name" /proc/cpuinfo >> env.txt
free -h >> env.txt
df -h >> env.txt

Back to That Ticket

Now you know why individual CPU/memory/disk benchmarks didn't find the problem. The real bottleneck was network latency, which wasn't in the tests I ran.

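If I had run Sysbench's OLTP test from the new server against the real database host, measuring the complete path from application to database, I would have found the problem much sooner. Pointing the test at a local database (--mysql-host=localhost) would have hidden the network hop all over again.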

Lesson: Choose benchmarks that represent your real workload.

Summary

System-level benchmarks measure overall system performance, not individual components:

Tool Selection

  • UnixBench: Classic but dated, good for quick checks
  • Sysbench: Modern, scriptable, good for database workloads
  • Phoronix: Most comprehensive, good for deep analysis

Best Practices

  • Define what "performance" means first
  • Choose tests that represent real workloads
  • Run multiple times, report statistics
  • Record complete environment information

Common Pitfalls

  • Testing components individually, ignoring system integration
  • Using tests that don't represent real workloads
  • Looking at only one metric, ignoring latency distribution