Chapter 7: System-Level Benchmarks
Part II: Tools
"The best benchmark is the actual workload." — Anonymous sysadmin
The "New Server Is Slower" Ticket
"This new server has problems. It's slower than the old one."
That was the trouble ticket I received. A freshly racked server—faster CPU, more memory, newer SSD—but users complained it "felt slower."
"Felt" is a hard thing to debug.
I ran CPU benchmarks—new server was 40% faster. Memory benchmarks—30% faster. Disk I/O—3× faster. Every individual test showed the new server was better.
But users insisted: "It's just slower."
Finally I found the problem: network latency. The new server sat in a different rack, which added 2 ms to every round trip to the database. The application was database-heavy, and each request queried the database dozens of times, so the extra latency accumulated into a delay users could feel.
This is why we need system-level benchmarks—measuring CPU, memory, and disk separately isn't enough. We need to measure how the entire system works together.
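The arithmetic behind the ticket can be sketched in a few lines. The 2 ms figure comes from the story above; the query counts are hypothetical stand-ins for "dozens", and the function name is made up for illustration.

```python
def added_latency_ms(queries_per_request: int, extra_rtt_ms: float) -> float:
    """Extra time per request when every sequential query pays extra_rtt_ms."""
    return queries_per_request * extra_rtt_ms

# 2 ms is the extra rack-to-rack latency from the ticket;
# the query counts below are hypothetical examples of "dozens".
for queries in (12, 24, 48):
    print(f"{queries} queries/request -> +{added_latency_ms(queries, 2.0):.0f} ms")
```

This assumes the queries are issued one after another. Queries that run in parallel or are batched into fewer round trips would pay less of the penalty.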
Micro-benchmarks vs System-level Benchmarks
Let's clarify the difference between these two types:
| Type | Measures | Examples |
|---|---|---|
| Micro-benchmark | Single component | CoreMark (CPU), STREAM (memory) |
| System-level | Entire system | UnixBench, Sysbench, Phoronix |
The problem with micro-benchmarks:
CPU score: 100 points
Memory score: 100 points
Disk score: 100 points
─────────────────────────
System score: ???
(Not 300—might be 50)
System performance isn't a simple sum of component performance. The bottleneck could be anywhere—CPU, memory, disk, network, or even the OS kernel.
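One way to picture this is a toy model in which every request passes through each stage in series, so the slowest stage caps the whole system. The capacities below are hypothetical; the CPU, memory, and disk improvements mirror the ratios from the ticket, and the network figure is invented to show a bottleneck moving.

```python
def pipeline_throughput(stage_capacity: dict[str, float]) -> tuple[str, float]:
    """Return (bottleneck stage, requests/sec) for stages that each request
    must pass through in series."""
    bottleneck = min(stage_capacity, key=stage_capacity.get)
    return bottleneck, stage_capacity[bottleneck]

# Hypothetical capacities in requests per second for each stage.
old = {"cpu": 1000, "memory": 1200, "disk": 400, "network": 900}
new = {"cpu": 1400, "memory": 1560, "disk": 1200, "network": 300}

print(pipeline_throughput(old))  # ('disk', 400)
print(pipeline_throughput(new))  # ('network', 300)
```

In this model the new machine wins every component test it was given and still serves fewer requests, because the limiting stage is one that was never measured. Real systems overlap stages and queue work, so treat this as an intuition aid rather than a predictor.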
UnixBench: The Classic System Benchmark
UnixBench is one of the oldest system benchmarks still in use. It was started in 1983 at Monash University and later adopted and expanded by BYTE magazine. Though dated, the fundamental operations it measures remain important.
UnixBench Test Items
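| Test | Measures |
|---|---|
| Dhrystone | CPU integer operations |
| Whetstone | CPU floating-point operations |
| Execl Throughput | Rate of execl() calls (replacing the process image) |
| File Copy | File system and disk I/O |
| Pipe Throughput | IPC performance |
| Pipe-based Context Switching | Context-switch overhead |
| Process Creation | fork() performance |
| Shell Scripts | Shell script execution |
| System Call Overhead | Syscall overhead |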
Running UnixBench
# Download and compile
git clone https://github.com/kdlucas/byte-unixbench.git
cd byte-unixbench/UnixBench
make
# Run (single-threaded and multi-threaded)
./Run
Typical Output
========================================================================
BYTE UNIX Benchmarks (Version 5.1.3)
System: myserver
OS: GNU/Linux -- 5.15.0-generic -- #1 SMP
Machine: x86_64 (x86_64)
CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
------------------------------------------------------------------------
Benchmark Run: Wed Dec 18 2024 10:00:00
1 parallel copy of tests:
Dhrystone 2 using register variables 45000000.0 lps (10.0 s, 7 samples)
Double-Precision Whetstone 8500.0 MWIPS (10.0 s, 7 samples)
Execl Throughput 5000.0 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 900000.0 KBps (30.0 s, 2 samples)
...
System Benchmarks Index Values:
BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 45000000.0 3856.0
Double-Precision Whetstone 55.0 8500.0 1545.5
...
========
System Benchmarks Index Score: 2150.3
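The INDEX column is derived from the other two: each result is divided by the baseline value and multiplied by 10, so the baseline machine scores 10 on every test. A minimal sketch of that calculation, checked against the Whetstone row in the sample output (the function name is my own):

```python
def index_value(result: float, baseline: float) -> float:
    """Per-test index: the result relative to the baseline system, times 10."""
    return result / baseline * 10

# Whetstone row from the sample output above: baseline 55.0, result 8500.0
print(round(index_value(8500.0, 55.0), 1))  # 1545.5
```

The final Index Score is then the geometric mean of the per-test index values.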
UnixBench Limitations
- Too old — Many tests were designed in the 1980s and 1990s
- Doesn't represent modern workloads — No web server, database tests
- Controversial index calculation — Geometric mean may hide individual weaknesses
- Single-machine only — Doesn't test network performance
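The point about the geometric mean is easy to demonstrate. The index values below are hypothetical, chosen so that two very different machines produce the same overall score.

```python
from statistics import geometric_mean

# Hypothetical per-test index values for two machines.
balanced = [2000.0, 2000.0, 2000.0, 2000.0]
lopsided = [4000.0, 4000.0, 4000.0, 250.0]  # one test scores 8x lower

print(round(geometric_mean(balanced)))  # 2000
print(round(geometric_mean(lopsided)))  # 2000
```

If the workload depends on the one test where the second machine scores 250, the identical headline number is misleading. Reading the per-test rows, not just the final score, avoids this trap.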
Sysbench: A More Modern Choice
Sysbench is a more modern benchmark tool, particularly suited for testing database servers.
Sysbench Test Types
# CPU test
sysbench cpu --cpu-max-prime=20000 run
# Memory test
sysbench memory --memory-block-size=1K --memory-total-size=10G run
# Disk I/O test
sysbench fileio --file-total-size=10G prepare
sysbench fileio --file-total-size=10G --file-test-mode=rndrw run
sysbench fileio --file-total-size=10G cleanup
# MySQL test
sysbench oltp_read_write --mysql-host=localhost --mysql-user=root \
--mysql-db=test --tables=10 --table-size=100000 prepare
sysbench oltp_read_write --mysql-host=localhost --mysql-user=root \
--mysql-db=test --tables=10 --table-size=100000 --threads=16 run
Sysbench CPU Test Analysis
sysbench cpu --cpu-max-prime=20000 --threads=4 run
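It helps to know what this command actually exercises. As far as I understand sysbench's implementation, each "event" in the CPU test checks every candidate up to --cpu-max-prime for primality by trial division, and the tool reports how many such events complete per second. The following is a rough Python rendering of that loop, a sketch of the idea rather than a port of the C code:

```python
import math
import time

def count_primes(max_prime: int) -> int:
    """One sysbench-style CPU 'event': trial-divide every candidate from 3
    up to (but not including) max_prime and count those with no divisor."""
    count = 0
    for candidate in range(3, max_prime):
        limit = math.isqrt(candidate)
        if all(candidate % d for d in range(2, limit + 1)):
            count += 1
    return count

start = time.perf_counter()
primes = count_primes(20000)
elapsed = time.perf_counter() - start
print(f"{primes} primes found in {elapsed:.3f} s")
```

The loop is pure integer arithmetic on a tiny working set, so the test says little about memory bandwidth, cache behavior, or I/O. With --threads=4, four workers run these events concurrently, which measures how well that arithmetic scales across cores and not much else.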
Sysbench Advantages
- Database-ready — Built-in MySQL/PostgreSQL support
- Scriptable — Can write custom Lua scripts
- Modern metrics — Provides latency percentiles
- Active maintenance — Continuously updated
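Latency percentiles matter because an average can look healthy while a slice of requests is very slow. Here is a small sketch using the nearest-rank method on hypothetical latencies; it illustrates the concept and is not how sysbench computes its figures internally.

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds: mostly fast, a few slow.
latencies_ms = [5.0] * 90 + [8.0] * 5 + [250.0] * 5

print(sum(latencies_ms) / len(latencies_ms))  # mean: 17.4
print(percentile(latencies_ms, 50))           # 5.0
print(percentile(latencies_ms, 99))           # 250.0
```

The median says a typical request takes 5 ms and the mean suggests 17.4 ms, but one request in twenty takes a quarter of a second. Only the high percentiles expose that tail.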
Phoronix Test Suite: The Most Comprehensive Option
Phoronix Test Suite (PTS) is one of the most comprehensive open-source benchmark suites, with hundreds of tests available.
Installing Phoronix Test Suite
# Ubuntu/Debian
sudo apt install phoronix-test-suite
# Or download from official site
wget https://phoronix-test-suite.com/releases/phoronix-test-suite-10.8.4.tar.gz
tar xvf phoronix-test-suite-10.8.4.tar.gz
cd phoronix-test-suite
sudo ./install-sh
Common Commands
# List all available tests
phoronix-test-suite list-available-tests
# Install a test
phoronix-test-suite install pts/compress-7zip
# Run a single test
phoronix-test-suite run pts/compress-7zip
# Run a test suite
phoronix-test-suite run pts/disk
# Merge two saved results for side-by-side comparison
phoronix-test-suite merge-results result1 result2
Common Test Suites
| Suite | Contents |
|---|---|
| pts/disk | Disk I/O (fio, iozone, bonnie++) |
| pts/cpu | CPU (compress, encode, compile) |
| pts/memory | Memory bandwidth and latency |
| pts/network | Network throughput |
| pts/compilation | Kernel compile, GCC compile |
Example Run
$ phoronix-test-suite run pts/compress-7zip
Phoronix Test Suite v10.8.4
7-Zip Compression 16.02
Test: Compression Rating
Processor: Intel Core i7-10700 @ 4.80GHz (8 Cores / 16 Threads)
Memory: 32GB DDR4-3200
OS: Ubuntu 22.04
Compression Rating:
58234 MIPS
Decompression Rating:
71823 MIPS
OpenBenchmarking.org Integration
A distinguishing feature of PTS is its integration with OpenBenchmarking.org, where you can:
- Upload results — Share to the cloud
- Compare results — Compare with other users' results
- Track history — Observe performance trends over time
# Upload results
phoronix-test-suite upload-result my-result
# Compare a result to a baseline (baseline result goes first)
phoronix-test-suite compare-results-to-baseline baseline-id my-result
Choosing the Right System Benchmark
Different scenarios call for different tools:
| Scenario | Recommended Tool | Reason |
|---|---|---|
| Quick system health check | UnixBench | Simple, fast, covers basics |
| Database server | Sysbench | Dedicated OLTP tests |
| Comprehensive analysis | Phoronix | Hundreds of tests, customizable |
| CI/CD automation | Sysbench + custom scripts | Scriptable, easy to integrate |
| Hardware purchase decisions | Phoronix + public comparisons | Lots of public data |
Practical Advice
1. Define "Performance" First
Before running benchmarks, ask yourself:
- For this system, what is "performance"?
- Do users care about throughput or latency?
- Which component is most likely the bottleneck?
2. Tests Should Simulate Real Usage
# Bad: test CPU alone
sysbench cpu run
# Better: simulate a real database workload
# (connection flags omitted; run "prepare" with the same --tables/--table-size first)
sysbench oltp_read_write --tables=10 --table-size=1000000 \
--threads=32 --time=300 run
3. Run Multiple Times, Report Statistics
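# Run 5 times, then report the median events/sec
: > results.txt
for i in {1..5}; do
  sysbench cpu run >> results.txt
done
awk '/events per second:/ {print $4}' results.txt | sort -n | sed -n '3p'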
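For anything beyond a quick check, it is worth reporting the spread as well as the median. Assuming the raw output of each run has been appended to a file such as results.txt, as in the loop above, a short script can pull out the numbers. The line format matched here is what I have seen from sysbench 1.0's CPU test, and the sample values are hypothetical.

```python
import re
import statistics

def events_per_second(report: str) -> list[float]:
    """Extract every 'events per second' value from concatenated sysbench output."""
    return [float(v) for v in re.findall(r"events per second:\s*([\d.]+)", report)]

# Hypothetical excerpt standing in for the contents of results.txt.
sample = """
    events per second:  1012.40
    events per second:   998.75
    events per second:  1005.10
    events per second:   731.20
    events per second:  1003.90
"""
rates = events_per_second(sample)
print(f"median={statistics.median(rates)} min={min(rates)} max={max(rates)}")
```

In this made-up data one run is far below the rest. The median is unaffected, but the gap between minimum and maximum is a signal that something interfered with that run and the environment deserves a second look.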
4. Record Complete Environment
# Record environment before benchmark
echo "=== System Info ===" > env.txt
uname -a >> env.txt
grep -m1 "model name" /proc/cpuinfo >> env.txt
free -h >> env.txt
df -h >> env.txt
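If results are post-processed by a script anyway, the same idea can be captured in a machine-readable form and stored next to the numbers. This is a minimal sketch using only the Python standard library; the field names are my own choice, and a real record would likely add kernel settings, storage details, and tool versions.

```python
import json
import os
import platform

def environment_snapshot() -> dict:
    """Collect basic host details to store alongside benchmark results."""
    uname = platform.uname()
    return {
        "hostname": uname.node,
        "os": f"{uname.system} {uname.release}",
        "machine": uname.machine,
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }

print(json.dumps(environment_snapshot(), indent=2))
```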
Back to That Ticket
Now you know why individual CPU/memory/disk benchmarks didn't find the problem. The real bottleneck was network latency, which wasn't in the tests I ran.
If I had run Sysbench's OLTP test from the new server against the real database, measuring the complete path from application to database, I would have found the problem.
Lesson: Choose benchmarks that represent your real workload.
Summary
System-level benchmarks measure overall system performance, not individual components:
Tool Selection
- UnixBench: Classic but dated, good for quick checks
- Sysbench: Modern, scriptable, good for database workloads
- Phoronix: Most comprehensive, good for deep analysis
Best Practices
- Define what "performance" means first
- Choose tests that represent real workloads
- Run multiple times, report statistics
- Record complete environment information
Common Pitfalls
- Testing components individually, ignoring system integration
- Using tests that don't represent real workloads
- Looking at only one metric, ignoring latency distribution