Appendix A: Benchmark Automation
"If you can't automate it, you can't scale it." — DevOps Proverb
Why Automation Is Needed
Manual benchmark execution has several problems:
Problems:
1. Human error (forgetting to set environment, wrong parameters)
2. Non-reproducible (different conditions each run)
3. Time-consuming (requires manual waiting and recording)
4. Hard to track (results scattered everywhere)
Solution:
Automate benchmark workflow
Integrate into CI/CD pipeline
Automatically detect performance regressions
CI/CD Integration
GitHub Actions Example
# .github/workflows/benchmark.yml
name: Performance Benchmark
on:
push:
branches: [main]
pull_request:
branches: [main]
schedule:
- cron: '0 2 * * *' # Daily at 2 AM
jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup environment
run: |
# Lock CPU frequency (if possible)
sudo cpupower frequency-set -g performance || true
# Disable turbo boost
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo || true
- name: Build
run: |
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)
- name: Run benchmarks
run: |
cd build
./benchmark --benchmark_format=json > benchmark_results.json
- name: Upload results
uses: actions/upload-artifact@v4
with:
name: benchmark-results
path: build/benchmark_results.json
- name: Compare with baseline
run: |
python scripts/compare_benchmarks.py \
--baseline baseline.json \
--current build/benchmark_results.json \
--threshold 5
Dedicated Benchmark Runner
For serious performance testing, use a dedicated machine:
# Using self-hosted runner
jobs:
benchmark:
runs-on: [self-hosted, benchmark-machine]
steps:
- name: Ensure isolation
run: |
# Ensure no other programs running
sudo systemctl stop cron
sudo systemctl stop unattended-upgrades
# Set CPU affinity
taskset -c 0-3 ./benchmark
Google Benchmark Integration
Basic Setup
#include <benchmark/benchmark.h>
static void BM_VectorPushBack(benchmark::State& state) {
for (auto _ : state) {
std::vector<int> v;
for (int i = 0; i < state.range(0); ++i) {
v.push_back(i);
}
}
state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_VectorPushBack)
->Range(8, 8<<10)
->Complexity(benchmark::oN);
BENCHMARK_MAIN();
JSON Output
# Output JSON format
./benchmark --benchmark_format=json --benchmark_out=results.json
# Example output
{
"context": {
"date": "2024-01-15T10:30:00+08:00",
"host_name": "benchmark-server",
"executable": "./benchmark",
"num_cpus": 8,
"mhz_per_cpu": 3600,
"cpu_scaling_enabled": false
},
"benchmarks": [
{
"name": "BM_VectorPushBack/8",
"real_time": 45.2,
"cpu_time": 44.8,
"iterations": 15234567
}
]
}
Comparison Tool
# Use Google Benchmark's comparison tool
pip install google-benchmark
# Compare two runs
compare.py benchmarks baseline.json current.json
# Output
Benchmark Time CPU
--------------------------------------------
BM_VectorPushBack/8 -0.05 -0.04
BM_VectorPushBack/64 +0.12 +0.11 # Warning: regression
BM_VectorPushBack/512 -0.02 -0.03
Regression Detection
Statistical Method
import numpy as np
from scipy import stats
def detect_regression(baseline, current, threshold=0.05, alpha=0.05):
"""
Use statistical test to detect performance regression
Args:
baseline: List of baseline measurements
current: List of current measurements
threshold: Acceptable performance change ratio
alpha: Significance level
Returns:
(is_regression, p_value, change_percent)
"""
# Calculate percent change
baseline_mean = np.mean(baseline)
current_mean = np.mean(current)
change_percent = (current_mean - baseline_mean) / baseline_mean * 100
# Perform t-test
t_stat, p_value = stats.ttest_ind(baseline, current)
# Determine if significant regression
is_regression = (
p_value < alpha and
change_percent > threshold * 100
)
return is_regression, p_value, change_percent
# Usage example
baseline = [100.2, 101.5, 99.8, 100.1, 100.9]
current = [108.3, 109.1, 107.5, 108.8, 109.2]
is_reg, p, change = detect_regression(baseline, current)
print(f"Regression: {is_reg}, p-value: {p:.4f}, change: {change:.1f}%")
# Regression: True, p-value: 0.0001, change: 8.2%
Automation Script
#!/usr/bin/env python3
"""benchmark_compare.py - Compare benchmark results and detect regressions"""
import json
import sys
from pathlib import Path
def load_results(filepath):
"""Load Google Benchmark JSON results"""
with open(filepath) as f:
data = json.load(f)
return {b['name']: b for b in data['benchmarks']}
def compare_results(baseline_path, current_path, threshold=5.0):
"""Compare two benchmark results"""
baseline = load_results(baseline_path)
current = load_results(current_path)
regressions = []
improvements = []
for name, curr in current.items():
if name not in baseline:
continue
base = baseline[name]
change = (curr['real_time'] - base['real_time']) / base['real_time'] * 100
if change > threshold:
regressions.append((name, change))
elif change < -threshold:
improvements.append((name, change))
return regressions, improvements
def main():
if len(sys.argv) != 4:
print("Usage: benchmark_compare.py <baseline> <current> <threshold>")
sys.exit(1)
baseline_path = sys.argv[1]
current_path = sys.argv[2]
threshold = float(sys.argv[3])
regressions, improvements = compare_results(
baseline_path, current_path, threshold
)
if regressions:
print("❌ Performance Regressions Detected:")
for name, change in regressions:
print(f" {name}: +{change:.1f}%")
sys.exit(1)
if improvements:
print("✅ Performance Improvements:")
for name, change in improvements:
print(f" {name}: {change:.1f}%")
print("✅ No regressions detected")
sys.exit(0)
if __name__ == "__main__":
main()
Result Storage and Tracking
Database Storage
import sqlite3
from datetime import datetime
def init_db(db_path):
"""Initialize benchmark database"""
conn = sqlite3.connect(db_path)
conn.execute('''
CREATE TABLE IF NOT EXISTS benchmarks (
id INTEGER PRIMARY KEY,
timestamp TEXT,
commit_hash TEXT,
benchmark_name TEXT,
real_time REAL,
cpu_time REAL,
iterations INTEGER,
UNIQUE(commit_hash, benchmark_name)
)
''')
conn.commit()
return conn
def store_results(conn, commit_hash, results):
"""Store benchmark results"""
timestamp = datetime.now().isoformat()
for benchmark in results['benchmarks']:
conn.execute('''
INSERT OR REPLACE INTO benchmarks
(timestamp, commit_hash, benchmark_name, real_time, cpu_time, iterations)
VALUES (?, ?, ?, ?, ?, ?)
''', (
timestamp,
commit_hash,
benchmark['name'],
benchmark['real_time'],
benchmark['cpu_time'],
benchmark['iterations']
))
conn.commit()
def get_history(conn, benchmark_name, limit=100):
"""Get benchmark history"""
cursor = conn.execute('''
SELECT timestamp, commit_hash, real_time
FROM benchmarks
WHERE benchmark_name = ?
ORDER BY timestamp DESC
LIMIT ?
''', (benchmark_name, limit))
return cursor.fetchall()
Visualization
import matplotlib.pyplot as plt
import pandas as pd
def plot_benchmark_history(history, benchmark_name):
"""Plot benchmark history trend"""
df = pd.DataFrame(history, columns=['timestamp', 'commit', 'time'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
plt.figure(figsize=(12, 6))
plt.plot(df['timestamp'], df['time'], marker='o')
plt.xlabel('Date')
plt.ylabel('Time (ns)')
plt.title(f'Benchmark History: {benchmark_name}')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig(f'{benchmark_name}_history.png')
Advanced Techniques
Environment Consistency Check
#!/bin/bash
# check_environment.sh - Check benchmark environment
echo "=== Environment Check ==="
# CPU frequency
echo "CPU Frequency:"
cat /proc/cpuinfo | grep "MHz" | head -1
# CPU Governor
echo "CPU Governor:"
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# Turbo Boost
echo "Turbo Boost:"
cat /sys/devices/system/cpu/intel_pstate/no_turbo 2>/dev/null || echo "N/A"
# System load
echo "System Load:"
uptime
# Memory
echo "Memory:"
free -h
# Background processes
echo "Background Processes:"
ps aux | wc -l
Multiple Runs with Statistics
# Multiple runs in CI
- name: Run benchmarks (multiple iterations)
run: |
for i in {1..5}; do
./benchmark --benchmark_format=json > results_$i.json
done
# Merge results
python scripts/merge_results.py results_*.json > final_results.json
Performance Budget
# performance_budget.yaml
benchmarks:
BM_VectorPushBack/1024:
max_time_ns: 50000
max_memory_kb: 100
BM_HashTableLookup:
max_time_ns: 100
BM_SortArray/10000:
max_time_ns: 1000000
def check_budget(results, budget):
"""Check if performance exceeds budget"""
violations = []
for name, limits in budget['benchmarks'].items():
if name not in results:
continue
result = results[name]
if 'max_time_ns' in limits:
if result['real_time'] > limits['max_time_ns']:
violations.append(
f"{name}: {result['real_time']:.0f}ns > {limits['max_time_ns']}ns"
)
return violations
Summary
Key elements of benchmark automation:
CI/CD Integration
- GitHub Actions / GitLab CI
- Dedicated benchmark runner
- Environment consistency
Automatic Regression Detection
- Statistical testing
- Threshold configuration
- Automatic alerts
Result Management
- Database storage
- Historical tracking
- Visualization
Best Practices
- Multiple runs for statistics
- Fixed environment settings
- Performance budgets
- Automated reporting