Appendix D: Power and Performance
"Performance per watt is the new performance." — Intel
Power Performance Fundamentals
Why Power Matters
Why power matters:
1. Data Centers
- Electricity is a major operating cost
- Cooling requirements scale with power
- Power supply limits
2. Mobile Devices
- Battery life
- Thermal design limits
- User experience
3. Embedded Systems
- Battery powered
- Fanless design
- Environmental constraints
4. Environmental
- Carbon footprint
- Energy efficiency regulations
Power Components
Processor power components:
1. Dynamic Power
P_dynamic = α × C × V² × f
α = activity factor
C = capacitance
V = voltage
f = frequency
2. Static Power (Leakage)
P_static = I_leak × V
Increases with temperature
Smaller process nodes tend to leak more
3. Total Power
P_total = P_dynamic + P_static
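To make the formulas concrete, the following sketch evaluates them at two supply voltages. All numeric values (activity factor, capacitance, leakage current, frequency) are assumptions chosen for illustration, not measurements of any real processor.

```python
def dynamic_power(alpha, capacitance_f, voltage_v, freq_hz):
    """Dynamic (switching) power in watts: alpha * C * V^2 * f."""
    return alpha * capacitance_f * voltage_v ** 2 * freq_hz


def static_power(leak_current_a, voltage_v):
    """Static (leakage) power in watts: I_leak * V."""
    return leak_current_a * voltage_v


# Illustrative operating point (assumed values, not measurements)
ALPHA = 0.2        # activity factor
C_FARADS = 50e-9   # effective switched capacitance
I_LEAK = 5.0       # leakage current in amperes
FREQ_HZ = 3.0e9    # clock frequency

for volts in (1.0, 0.9):
    p_dyn = dynamic_power(ALPHA, C_FARADS, volts, FREQ_HZ)
    p_stat = static_power(I_LEAK, volts)
    print(f"V = {volts:.1f} V: dynamic {p_dyn:.1f} W, "
          f"static {p_stat:.1f} W, total {p_dyn + p_stat:.1f} W")
```

Because voltage enters the dynamic term squared, a 10% voltage reduction at the same frequency cuts dynamic power by 19% (0.9² = 0.81).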
RAPL (Running Average Power Limit)
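RAPL is Intel's interface for monitoring and limiting processor power. It exposes cumulative energy counters, so power is derived from the difference between two readings.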
Reading RAPL
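# Using powercap interface
# (reading energy_uj typically requires root on recent kernels)
cat /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
# Calculate average power over one second
E1=$(cat /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj)
sleep 1
E2=$(cat /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj)
# Divide with awk: shell integer arithmetic would truncate to whole watts
awk -v e1="$E1" -v e2="$E2" 'BEGIN { printf "Power: %.2f W\n", (e2 - e1) / 1e6 }'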
RAPL Domains
RAPL power domains:
Domain              Description
─────────────────────────────────────────────
Package (PKG)       Entire CPU package
Core                CPU cores
Uncore              Non-core parts (L3 cache, etc.)
DRAM                Memory (DIMMs) attached to the package
GPU (if present)    Integrated graphics
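Which of these domains a given machine exposes varies by processor. As a rough sketch, the zones can be enumerated through the powercap directory, where package-level zones appear as `intel-rapl:N` and their subdomains as `intel-rapl:N:M`. The function name `list_rapl_zones` and its `base` parameter are illustrative choices, not part of any library.

```python
from pathlib import Path


def list_rapl_zones(base="/sys/class/powercap"):
    """Return {zone_directory_name: domain_name} for every intel-rapl zone."""
    zones = {}
    for zone in sorted(Path(base).glob("intel-rapl:*")):
        name_file = zone / "name"
        if name_file.is_file():
            zones[zone.name] = name_file.read_text().strip()
    return zones


if __name__ == "__main__":
    for zone, name in list_rapl_zones().items():
        # Two colons in the directory name mark a subdomain of a package
        level = "subdomain" if zone.count(":") == 2 else "package-level"
        print(f"{zone:<18} {name:<12} {level}")
```

On a machine without RAPL support the function simply returns an empty dictionary.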
Using perf
# Measure power with perf
sudo perf stat -e power/energy-pkg/,power/energy-cores/,power/energy-ram/ \
./benchmark
# Example output
Performance counter stats for './benchmark':
45.23 Joules power/energy-pkg/
32.15 Joules power/energy-cores/
12.34 Joules power/energy-ram/
10.002345678 seconds time elapsed
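perf reports energy in joules, so average power is each energy figure divided by the elapsed time. The sketch below parses output of the shape shown above and performs that division. It assumes numbers are printed with a `.` decimal separator, and the helper names `parse_perf_energy` and `average_power` are made up for this example. Note that `perf stat` writes its summary to stderr, so capture that stream when scripting it.

```python
import re


def parse_perf_energy(stat_output):
    """Return ({event: joules}, elapsed_seconds) parsed from `perf stat` text."""
    energy = {}
    pattern = r"^\s*([\d.]+)\s+Joules\s+(\S+)"
    for value, event in re.findall(pattern, stat_output, re.MULTILINE):
        energy[event] = float(value)
    match = re.search(r"^\s*([\d.]+)\s+seconds time elapsed",
                      stat_output, re.MULTILINE)
    elapsed = float(match.group(1)) if match else None
    return energy, elapsed


def average_power(stat_output):
    """Return {event: watts}, i.e. energy divided by elapsed time."""
    energy, elapsed = parse_perf_energy(stat_output)
    if not elapsed:
        raise ValueError("no 'seconds time elapsed' line found")
    return {event: joules / elapsed for event, joules in energy.items()}


SAMPLE = """\
 Performance counter stats for './benchmark':

             45.23 Joules power/energy-pkg/
             32.15 Joules power/energy-cores/
             12.34 Joules power/energy-ram/

      10.002345678 seconds time elapsed
"""

for event, watts in average_power(SAMPLE).items():
    print(f"{event}: {watts:.2f} W")
```

For the sample above, the package figure works out to roughly 4.5 W (45.23 J over about 10 s).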
Python RAPL Reader
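import time
from pathlib import Path


class RAPLReader:
    """Read RAPL energy counters through the Linux powercap interface."""

    def __init__(self):
        self.rapl_path = Path("/sys/class/powercap/intel-rapl")
        self.domains = self._find_domains()

    def _find_domains(self):
        # Matches the package-level zones (intel-rapl:0, intel-rapl:1, ...);
        # subdomains such as core or dram live one level down (intel-rapl:N:M).
        domains = {}
        for d in sorted(self.rapl_path.glob("intel-rapl:*")):
            name = (d / "name").read_text().strip()
            domains[name] = d
        return domains

    def read_energy(self):
        """Read the cumulative energy counter of every domain (microjoules)."""
        return {
            name: int((d / "energy_uj").read_text())
            for name, d in self.domains.items()
        }

    def measure_power(self, duration=1.0):
        """Measure average power (watts) over `duration` seconds."""
        e1, t1 = self.read_energy(), time.perf_counter()
        time.sleep(duration)
        e2, t2 = self.read_energy(), time.perf_counter()
        power = {}
        for name, d in self.domains.items():
            delta = e2[name] - e1[name]
            if delta < 0:  # the counter wrapped around between the two reads
                delta += int((d / "max_energy_range_uj").read_text())
            power[name] = delta / (t2 - t1) / 1e6
        return power


# Usage (reading energy_uj typically requires root on recent kernels)
rapl = RAPLReader()
power = rapl.measure_power(1.0)
print(f"Package power: {power.get('package-0', 0):.2f} W")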
Performance per Watt
GFLOPS/W
GFLOPS/W calculation:
Performance/Power ratio = GFLOPS / Power (W)
Example:
Performance = 100 GFLOPS
Power = 50 W
Efficiency = 100 / 50 = 2 GFLOPS/W
Measurement Method
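import subprocess
import time


def measure_efficiency(benchmark_cmd, parse_gflops):
    """Measure performance per watt for one benchmark run.

    benchmark_cmd: command as an argument list, e.g. ["./benchmark"].
    parse_gflops: callable that extracts the GFLOPS figure from the
        benchmark's stdout (depends on the specific benchmark).
    Uses the RAPLReader class defined earlier in this appendix.
    """
    rapl = RAPLReader()

    # Start measurement
    e1 = rapl.read_energy()
    t1 = time.perf_counter()

    # Run benchmark
    result = subprocess.run(
        benchmark_cmd,
        capture_output=True,
        text=True,
        check=True,
    )

    # End measurement
    t2 = time.perf_counter()
    e2 = rapl.read_energy()

    # Calculate average package power over the run
    # (counter wraparound during the run is not handled here)
    elapsed = t2 - t1
    energy_j = (e2['package-0'] - e1['package-0']) / 1e6
    power_w = energy_j / elapsed

    # Parse GFLOPS from benchmark output
    gflops = parse_gflops(result.stdout)
    efficiency = gflops / power_w

    return {
        'gflops': gflops,
        'power_w': power_w,
        'efficiency': efficiency
    }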
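The measurement function relies on a `parse_gflops` helper whose form depends entirely on the benchmark. As a minimal sketch, assuming a hypothetical benchmark that prints a line such as `GFLOPS: 98.7`, the helper could look like this; adapt the pattern to whatever the real benchmark emits.

```python
import re


def parse_gflops(stdout):
    """Extract a GFLOPS figure from benchmark output.

    Assumes a hypothetical output line such as 'GFLOPS: 98.7' or
    'gflops = 98.7'; this format is an illustration, not a standard.
    """
    match = re.search(r"GFLOPS\s*[:=]\s*([0-9]+(?:\.[0-9]+)?)",
                      stdout, re.IGNORECASE)
    if match is None:
        raise ValueError("no GFLOPS figure found in benchmark output")
    return float(match.group(1))


print(parse_gflops("elapsed: 1.20 s\nGFLOPS: 98.7\n"))
```

Raising an error when the pattern is missing avoids silently reporting an efficiency of zero for a run whose output format changed.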
Thermal Throttling
What is Thermal Throttling
Thermal throttling mechanism:
When the temperature exceeds a threshold, the processor will:
1. Reduce frequency
2. Reduce voltage
3. Skip clock cycles
Result:
- Performance decreases
- Power consumption decreases
- Temperature stabilizes
Problem:
- Benchmark results become unstable
- Cannot achieve rated performance
Temperature Monitoring
# Using sensors
sudo apt install lm-sensors
sensors
# Output example
coretemp-isa-0000
Core 0: +65.0°C (high = +100.0°C, crit = +110.0°C)
Core 1: +67.0°C (high = +100.0°C, crit = +110.0°C)
# Using /sys
cat /sys/class/thermal/thermal_zone*/temp
# 65000 (millidegrees)
Detecting Throttling
# Using turbostat
sudo turbostat --interval 1
# Output example
Core  CPU  Avg_MHz  Busy%  Bzy_MHz  TSC_MHz  IRQ   SMI  POLL  C1  C1E  C6
-     -    1627     45.2   3600     3600     1234  0    0     55  0    0
0     0    1746     48.5   3600     3600     456   0    0     52  0    0
# Avg_MHz ≈ Busy% × Bzy_MHz; Bzy_MHz below the expected frequency under sustained load = possible throttling
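On Intel systems the kernel also keeps per-CPU throttle event counters under `thermal_throttle/` in sysfs, which give a more direct signal than inferring throttling from frequency. The sketch below snapshots `core_throttle_count` before and after a workload and reports which CPUs recorded new events. The function names are illustrative, and the counters are not present on every platform, in which case the snapshot is empty.

```python
from pathlib import Path


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: core_throttle_count} for CPUs that expose the counter."""
    counts = {}
    pattern = "cpu[0-9]*/thermal_throttle/core_throttle_count"
    for counter in Path(cpu_root).glob(pattern):
        cpu_name = counter.parent.parent.name  # e.g. "cpu0"
        counts[cpu_name] = int(counter.read_text())
    return counts


def throttled_cpus(before, after):
    """List CPUs whose throttle counter increased between two snapshots."""
    return sorted(cpu for cpu in after if after[cpu] > before.get(cpu, 0))


if __name__ == "__main__":
    before = read_throttle_counts()
    # ... run the benchmark here ...
    after = read_throttle_counts()
    print("CPUs throttled during the run:", throttled_cpus(before, after))
```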
Python Monitoring
from pathlib import Path
import time


def monitor_thermal(duration=60, interval=1):
    """Sample the hottest thermal zone and the average CPU frequency."""
    results = []
    for _ in range(int(duration / interval)):
        # Read temperature (sysfs reports millidegrees Celsius)
        temps = []
        for zone in Path("/sys/class/thermal").glob("thermal_zone*"):
            try:
                temps.append(int((zone / "temp").read_text()) / 1000)
            except (OSError, ValueError):
                continue  # some zones cannot be read; skip them

        # Read frequency (sysfs reports kHz)
        freqs = []
        for cpu in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
            freq_file = cpu / "cpufreq/scaling_cur_freq"
            if freq_file.exists():
                freq = int(freq_file.read_text()) / 1000  # MHz
                freqs.append(freq)

        results.append({
            'time': time.time(),
            'max_temp': max(temps) if temps else None,
            'avg_freq': sum(freqs) / len(freqs) if freqs else 0
        })
        time.sleep(interval)
    return results
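The samples returned by `monitor_thermal` can then be screened for likely throttling, that is, moments when the temperature is high and the average frequency has fallen well below the peak seen during the run. The sketch below is one simple heuristic; the 95 °C limit and the 90% frequency ratio are assumed thresholds that should be tuned per machine.

```python
def find_throttle_suspects(samples, temp_limit_c=95.0, freq_ratio=0.9):
    """Return samples that are hot and noticeably below the peak frequency.

    samples: list of dicts with 'max_temp' (deg C or None) and 'avg_freq' (MHz),
    in the shape produced by monitor_thermal.
    """
    if not samples:
        return []
    peak_freq = max(s['avg_freq'] for s in samples)
    return [
        s for s in samples
        if s['max_temp'] is not None
        and s['max_temp'] >= temp_limit_c
        and s['avg_freq'] < freq_ratio * peak_freq
    ]


# Synthetic samples for illustration only
demo = [
    {'time': 0, 'max_temp': 70.0, 'avg_freq': 3600.0},
    {'time': 1, 'max_temp': 96.0, 'avg_freq': 3550.0},
    {'time': 2, 'max_temp': 98.0, 'avg_freq': 2800.0},
]
for s in find_throttle_suspects(demo):
    print(f"t={s['time']}: {s['max_temp']:.0f} C at {s['avg_freq']:.0f} MHz")
```

Only the third synthetic sample is flagged: the second is hot but still within 10% of the peak frequency.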
DVFS (Dynamic Voltage and Frequency Scaling)
CPU Frequency Control
# View available frequencies (not exposed by every driver, e.g. intel_pstate)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
# View current frequency (kHz)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
# Set frequency in kHz (requires root and the 'userspace' governor)
echo 2400000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed
Governor Settings
# View available governors
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
# performance powersave ondemand conservative schedutil
# Set governor
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# The 'performance' governor is recommended for benchmarks
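Before a run it is worth confirming that the governor change took effect on every CPU, since a single core left on another governor can skew results. A minimal sketch of such a check follows; the function names and the `cpu_root` parameter are illustrative.

```python
from collections import Counter
from pathlib import Path


def governor_summary(cpu_root="/sys/devices/system/cpu"):
    """Count how many CPUs currently use each cpufreq governor."""
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    return Counter(f.read_text().strip() for f in Path(cpu_root).glob(pattern))


def all_cpus_use(governor, cpu_root="/sys/devices/system/cpu"):
    """True only if at least one CPU was found and all of them use `governor`."""
    summary = governor_summary(cpu_root)
    return bool(summary) and set(summary) == {governor}


if __name__ == "__main__":
    print(dict(governor_summary()))
    print("All CPUs on 'performance':", all_cpus_use("performance"))
```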
Frequency Impact on Performance
Frequency vs Performance vs Power:
Frequency   Rel. Perf   Rel. Power   Efficiency
─────────────────────────────────────────────
2.0 GHz     1.00x       1.00x        1.00x
2.5 GHz     1.25x       1.95x        0.64x
3.0 GHz     1.50x       3.38x        0.44x
3.5 GHz     1.75x       5.36x        0.33x
Power ∝ f³ in this idealized model (P ∝ V² × f, with V rising in proportion to f)
Efficiency (Rel. Perf / Rel. Power) therefore falls as 1/f²
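The cubic relationship can be evaluated directly. The sketch below assumes performance scales linearly with frequency and dynamic power with its cube, ignoring static power and memory-bound behavior, so it is an idealized model rather than a prediction for real hardware. The function name and the `power_exponent` parameter are illustrative.

```python
def scaling_row(freq_ghz, base_ghz=2.0, power_exponent=3):
    """Return (relative performance, relative power, relative efficiency)."""
    ratio = freq_ghz / base_ghz
    perf = ratio                      # performance assumed proportional to f
    power = ratio ** power_exponent   # power assumed proportional to f^3
    return perf, power, perf / power


print("Frequency   Rel. Perf   Rel. Power   Efficiency")
for f in (2.0, 2.5, 3.0, 3.5):
    perf, power, eff = scaling_row(f)
    print(f"{f:.1f} GHz     {perf:.2f}x       {power:.2f}x        {eff:.2f}x")
```

Lowering `power_exponent` models a part whose voltage rises more slowly with frequency, which softens the efficiency penalty.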
GPU Power
NVIDIA GPU
# Using nvidia-smi
nvidia-smi --query-gpu=power.draw --format=csv -l 1
# Detailed info
nvidia-smi dmon -s p
# Example output (units appear in the second header row)
# gpu   pwr  gtemp  mtemp
# Idx     W      C      C
    0   125     65     55
Using NVML
import pynvml
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
# Read power
power = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000 # mW -> W
print(f"GPU Power: {power:.1f} W")
# Read temperature
temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
print(f"GPU Temp: {temp}°C")
pynvml.nvmlShutdown()
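A single power reading is an instantaneous value. To obtain the energy used by a GPU workload, power can be sampled periodically and integrated over time. The sketch below uses the same NVML calls as above together with a trapezoidal sum; the function names, the 0.1 s sampling interval, and the 10 s duration are assumptions made for illustration.

```python
import time


def integrate_energy(samples):
    """Trapezoidal integral of (timestamp_s, power_w) samples, in joules."""
    energy_j = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy_j += 0.5 * (p0 + p1) * (t1 - t0)
    return energy_j


def sample_gpu_power(duration_s=10.0, interval_s=0.1, index=0):
    """Poll NVML for power readings; returns a list of (timestamp_s, power_w)."""
    import pynvml  # imported lazily so integrate_energy works without NVML

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        samples = []
        end = time.monotonic() + duration_s
        while time.monotonic() < end:
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # mW -> W
            samples.append((time.monotonic(), watts))
            time.sleep(interval_s)
        return samples
    finally:
        pynvml.nvmlShutdown()


# Synthetic samples for illustration: 100 W for 1 s, then ramping to 200 W
print(integrate_energy([(0.0, 100.0), (1.0, 100.0), (2.0, 200.0)]), "J")
```

In practice `sample_gpu_power` would run in a background thread while the workload executes, and its samples would be passed to `integrate_energy` afterwards.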
Power Benchmark Best Practices
Environment Preparation
#!/bin/bash
# prepare_power_benchmark.sh
# 1. Set performance governor
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# 2. Disable turbo boost (optional, for stability; this path requires the intel_pstate driver)
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
# 3. Ensure adequate cooling
# (wait for temperature to stabilize)
# 4. Stop unnecessary services
sudo systemctl stop cron
sudo systemctl stop unattended-upgrades
Measurement Flow
Power measurement flow:
1. Warm-up
- Run benchmark until temperature stabilizes
- Usually takes 1-5 minutes
2. Baseline measurement
- Measure idle power
- Subtract it from load power to isolate the workload's contribution
3. Load measurement
- Run benchmark
- Record power simultaneously
4. Multiple repeats
- At least 3-5 times
- Calculate mean and standard deviation
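The last two steps of this flow amount to a small calculation: average the repeated load measurements, report their spread, and subtract the idle baseline. A minimal sketch using the standard library follows; the function name, the result keys, and the numbers in the example are illustrative, not measured data.

```python
import statistics


def summarize_runs(load_power_w, idle_power_w):
    """Summarize repeated load-power measurements against an idle baseline."""
    mean_w = statistics.mean(load_power_w)
    # The sample standard deviation needs at least two runs
    stdev_w = statistics.stdev(load_power_w) if len(load_power_w) > 1 else 0.0
    return {
        "mean_w": mean_w,
        "stdev_w": stdev_w,
        "cv_percent": 100 * stdev_w / mean_w,  # run-to-run variation
        "net_w": mean_w - idle_power_w,        # power attributable to the load
    }


# Illustrative numbers: three runs under load and a 10 W idle baseline
summary = summarize_runs([50.0, 52.0, 54.0], idle_power_w=10.0)
print(f"{summary['mean_w']:.1f} W ± {summary['stdev_w']:.1f} W, "
      f"net {summary['net_w']:.1f} W")
```

A coefficient of variation of more than a few percent usually suggests the system had not reached thermal steady state or that background activity interfered.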
Summary
Key points for power performance analysis:
Measurement Tools
- RAPL: Intel CPU power
- nvidia-smi: NVIDIA GPU power
- sensors: Temperature monitoring
Key Metrics
- GFLOPS/W: Performance per watt
- Energy: Total energy consumption
- Thermal headroom: Temperature margin
Influencing Factors
- Frequency and voltage
- Thermal throttling
- Workload characteristics
Best Practices
- Fixed frequency testing
- Wait for temperature to stabilize
- Multiple measurements for statistics
- Record environmental conditions