Appendix D: Power and Performance


"Performance per watt is the new performance." — Intel

Power Performance Fundamentals

Why Power Matters

Why power matters:

1. Data Centers
   - Electricity is major operational cost
   - Cooling requirements scale with power
   - Power supply limits

2. Mobile Devices
   - Battery life
   - Thermal design limits
   - User experience

3. Embedded Systems
   - Battery powered
   - Fanless design
   - Environmental constraints

4. Environmental
   - Carbon footprint
   - Energy efficiency regulations

Power Components

Processor power components:

1. Dynamic Power
   P_dynamic = α × C × V² × f

   α = activity factor
   C = capacitance
   V = voltage
   f = frequency

2. Static Power (Leakage)
   P_static = I_leak × V

   Temperature dependent
   Smaller process = more leakage

3. Total Power
   P_total = P_dynamic + P_static
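
The formulas above translate directly into a small calculator. This is a minimal sketch: the function and parameter names are illustrative, and real processors do not publish α, C, or I_leak, so any values passed in are assumptions.

```python
def dynamic_power(alpha, capacitance_f, voltage_v, frequency_hz):
    """P_dynamic = alpha * C * V^2 * f, in watts."""
    return alpha * capacitance_f * voltage_v ** 2 * frequency_hz


def static_power(leakage_current_a, voltage_v):
    """P_static = I_leak * V, in watts."""
    return leakage_current_a * voltage_v


def total_power(alpha, capacitance_f, voltage_v, frequency_hz, leakage_current_a):
    """P_total = P_dynamic + P_static, in watts."""
    return (dynamic_power(alpha, capacitance_f, voltage_v, frequency_hz)
            + static_power(leakage_current_a, voltage_v))
```

Because voltage enters the dynamic term squared, halving V at a fixed frequency cuts dynamic power to a quarter, which is why voltage scaling is the most effective lever.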

RAPL (Running Average Power Limit)

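RAPL is Intel's interface for monitoring and limiting processor power. It exposes cumulative energy counters per power domain, which Linux makes available through the powercap sysfs tree and through perf.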

Reading RAPL

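# Using powercap interface (energy_uj is readable only by root on recent kernels)
sudo cat /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj

# Calculate average power over one second
# energy_uj is a cumulative counter in microjoules; it wraps at max_energy_range_uj
RAPL=/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
E1=$(sudo cat "$RAPL")
sleep 1
E2=$(sudo cat "$RAPL")
# awk keeps the fractional part; shell integer arithmetic would truncate to whole watts
awk -v e1="$E1" -v e2="$E2" 'BEGIN { printf "Power: %.2f W\n", (e2 - e1) / 1e6 }'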
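
The energy counter is cumulative and eventually wraps around, so a plain subtraction can go negative on long measurements. The sketch below shows one way to handle that, assuming the counter wrapped at most once between the two readings and that the wrap point is the value read from the zone's max_energy_range_uj file. The function names are illustrative.

```python
def energy_delta_uj(start_uj, end_uj, max_range_uj):
    """Energy consumed between two counter readings, in microjoules.

    Assumes the counter wrapped at most once between the readings.
    """
    delta = end_uj - start_uj
    if delta < 0:  # the counter wrapped around
        delta += max_range_uj
    return delta


def average_power_w(start_uj, end_uj, elapsed_s, max_range_uj):
    """Average power in watts over the measurement interval."""
    return energy_delta_uj(start_uj, end_uj, max_range_uj) / 1e6 / elapsed_s
```

For example, average_power_w(0, 15_000_000, 1.0, 10**12) corresponds to 15 J consumed in one second, that is 15 W.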

RAPL Domains

RAPL power domains:

Domain              Description
─────────────────────────────────────────────
Package (PKG)       Entire CPU package
Core                CPU cores
Uncore              Non-core parts (L3 cache, etc.)
DRAM                Memory (DIMMs) attached to the package
GPU (if present)    Integrated graphics

Using perf

# Measure energy with perf (RAPL events are system-wide, so -a is required)
sudo perf stat -a -e power/energy-pkg/,power/energy-cores/,power/energy-ram/ \
    ./benchmark

# Example output
Performance counter stats for 'system wide':
    45.23 Joules  power/energy-pkg/
    32.15 Joules  power/energy-cores/
    12.34 Joules  power/energy-ram/

    10.002345678 seconds time elapsed
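
To post-process these numbers, the energy lines can be pulled out of the perf report with a regular expression. This is a sketch that assumes the human-readable layout shown above (perf stat writes it to stderr by default) and a locale that uses "." as the decimal separator; the function name is illustrative.

```python
import re

# Matches lines such as "    45.23 Joules  power/energy-pkg/"
_ENERGY_LINE = re.compile(r"^\s*([\d.,]+)\s+Joules\s+(power/[\w-]+/)")


def parse_perf_energy(text):
    """Return {event name: joules} for every energy line in perf stat output."""
    energy = {}
    for line in text.splitlines():
        match = _ENERGY_LINE.match(line)
        if match:
            # Drop thousands separators such as "1,234.56" before converting
            energy[match.group(2)] = float(match.group(1).replace(",", ""))
    return energy
```

Dividing each value by the elapsed time from the same report gives average power per domain.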

Python RAPL Reader

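import time
from pathlib import Path

class RAPLReader:
    """Read RAPL energy counters via the Linux powercap interface.

    Reading energy_uj normally requires root on recent kernels.
    """

    def __init__(self):
        self.rapl_path = Path("/sys/class/powercap/intel-rapl")
        self.domains = self._find_domains()

    def _find_domains(self):
        """Map domain names to their powercap zone directories."""
        domains = {}
        for zone in sorted(self.rapl_path.glob("intel-rapl:*")):
            zone_name = (zone / "name").read_text().strip()  # e.g. package-0
            domains[zone_name] = zone
            # Subzones (core, uncore, dram) are nested inside the package zone
            for sub in sorted(zone.glob("intel-rapl:*")):
                sub_name = (sub / "name").read_text().strip()
                # Prefix with the parent so names stay unique on multi-socket systems
                domains[f"{zone_name}/{sub_name}"] = sub
        return domains

    def read_energy(self):
        """Read energy for all domains (microjoules)"""
        return {
            name: int((zone / "energy_uj").read_text())
            for name, zone in self.domains.items()
        }

    def measure_power(self, duration=1.0):
        """Measure average power (watts) over `duration` seconds"""
        e1, t1 = self.read_energy(), time.monotonic()
        time.sleep(duration)
        e2, t2 = self.read_energy(), time.monotonic()

        power = {}
        for name, zone in self.domains.items():
            delta = e2[name] - e1[name]
            if delta < 0:  # counter wrapped around during the interval
                delta += int((zone / "max_energy_range_uj").read_text())
            # Divide by the measured interval, not the requested one
            power[name] = delta / 1e6 / (t2 - t1)
        return power

# Usage
rapl = RAPLReader()
power = rapl.measure_power(1.0)
print(f"Package power: {power.get('package-0', 0):.2f} W")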

Performance per Watt

GFLOPS/W

GFLOPS/W calculation:

Performance/Power ratio = GFLOPS / Power (W)

Example:
  Performance = 100 GFLOPS
  Power = 50 W
  Efficiency = 100 / 50 = 2 GFLOPS/W
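
The same arithmetic as a pair of helpers, together with energy-to-solution, which is often the more useful number when two systems finish the same job in different times. This is a minimal sketch with illustrative function names.

```python
def gflops_per_watt(gflops, power_w):
    """Performance per watt. Numerically equal to GFLOP per joule."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return gflops / power_w


def energy_to_solution_j(power_w, runtime_s):
    """Total energy in joules for a run at a given average power."""
    return power_w * runtime_s
```

With the figures above, gflops_per_watt(100, 50) gives 2.0, and a 10-second run at 50 W consumes energy_to_solution_j(50, 10) = 500 J.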

Measurement Method

import subprocess
import time

def measure_efficiency(benchmark_cmd, parse_gflops):
    """Measure performance per watt for one benchmark run.

    benchmark_cmd: command as a list, e.g. ["./benchmark"]
    parse_gflops:  caller-supplied function that extracts GFLOPS from the
                   benchmark's stdout (depends on the specific benchmark)
    """
    rapl = RAPLReader()

    # Start measurement
    e1 = rapl.read_energy()
    t1 = time.perf_counter()

    # Run benchmark; check=True raises if it exits with an error
    result = subprocess.run(
        benchmark_cmd,
        capture_output=True,
        text=True,
        check=True
    )

    # End measurement
    t2 = time.perf_counter()
    e2 = rapl.read_energy()

    # Calculate (assumes the energy counter did not wrap during the run)
    elapsed = t2 - t1
    energy_j = (e2['package-0'] - e1['package-0']) / 1e6
    power_w = energy_j / elapsed

    # Parse GFLOPS from benchmark output
    gflops = parse_gflops(result.stdout)

    efficiency = gflops / power_w

    return {
        'gflops': gflops,
        'power_w': power_w,
        'efficiency': efficiency
    }

Thermal Throttling

What is Thermal Throttling

Thermal throttling mechanism:

When temperature exceeds threshold, processor will:
1. Reduce frequency
2. Reduce voltage
3. Skip clock cycles

Result:
  - Performance decreases
  - Power consumption decreases
  - Temperature stabilizes

Problem:
  - Benchmark results become unstable
  - Cannot achieve rated performance

Temperature Monitoring

# Using sensors
sudo apt install lm-sensors
sensors

# Output example
coretemp-isa-0000
Core 0:       +65.0°C  (high = +100.0°C, crit = +110.0°C)
Core 1:       +67.0°C  (high = +100.0°C, crit = +110.0°C)

# Using /sys (values are in millidegrees Celsius; not every zone is a CPU sensor)
cat /sys/class/thermal/thermal_zone*/temp
# 65000 = 65.0°C

Detecting Throttling

# Using turbostat
sudo turbostat --interval 1

# Output example
Core  CPU  Avg_MHz  Busy%  Bzy_MHz  TSC_MHz  IRQ  SMI  POLL  C1  C1E  C6
-     -    1627     45.2   3600     3600     1234 0    0     55  0   0
0     0    1746     48.5   3600     3600     456  0    0     52  0   0

# Avg_MHz is roughly Busy% × Bzy_MHz
# Bzy_MHz well below the rated frequency under sustained load = possible throttling
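
On Intel systems the kernel may also expose per-CPU throttle event counters under thermal_throttle/core_throttle_count. Comparing them before and after a run gives a direct indication of throttling. This is a sketch that assumes those files exist on the machine; where they do not, the reader returns an empty dictionary. The function names are illustrative.

```python
from pathlib import Path


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu name: core throttle count} for CPUs that expose the counter."""
    counts = {}
    pattern = "cpu[0-9]*/thermal_throttle/core_throttle_count"
    for counter in Path(cpu_root).glob(pattern):
        counts[counter.parent.parent.name] = int(counter.read_text())
    return counts


def new_throttle_events(before, after):
    """Return {cpu name: events} for CPUs whose counter rose between snapshots."""
    return {
        cpu: after[cpu] - before.get(cpu, 0)
        for cpu in after
        if after[cpu] > before.get(cpu, 0)
    }
```

Take one snapshot before the benchmark and one after. A non-empty result from new_throttle_events means at least one core throttled during the run.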

Python Monitoring

from pathlib import Path
import time

def monitor_thermal(duration=60, interval=1):
    """Monitor temperature (°C) and average CPU frequency (MHz)"""
    results = []

    for _ in range(int(duration / interval)):
        # Read temperature from every thermal zone (not all are CPU sensors)
        temps = []
        for zone in Path("/sys/class/thermal").glob("thermal_zone*"):
            try:
                temp = int((zone / "temp").read_text()) / 1000  # m°C -> °C
                temps.append(temp)
            except (OSError, ValueError):
                continue  # some zones cannot be read

        # Read frequency
        freqs = []
        for cpu in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
            freq_file = cpu / "cpufreq/scaling_cur_freq"
            if freq_file.exists():
                freq = int(freq_file.read_text()) / 1000  # kHz -> MHz
                freqs.append(freq)

        results.append({
            'time': time.time(),
            # None when no thermal zone could be read
            'max_temp': max(temps) if temps else None,
            'avg_freq': sum(freqs) / len(freqs) if freqs else 0
        })

        time.sleep(interval)

    return results
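
The samples returned by monitor_thermal can then be screened for intervals that were both hot and slow, which is the usual signature of throttling. This is a sketch: the two thresholds are assumptions to be chosen per machine (for example the "high" value reported by sensors and the CPU's rated frequency), and the function name is illustrative.

```python
def find_throttle_samples(results, temp_limit_c, min_freq_mhz):
    """Return samples whose temperature is at or above temp_limit_c
    while the average frequency is below min_freq_mhz."""
    suspects = []
    for sample in results:
        temp = sample['max_temp']
        if temp is None:  # no temperature reading for this sample
            continue
        if temp >= temp_limit_c and sample['avg_freq'] < min_freq_mhz:
            suspects.append(sample)
    return suspects
```

A frequency drop without high temperature usually points to something else, such as a power limit or an idle governor, so both conditions are required.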

DVFS (Dynamic Voltage and Frequency Scaling)

CPU Frequency Control

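# View available frequencies (exposed by drivers such as acpi-cpufreq; not by intel_pstate)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies

# View current frequency (kHz)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

# Set frequency (requires root and the userspace governor)
echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 2400000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed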

Governor Settings

# View available governors
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
# e.g. performance powersave ondemand conservative schedutil
# (intel_pstate in active mode offers only performance and powersave)

# Set governor
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Recommended to use 'performance' for benchmarks
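
Before a benchmark run it is worth confirming that the governor change took effect on every CPU. This is a sketch that reads the same scaling_governor files as the commands above; the function names are illustrative.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu name: active governor} for CPUs that expose cpufreq."""
    governors = {}
    for gov_file in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        governors[gov_file.parent.parent.name] = gov_file.read_text().strip()
    return governors


def cpus_not_using(governors, expected="performance"):
    """Return the sorted names of CPUs whose governor differs from `expected`."""
    return sorted(cpu for cpu, gov in governors.items() if gov != expected)
```

An empty list from cpus_not_using(read_governors()) means every CPU reports the expected governor.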

Frequency Impact on Performance

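Frequency vs Performance vs Power (idealized: dynamic power only, performance scales linearly with frequency):

Frequency   Rel. Perf   Rel. Power  Efficiency
─────────────────────────────────────────────
2.0 GHz     1.00x       1.00x       1.00x
2.5 GHz     1.25x       1.95x       0.64x
3.0 GHz     1.50x       3.38x       0.44x
3.5 GHz     1.75x       5.36x       0.33x

Power ∝ f³ (P_dynamic ∝ V² × f, and V must rise roughly in proportion to f)
Efficiency (Rel. Perf / Rel. Power) therefore decreases with frequency, roughly as 1/f²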
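
The relationship Power ∝ f³ can be turned into a small model for other frequency points. This is a sketch under two stated assumptions: performance scales linearly with frequency (a compute-bound workload) and static power is ignored. The exponent is a parameter because real parts deviate from the ideal cubic law; the function name is illustrative.

```python
def frequency_scaling_table(base_ghz, freqs_ghz, power_exponent=3.0):
    """Return (frequency, rel_perf, rel_power, rel_efficiency) tuples,
    all relative to base_ghz."""
    rows = []
    for freq in freqs_ghz:
        ratio = freq / base_ghz
        rel_perf = ratio                       # assumes linear scaling
        rel_power = ratio ** power_exponent    # power-law model
        rows.append((freq, rel_perf, rel_power, rel_perf / rel_power))
    return rows
```

Calling frequency_scaling_table(2.0, [2.0, 2.5, 3.0, 3.5]) produces one row per frequency, which can be printed or compared against measured power at each setting.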

GPU Power

NVIDIA GPU

# Using nvidia-smi
nvidia-smi --query-gpu=power.draw --format=csv -l 1

# Detailed info
nvidia-smi dmon -s p

# Example output
# gpu    pwr  gtemp  mtemp
# Idx      W      C      C
    0    125     65     55

Using NVML

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Read power
power = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # mW -> W
print(f"GPU Power: {power:.1f} W")

# Read temperature
temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
print(f"GPU Temp: {temp}°C")

pynvml.nvmlShutdown()
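
nvmlDeviceGetPowerUsage returns an instantaneous reading, so energy over a run has to be accumulated from periodic samples. The sketch below integrates (timestamp, watts) pairs with the trapezoidal rule. Collecting the samples is left to the caller, and the function name is illustrative.

```python
def integrate_energy_j(samples):
    """Integrate power samples into energy (joules) using the trapezoidal rule.

    samples: list of (timestamp_s, power_w) tuples in increasing time order.
    """
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += (p0 + p1) / 2 * (t1 - t0)
    return energy
```

For example, a constant 100 W sampled at 0, 1 and 2 seconds integrates to 200 J. Shorter sampling intervals reduce the error for bursty workloads.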

Power Benchmark Best Practices

Environment Preparation

#!/bin/bash
# prepare_power_benchmark.sh

# 1. Set performance governor
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

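# 2. Disable turbo boost (optional, for stability)
#    With the intel_pstate driver:
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
#    With acpi-cpufreq, use instead:
#    echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost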

# 3. Ensure adequate cooling
# (wait for temperature to stabilize)

# 4. Stop unnecessary services
sudo systemctl stop cron
sudo systemctl stop unattended-upgrades

Measurement Flow

Power measurement flow:

1. Warm-up
   - Run benchmark until temperature stabilizes
   - Usually takes 1-5 minutes

2. Baseline measurement
   - Measure idle power before the run
   - Subtract it from load power to isolate the workload's contribution

3. Load measurement
   - Run benchmark
   - Record power simultaneously

4. Multiple repeats
   - At least 3-5 times
   - Calculate mean and standard deviation
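
The last two steps of the flow above can be sketched as follows: subtract the idle baseline from each load measurement, then report the mean and sample standard deviation across repeats. The function name and the dictionary keys are illustrative.

```python
import statistics


def summarize_runs(load_power_w, idle_power_w=0.0):
    """Summarize repeated power measurements after baseline subtraction.

    load_power_w: average power of each repeated run, in watts
    idle_power_w: idle baseline to subtract from every run
    """
    net = [p - idle_power_w for p in load_power_w]
    return {
        'runs': len(net),
        'mean_w': statistics.mean(net),
        # Sample standard deviation needs at least two runs
        'stdev_w': statistics.stdev(net) if len(net) > 1 else 0.0,
    }
```

A standard deviation that is large relative to the mean suggests the runs were disturbed, for example by throttling or background activity, and should be repeated.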

Summary

Key points for power performance analysis:

Measurement Tools

  • RAPL: Intel CPU power
  • nvidia-smi: NVIDIA GPU power
  • sensors: Temperature monitoring

Key Metrics

  • GFLOPS/W: Performance per watt
  • Energy: Total energy consumption
  • Thermal headroom: Temperature margin

Influencing Factors

  • Frequency and voltage
  • Thermal throttling
  • Workload characteristics

Best Practices

  • Fixed frequency testing
  • Wait for temperature to stabilize
  • Multiple measurements for statistics
  • Record environmental conditions