Appendix G: Further Reading

"Standing on the shoulders of giants."

This appendix collects books, papers, and online resources that shaped how this book thinks about performance engineering and benchmarking. Treat it as a map: dip in when a topic from the main chapters sparks your curiosity.

Editor's note: If you're in the middle of a real performance incident, start with Systems Performance, the Roofline paper, and Drepper's memory article. Come back to the rest when things are calm.

Reading Guide (Inside This Book)

Reading Paths by Role

Different readers can take different paths through the main chapters. These are suggested starting points rather than strict sequences.

Reader type	Goal	Suggested chapters (main text)
System / embedded engineer	Understand system bottlenecks	Ch 1–4, 5–8, 9, 16–18, 19–22, 30, 33–35
ML / AI engineer	Focus on AI/ML and LLM performance	Ch 1–4, 5, 8, 19, 20, 23–27, 30, 32–35
HPC / perf researcher	Connect theory, hardware, and models	Ch 1–4, 5–7, 10–12, 16–18, 23–27, 30–32, 33–35

Within each path, you can always jump to appendices for hands-on exercises and environment setup when you are ready to run real benchmarks.

Topic Map (Concept → Chapters)

Use this as a quick index when you want to revisit a concept from the main text.

Benchmarking methodology and statistics: Ch 1–4, 10
Profiling tools and observability: Ch 5–8, 30–32
Cache, memory, and locality: Ch 2, 6, 12–15, 18, Appendix C, Appendix E
Data structures and algorithms in practice: Ch 13–15, 30, 31
Parallelism and multi-core scaling: Ch 16–18, 23, 30–32
Embedded and footprint constraints: Ch 9, 19–22, Appendix B, Appendix E
AI/ML and LLM performance: Ch 20, 23–27, 29, 32
End-to-end practice (how to benchmark / optimize / ship): Ch 33–35, Appendix A

If the chapter structure changes in a future edition, this topic map is the one place to update.

Books

Systems Background

Computer Systems: A Programmer's Perspective (3rd Edition) - Randal E. Bryant and David R. O'Hallaron, Pearson, 2015. A comprehensive introduction to how modern computer systems work, useful background for understanding performance bottlenecks across hardware and software.

Performance Engineering

Systems Performance: Enterprise and the Cloud (2nd Edition) - Brendan Gregg, Addison-Wesley, 2020. A broad, practical reference for performance methodology, Linux observability tools, and real production case studies.

Key chapters:

Chapter 2: Methodologies
Chapter 6: CPUs
Chapter 7: Memory
Chapter 13: perf

BPF Performance Tools - Brendan Gregg, Addison-Wesley, 2019. A modern guide to Linux observability with eBPF, useful once basic tools like perf feel natural.

Key chapters:

Chapter 4: BCC
Chapter 5: bpftrace
Chapters 6–15: Subsystem analysis

The Art of Writing Efficient Programs - Fedor G. Pikus, Packt, 2021. Focuses on high-performance C++ and shows how algorithms interact with modern CPUs and memory systems.

Key chapters:

Chapter 2: Performance Measurements
Chapter 3: CPU Architecture
Chapter 4: Memory Architecture
Chapter 9: High-Performance C++

Computer Architecture

Computer Architecture: A Quantitative Approach (6th Edition) - John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2017. The classic reference for processors, memory hierarchies, and quantitative evaluation.

Key chapters:

Chapter 1: Fundamentals
Chapter 2: Memory Hierarchy
Appendix A: Instruction Set Principles

Modern Processor Design - John Paul Shen and Mikko H. Lipasti, Waveland Press, 2013. A deeper treatment of superscalar and out-of-order processors that explains many microarchitectural effects seen in benchmarks.

Benchmarking

Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software - Connie U. Smith and Lloyd G. Williams, Addison-Wesley, 2001. A foundational text on software performance engineering and workload design.

The Every Computer Performance Book - Bob Wescott, 2013. A short, very practical book full of rules of thumb for real-world performance work.

Papers

Benchmarking Methodology

How Not to Measure Computer System Performance - David J. Lilja, IEEE Computer, 2005. A concise overview of common benchmarking mistakes.

Producing Wrong Data Without Doing Anything Obviously Wrong! - Todd Mytkowicz et al., ASPLOS 2009. Shows how environment size, link order, and other details can silently corrupt results.

Key findings:

UNIX environment size affects performance
Link order matters
Measurement bias is pervasive
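
The environment-size effect is cheap to check for in your own setup. The minimal Python sketch below is not taken from the paper: it reruns the same command while padding the environment with a dummy variable (the name `BENCH_PAD` and the function names are hypothetical choices) and records the best wall-clock time for each padding size. A large spread across paddings suggests layout-sensitive results; a flat profile does not prove their absence, and the timings here include process startup.

```python
import os
import subprocess
import sys
import time


def time_with_env_padding(cmd, pad_bytes, repeats=3):
    """Run `cmd` with an extra environment variable of `pad_bytes` bytes.

    Returns the best (minimum) wall-clock time over `repeats` runs, in seconds.
    BENCH_PAD is a hypothetical variable name; any unused name works.
    """
    env = dict(os.environ)
    env["BENCH_PAD"] = "x" * pad_bytes  # grows the child's environment block
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def env_size_sweep(cmd, paddings):
    """Map each padding size (bytes) to its best observed time (seconds)."""
    return {pad: time_with_env_padding(cmd, pad) for pad in paddings}


if __name__ == "__main__":
    # Placeholder workload: replace with your own benchmark binary.
    workload = [sys.executable, "-c", "sum(range(10**6))"]
    results = env_size_sweep(workload, paddings=range(0, 2048, 256))
    for pad, seconds in results.items():
        print(f"pad={pad:5d} bytes  time={seconds * 1000:.2f} ms")
    spread = max(results.values()) / min(results.values())
    print(f"max/min ratio: {spread:.3f}")
```

Taking the minimum of a few repeats per padding reduces ordinary scheduling noise, so any remaining trend across paddings is more likely to reflect the setup itself.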

Rigorous Benchmarking in Reasonable Time - Tomas Kalibera and Richard Jones, ISMM 2013. Explains how to design statistically sound experiments without burning weeks of CPU time.
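
In the spirit of that paper, a speedup is better reported with a confidence interval than as a single number. The sketch below uses a plain percentile bootstrap, a common general-purpose technique and not necessarily the exact procedure Kalibera and Jones describe; the function name and defaults are illustrative assumptions. Feed it one value per independent process execution (for example, the mean of the steady-state iterations in that run), since resampling iterations from a single run understates the variance.

```python
import random
import statistics


def bootstrap_speedup_ci(baseline, candidate, iters=10_000, alpha=0.05, seed=0):
    """Estimate speedup = mean(baseline) / mean(candidate) with a
    percentile-bootstrap confidence interval.

    `baseline` and `candidate` are run times, ideally one value per process
    execution. A speedup above 1 means the candidate is faster.
    """
    rng = random.Random(seed)  # fixed seed keeps the report reproducible
    ratios = []
    for _ in range(iters):
        # Resample each group with replacement, keeping the original sizes.
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(candidate, k=len(candidate))
        ratios.append(statistics.fmean(b) / statistics.fmean(c))
    ratios.sort()
    lo = ratios[int((alpha / 2) * iters)]
    hi = ratios[min(iters - 1, int((1 - alpha / 2) * iters))]
    point = statistics.fmean(baseline) / statistics.fmean(candidate)
    return point, (lo, hi)


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    old = [1.92, 1.88, 1.95, 1.90, 1.93, 1.89, 1.91, 1.94]
    new = [1.71, 1.75, 1.69, 1.73, 1.70, 1.74, 1.72, 1.76]
    speedup, (low, high) = bootstrap_speedup_ci(old, new)
    print(f"speedup {speedup:.3f}x, 95% CI [{low:.3f}, {high:.3f}]")
```

If the interval includes 1.0, the data do not support claiming a speedup at that confidence level.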

Stabilizer: Statistically Sound Performance Evaluation - Charlie Curtsinger and Emery D. Berger, ASPLOS 2013. Repeatedly re-randomizes code, stack, and heap layout while a program runs, turning layout effects into random noise so that standard statistical tests can be applied to the measurements.

Online Resources

Optimization Manuals

ARM Performance Analysis Guides - https://developer.arm.com/documentation/. Official documentation and tuning guides for ARM CPUs.

Memory & Cache

What Every Programmer Should Know About Memory - Ulrich Drepper, https://people.freebsd.org/~lstewart/articles/cpumemory.pdf. A long but rewarding deep dive into modern memory hierarchies.

Gallery of Processor Cache Effects - Igor Ostrovsky, http://igoro.com/archive/gallery-of-processor-cache-effects/. A short tour of cache behavior built from small code examples and their measured timings.

Benchmarking Tools

SPEC CPU 2017 - https://www.spec.org/cpu2017/. The industry-standard CPU benchmark suite, widely used in both academic research and vendor performance reporting.

Phoronix Test Suite - https://www.phoronix-test-suite.com/. A large collection of open-source benchmarks for Linux and other platforms.

Google Benchmark - https://github.com/google/benchmark. A C++ microbenchmarking framework that pairs well with the microbenchmark patterns in this book.

Courses

MIT 6.172: Performance Engineering of Software Systems https://ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/

Covers profiling, cache optimization, parallelism, and systematic performance methodology.

Berkeley CS267: Applications of Parallel Computers https://sites.google.com/lbl.gov/cs267-spr2024

An advanced course on parallel computing and high-performance computing (HPC).

CMU 15-418/618: Parallel Computer Architecture and Programming http://www.cs.cmu.edu/~418/

Another classic course on parallel programming and computer architecture.

Blogs

Brendan Gregg's Blog https://www.brendangregg.com/

Deep-dive articles on performance analysis and observability. Especially recommended:

"Linux Performance" (overview)
"Flame Graphs"
"CPU Flame Graphs"

Mechanical Sympathy https://mechanical-sympathy.blogspot.com/

Discussions of hardware-aware programming and the interaction between code and modern CPUs.

Daniel Lemire's Blog https://lemire.me/blog/

Regular posts on data-oriented design, SIMD optimization, and fast software techniques.

Travis Downs' Blog https://travisdowns.github.io/

Low-level CPU performance analysis, microbenchmarks, and deep dives into instruction behavior.

Tools

Profiling

Tool	Platform	Description
perf	Linux	Built-in Linux profiler
VTune	x86	Intel's advanced profiler
Instruments	macOS	Apple's profiling suite
Tracy	Cross-platform	Real-time profiler popular in game development

Benchmarking

Tool	Language	Description
Google Benchmark	C++	Microbenchmark library
Criterion.rs	Rust	Statistics-driven microbenchmark library
pytest-benchmark	Python	pytest plugin that adds a `benchmark` fixture
JMH	Java	Java Microbenchmark Harness from the OpenJDK project

Visualization

Tool	Description
FlameGraph	Stack trace and sample visualization
Perfetto	Web-based trace viewer and analysis tool; also opens Chrome JSON traces
Hotspot	GUI for visualizing `perf` data
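
FlameGraph's flamegraph.pl renders "folded" stacks: one line per unique call stack, frames joined by semicolons from root to leaf, followed by a space and the sample count. The minimal Python sketch below shows only that aggregation step on in-memory samples; in practice the stackcollapse scripts in the FlameGraph repository produce this format from profiler output, so the `fold_stacks` function here is a hypothetical illustration.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples into FlameGraph's folded format.

    Each sample is a list of frame names ordered root first, e.g.
    ["main", "parse", "lex"]. Returns sorted lines like "main;parse;lex 3".
    """
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


if __name__ == "__main__":
    samples = [
        ["main", "parse", "lex"],
        ["main", "parse", "lex"],
        ["main", "parse"],
        ["main", "eval"],
    ]
    for line in fold_stacks(samples):
        print(line)  # prints "main;eval 1", "main;parse 1", "main;parse;lex 2"
```

Saving these lines to a file and running `./flamegraph.pl out.folded > out.svg` produces the SVG; the width of each frame is proportional to its summed sample count.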

Suggested Reading Paths

Reader type	Core book	Key paper / resource	Course
System / embedded engineer	Systems Performance	Drepper, "What Every Programmer Should Know About Memory"	MIT 6.172
ML / AI engineer	Systems Performance	MLPerf papers; "Measuring the Algorithmic Efficiency of Neural Networks"	CS267 (selected lectures)
HPC / performance researcher	Computer Architecture: A Quantitative Approach	Roofline and Cache-Aware Roofline papers	CS267 or 15-418/618
