Preface

Who This Book Is For

This book is written for software engineers who need to measure and analyze performance. You might be:

  • A developer who needs to evaluate different algorithms or systems
  • A QA engineer responsible for performance testing
  • An embedded systems engineer evaluating hardware platforms
  • A technical lead who needs to present performance data to customers
  • A student interested in performance engineering

Regardless of your background, this book will help you avoid common benchmarking pitfalls and produce reliable, reproducible, and meaningful performance data.

Book Structure

This book is organized into nine parts:

Part I: Foundations - Benchmarking Methodology (Chapters 1-4) Establishes the correct benchmarking mindset and methodology. Covers the measurement environment, statistical methods, and result presentation.

Part II: Tools - Classic Benchmarks & Profiling (Chapters 5-9) Introduces benchmarking and profiling tools: CPU, memory, and system-level benchmarks, profilers, and embedded benchmarks.

Part III: Theory - Performance Modeling (Chapters 10-12) A deep dive into performance modeling, including the Roofline model, Amdahl's Law, caches, and branch prediction.

Part IV: Practice - Data Structures & Algorithms (Chapters 13-15) Practical data structure performance analysis including array vs linked list, hash table vs tree, and sorting algorithms.

Part V: Advanced - Parallelism & Vectorization (Chapters 16-18) Advanced topics: SIMD, multi-core performance, and memory allocators.

Part VI: Embedded Constraints (Chapters 19-22) Embedded system footprint analysis: static/dynamic analysis, compiler optimization, stack analysis, and an RTOS case study.

Part VII: AI/HPC Performance (Chapters 23-29) Modern performance domains: AI/ML benchmarks, HPC, GPU, LLM, ML compiler, and Edge AI.

Part VIII: Case Studies (Chapters 30-32) Real-world optimization case studies: web server, database query, and ML inference.

Part IX: Synthesis (Chapters 33-35) Bringing it all together: how to benchmark, how to optimize, and CI/CD for performance.

About Code and Commands

Code examples in this book are primarily in C, a language widely used for performance measurement. The concepts apply to any language.

Commands and tools in this book focus on Linux. Linux is chosen because:

  1. Linux is the mainstream platform for servers and embedded systems
  2. Linux provides the most complete performance measurement tools
  3. Linux behavior is most predictable and controllable

Most concepts and methodologies apply to all operating systems. Appendices provide corresponding tools and commands for Windows and macOS users.

How to Use This Book

You can read sequentially or jump to chapters that interest you. However, I recommend:

  1. Read Part I first: Even if you are experienced, the methodology here helps you avoid common mistakes
  2. Read Parts II-III selectively: Choose the chapters relevant to what you need to measure
  3. Part IV is for data structures: When you need to understand how data structures perform in practice
  4. Part V is for low-level optimization: SIMD, multi-core, and memory allocators
  5. Part VI is for embedded: When you need to analyze footprint in resource-constrained systems
  6. Part VII is for AI/HPC: When you need to handle modern AI and HPC workloads
  7. Part VIII contains case studies: Real-world optimization examples
  8. Part IX provides synthesis: How to benchmark, optimize, and integrate into CI/CD

Each chapter ends with a Summary, and appendices contain exercises for practice.

Acknowledgments

This book exists thanks to the inspiration and support of many people.

First, I want to thank Gavin Guo and Jim Huang (jserv), two former colleagues from whom I learned a great deal, both through direct collaboration and through their publications, talks, and open-source contributions to performance analysis tools and methodology. Their public work continues to benefit engineers everywhere.

I thank the open-source community for creating the tools that make this book possible—perf, Valgrind, GCC, LLVM, and countless others. The transparency of open-source software allows us to understand performance at the deepest levels.

Thanks to engineers who share knowledge through blogs, papers, and conference talks. The work of Brendan Gregg on performance analysis, Fedor Pikus on C++ optimization, Ulrich Drepper on memory systems, and Agner Fog on x86 optimization has shaped my understanding and influenced this book.

I thank colleagues at SiFive, MIPS, Andes Technology, Broadcom, Western Digital, and SiS. Performance analysis and benchmarking have been my primary focus at SiFive and a significant part of my responsibilities at other companies. The practical experience gained from these teams (debugging real performance issues, building measurement infrastructure, and optimizing production systems) forms the foundation of the examples and case studies throughout this book.

Thanks to early reviewers who provided feedback on draft chapters. Your suggestions improved the technical accuracy and clarity of the material.

Finally, thanks to my family for their patience and support during many evenings and weekends of writing.

Feedback

If you find errors or have suggestions, please contact: djiang.tw@gmail.com


Let's begin.