Computer, Enhance!

Table of Contents

Every entry in every series, listed for quick navigation.

Casey Muratori
Jan 27, 2023
Performance-Aware Programming Series

This series is designed for programmers who know how to write programs, but don’t know how hardware runs those programs. It’s designed to bring you up to speed on how modern CPUs work, how to estimate the expected speed of performance-critical code, and the basic optimization techniques every programmer should know.

The course is broken into parts, with the first part (the “prologue”) being strictly a demonstration with no associated homework. Later parts feature weekly homework.

Q&A session videos are posted every Monday. If you have a question you’d like answered, please put it in the comments of the most recent Q&A video. Homework listings are available on GitHub.

Prologue: The Five Multipliers (3 1/2 hours, no homework)

This part of the course gives simple demonstrations of how seemingly minor code changes can produce dramatically different software performance, even for very simple operations.

  1. Welcome to the Performance-Aware Programming Series! (22:05)

  2. Waste (32:56)

  3. Instructions Per Clock (25:05)

  4. Single Instruction, Multiple Data (35:31)

  5. Caching (22:55)

  6. Multithreading (32:11)

  7. Python Revisited (36:22)

Interlude (1 hour, no homework)

  1. The Haversine Distance Problem (30:28)

  2. “Clean” Code, Horrible Performance (22:40)

Part 1: Reading ASM (7 hours, plus homework)

This part of the course is designed to ensure that everyone taking the course has a solid understanding of how a CPU works at the assembly-language level.

  1. Instruction Decoding on the 8086 (28:28)

  2. Decoding Multiple Instructions and Suffixes (43:51)

  3. Opcode Patterns in 8086 Arithmetic (20:01)

  4. 8086 Decoder Code Review (1:17:49)

  5. Using the Reference Decoder as a Shared Library (8:48)

  6. Simulating Non-memory MOVs (18:00)

  7. Simulating ADD, SUB, and CMP (25:56)

  8. Simulating Conditional Jumps (19:41)

  9. Simulating Memory (26:32)

  10. Simulating Real Programs (16:02)

  11. Other Common Instructions (19:43)

  12. The Stack (26:58)

  13. Estimating Cycles (23:56)

  14. From 8086 to x64 (26:21)

  15. 8086 Simulation Code Review (33:05)

Part 2: Basic Profiling (4 hours, plus homework)

In this part of the course, we learn how to measure time and how to instrument programs to automatically determine where time is being spent.

  1. Generating Haversine Input JSON (15:40)

  2. Writing a Simple Haversine Distance Processor (12:09)

  3. Initial Haversine Processor Code Review (29:22)

  4. Introduction to RDTSC (48:05)

  5. How does QueryPerformanceCounter measure time? (31:43)

  6. Instrumentation-Based Profiling (18:01)

  7. Profiling Nested Blocks (26:12)

  8. Profiling Recursive Blocks (30:44)

  9. A First Look at Profiling Overhead (18:37)

  10. Comparing the Overhead of RDTSC and QueryPerformanceCounter (13:00)

Part 3: Moving Data (currently in progress)

Using our knowledge from Parts 1 and 2, in Part 3 we look at how data moves into the CPU, and how to estimate the upper limits that the need to move data imposes on our software's performance.

  1. Measuring Data Throughput (21:54)

  2. Repetition Testing (27:57)

  3. Monitoring OS Performance Counters (20:25)

  4. Page Faults (38:52)

  5. Probing OS Page Fault Behavior* (33:05)

  6. Four-Level Paging* (31:23)

  7. Analyzing Page Fault Anomalies* (31:44)

* Entries with an asterisk are “bonus” entries that can be skipped.

Part 3 is still in progress; more videos will be added here as they are scheduled. Additional parts will follow after Part 3 is complete.

1994 Internship Interview Series

  1. The Four Programming Questions from My 1994 Microsoft Internship Interview (19:02)

  2. Question #1: Rectangle Copy (24:50)

  3. Question #2: String Copy (14:50)

  4. Question #3: Flood Fill Detection (23:58)

  5. Question #4: Outline a Circle (1:09:01)

