With a little bit of work, we can turn our RDTSC-reading function into a simple instrumentation-based profiler.
This is the sixth video in Part 2 of the Performance-Aware Programming series. Please see the Table of Contents to quickly navigate through the rest of the course as it is updated weekly. Most of the listings referenced in the video (listings 75, 77, 78, 79, and 80) are available on GitHub. Listing 76 will be available on GitHub next Monday, once everyone has had a chance to do the homework without spoilers.
Over the previous two posts, we learned about the time-stamp counter. We used it to take measurements in our own program, and figure out how much time had elapsed in each part. This gave us our first picture of what was actually causing our program to run slowly.
This method — adding code to a program in order to gather performance metrics — is called instrumentation-based profiling. We call it this because we are instrumenting the actual binary that the CPU runs. It is no longer running the same program as before. It’s running that program plus additional code to gather metrics we care about.
We instrumented our code manually, but the instrumentation doesn’t have to be added by hand for it to be considered instrumentation-based profiling. You could alternatively use a tool, or an option in your compiler, to insert this kind of profile gathering into the program automatically. That would still be considered instrumentation-based profiling.
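To make the idea concrete, here is a minimal sketch of what manual instrumentation looks like. It is not one of the course listings: `ReadTimer`, `SumTo`, `TimedResult`, and `SumToInstrumented` are hypothetical names invented for this illustration. It assumes an x86 target where the `__rdtsc` intrinsic is available, and falls back to `std::chrono::steady_clock` elsewhere, in which case the units are clock ticks rather than TSC ticks.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   // __rdtsc on GCC/Clang
#define HAVE_RDTSC 1
#elif defined(_M_X64) || defined(_M_IX86)
#include <intrin.h>      // __rdtsc on MSVC
#define HAVE_RDTSC 1
#else
#include <chrono>        // assumption: portable fallback for non-x86 targets
#define HAVE_RDTSC 0
#endif

// Hypothetical helper: read an increasing tick counter.
static uint64_t ReadTimer() {
#if HAVE_RDTSC
    return __rdtsc();
#else
    return (uint64_t)std::chrono::steady_clock::now().time_since_epoch().count();
#endif
}

// The original, uninstrumented work: sum the integers in [0, count).
static uint64_t SumTo(uint64_t count) {
    uint64_t sum = 0;
    for (uint64_t i = 0; i < count; ++i) {
        sum += i;
    }
    return sum;
}

// Result of the work plus the metric we gathered about it.
struct TimedResult {
    uint64_t value;    // what SumTo returned
    uint64_t elapsed;  // timer ticks spent inside SumTo
};

// The instrumented version: same work, plus two extra timer reads.
// Those extra reads are the "instrumentation"; the CPU is now running
// SumTo plus this additional metric-gathering code.
static TimedResult SumToInstrumented(uint64_t count) {
    uint64_t start = ReadTimer();
    uint64_t value = SumTo(count);
    uint64_t end = ReadTimer();
    return {value, end - start};
}
```

Calling `SumToInstrumented(1000)` gives the same `value` that `SumTo(1000)` would, along with an `elapsed` tick count for that call. The key point is that the binary has changed: the timer reads are extra work that the uninstrumented program never did.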
There are other ways of profiling. Sampling-based profiling is a method of profiling that doesn't touch the original code. It relies on things like interrupts and hardware-assisted metric collection to analyze the performance of an unmodified program.
Both techniques have their place, and we’re going to look at both in this course. We’re going to look at instrumentation first, however, because it’s the easiest one to understand, and the easiest one to write yourself.
What I'd like to do now is take a look at how I instrumented my code in the Introduction to RDTSC homework, since we haven't actually gone over that yet. I’d like to show you some problems with it, propose some things that we could do better, and explain why we might care about doing them better.
Here's listing seventy-five, which has my answer to last week's homework. If we go down to the actual main function, you can see that I've created a bunch of profile counters: