Profiling Recursive Blocks
Our simple nested algorithm didn't handle recursion. Is there a modification that would fix it?
This is the eighth video in Part 2 of the Performance-Aware Programming series. Please see the Table of Contents to quickly navigate through the rest of the course as it is updated weekly. The listings referenced in the video (listings 85, 86, 87, and 88) are available on the github.
At the end of the previous post we saw that recursive functions break our nested profiler, but we didn't investigate why. We saw “with children” results that were well over 100%, so we know that something went wrong — we just don’t know what it was.
In the past, there have been profilers like this that didn't handle recursive zones. They are still quite useful for profiling, so we could just move forward with our current design. We would make sure never to put profile blocks into recursive call graphs, and that’d be that.
But ideally, we would like to fix this problem, because it creates a future hazard for ourselves and other programmers who use the code. If you make “don’t profile recursive functions” a policy, you are requiring that anyone using the profiler know definitively whether there is any recursion at all in any block they mark up. It doesn't have to be heavy recursion, like a recursive tree iterator or something like that. Even a long chain of function calls that just happens to include a reentrant call somewhere will cause our current profiler to misreport its results.
This could be a very costly mistake, because the results may be wrong in ways that are not immediately obvious. In the previous post, we just so happened to profile something that showed a large, obviously-wrong percentage. We would not necessarily be that lucky all the time. In a less recursive scenario, we might see erroneous results that look quite plausible. This could cause us to make bad decisions about what to optimize, and we could end up wasting a lot of time and effort acting on bad profiling data.
This sort of thing is exactly what we are trying to prevent with profiling utilities. The whole point of a utility like this is to take guesswork out of performance analysis. The last thing we want to do is add guesswork back in!
So we’d really like to improve our approach if we can. The first thing we should do is figure out what's actually going on. What is wrong with the current profiler, exactly?
If you remember, we printed out two times for each profile anchor: first there was the TSCElapsed time minus the TSCElapsedChildren time. This was supposed to tell us how much time was spent in a particular profile block without counting any time spent in other block it might call.
Second, there was TSCElapsed by itself. This was supposed to tell us how much time was taken in the profile block in total, counting any other blocks that may have occurred during it.
When we looked at these times, the first percentage (TSCElapsed - TSCElapsedChildren) actually appeared to be correct. Or at least, it didn’t look obviously wrong. It was only the TSCElapsed value by itself that seemed obviously wrong.
That prompts the obvious question: is the first percentage actually correct? Do we only have a problem with the TSCElapsed value, but somehow the (TSCElapsed - TSCElapsedChildren) value is actually correct?
Let's take a look. Imagine a simple scenario that involves recursive use of a profile block: