MSVC PDBs Are Filled With Stale Debug Info
Tracking down a debugger bug led me to the surprising discovery that recent Microsoft linkers amass giant piles of stale data in PDBs, even when all incremental build options are disabled.
This week I encountered something bizarre: while inspecting data in a debugger, I was presented with a layout for a structure that was at least two days old. I know that sounds strange, but that is what happened.
Earlier in the week, I had been using a struct with five separate members in it. I then changed the struct to a new version with four completely different members and an array. Finally, on Friday, I happened to inspect the struct in the watch window.
To my surprise, it showed the old layout.
Naturally, my first assumption was that I’d messed up the build, and hadn’t built with the new headers. This seemed impossible, because the program seemed to be working properly, but what other explanation could there be?
After checking everything, I determined that I was, in fact, building with the correct headers. The ASM even seemed to be using the correct offsets for accessing the struct members. It was only the debug display that was wrong.
I tried a different debugger, and that debugger worked properly. It showed the current layout, like it should!
This was truly puzzling. It shouldn’t have been possible! How could two debuggers, both using the same debug information, display two totally different layouts for a structure, both of which were real layouts of that structure at different points in time?
The most obvious answer would be an incremental build bug. If somehow your program (or debug info, in this case) seems to contain stale data, the logical first suspect is always incremental compilation or linking. Since incremental build stages by definition read old data and only partially update it, it makes sense that a bug in that process could cause old data to erroneously persist alongside more recent data.
However, as far as I knew, this couldn’t be the explanation, because I always disable incremental builds! In fact, I disable them specifically so I don’t encounter problems like this. My MSVC build parameters always include
for the compiler and
for the linker. That’s supposed to disable all incremental build steps, so everything gets written fresh. So it couldn’t be an incremental build problem, right?
Debugging the Debugger
Bugs in the debugger were to be expected: it was, in fact, an early alpha build of the RAD Debugger. I’d been testing this debugger for a little while, but I had not posted about it because the developers asked me not to publicize it. Understandably, they were trying to keep it quiet until they had a few weeks to work out the basic issues that inevitably crop up when external users start using a previously-internal program.
However, completely unprompted, another developer who happened to be browsing around the Epic github stumbled upon it. They figured out what it was, and posted about it. Because a significant portion of the developer world is desperate for better debuggers, news of the project spread quickly, and now the cat is unfortunately all the way out of the bag:
So much for that!
Anyway, that brief digression was my roundabout way of explaining that I can talk about the debugger publicly now without making life harder for the developers. So I am.
But, if you’re reading this, and this is the first time you’ve heard of this project, please try to be as helpful as you can to the devs. They did not want this much attention this soon, and they are going to be swamped with bug reports and feature requests. Treat it like what it is — a very early alpha — and expect lots of bugs and missing features until it gets to beta.
That said, on with our story.
I was not familiar with the RAD Debugger codebase yet. I had been using prebuilt binaries sent to me directly by the developers. But, since the entire source was on github, I thought I might try my hand at building and debugging the symbol processing code. I figured if I could find the bug — or at least narrow it down — it would avoid piling a potentially elusive repro problem on the developers during a time when I was sure they were already fielding lots of issues.
Normally, when I need to build an open source project, I take a few moments and prepare myself for the pain and horror that is about to be inflicted on me. I assume I’m going to have to install seven build utilities, tons of dependent libraries, the exact right compiler, etc., etc. I assume I will have to wait five minutes for the excruciatingly long build to fail, and then do a bunch of Stack Overflow dumpster dives to see if anyone has already reported the same error message, so I can “fix” it without spending hours picking through the code myself.
To my surprise and delight, building the RAD Debugger was the polar opposite of this. It was by far the most pleasant experience I have ever had building an open source project. I downloaded the code from github as a ZIP with one click, unzipped it on my machine, and ran the included build.bat. It built instantly, and correctly.
If every open source project built this easily, I’d contribute to a lot more open source projects. I was happily stepping through the codebase less than a minute after clicking the download button.
Happily for me, the source code was also fairly sane. There aren’t a ton of weird objects or piles of indirected layers. The code pretty much does what it does in linear order, and it’s not hard to walk through it.
So, major props to the team at RAD for making it so easy to get started with this code! I imagine I will track down bugs myself very often now, instead of just reporting vague descriptions of them on the issue page. Everyone wins.
How Many Symbols Are In A Symbol?
After getting acclimated, I was able to add some debug code that would trigger any time the PDB processor encountered the name of the erroneously-displayed structure. I figured I could see what it was getting wrong about the struct layout by stepping through the symbol construction.
However, when I started stepping through the processing code using this method, to my surprise, I saw it process a PDB entry for the offending struct name not once, not twice, but five separate times, each with a different layout!
Looking more closely, it became obvious what was happening. Each time I had changed the structure layout, rather than replace the definition of the structure in the debug info, MSVC had appended a new one instead — even though I had all incremental build options turned off!
To be honest, I had never considered that this might be happening. Naively, I had assumed that disabling the incremental build options in the compiler would mean that both the executable and the debug symbols for my program would be completely rewritten on each build. That was the behavior I wanted, and it’s what I was expecting.
In retrospect, there was no real reason for me to make this assumption. Incremental compilation, and incremental linking, are about producing the executable — not the debug symbols. There are no switches (that I know of) which control incremental debug information.
Furthermore, the files are called “PDBs”, which stands for “Program Database”. Though not a rigidly defined term, a “database” is often something that is written to incrementally, rather than all at once. So really, you could argue that it makes perfect sense that these files are reused by the compiler even when incremental builds are disabled.
Once I saw that PDBs — even in non-incremental builds — have tons of stale debug information in them, the debugger bug pathology was immediately obvious: it was assuming there would only be one type in the PDB that matched a given type name, and was selecting an old entry in the PDB instead of the most recent one. Mystery solved.
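To make that failure mode concrete, here is a minimal sketch in C. It is not the RAD Debugger's code: `type_record`, `find_first_record`, and `find_last_record` are hypothetical names, and the idea that newer definitions sit after older ones in the record list is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one named type record in a PDB. */
typedef struct type_record
{
    const char *name;   /* type name, e.g. "my_struct" */
    size_t byte_size;   /* size of this particular layout */
} type_record;

/* "One name, one type" lookup: returns the first record with a matching
   name. If stale records come first, this yields an old layout. */
const type_record *find_first_record(const type_record *records, size_t count,
                                     const char *name)
{
    assert(records != NULL && name != NULL);
    for (size_t i = 0; i < count; ++i)
    {
        if (strcmp(records[i].name, name) == 0)
        {
            return &records[i];
        }
    }
    return NULL;
}

/* One possible alternative: return the last matching record, on the
   ASSUMPTION that newer definitions are appended after older ones. */
const type_record *find_last_record(const type_record *records, size_t count,
                                    const char *name)
{
    assert(records != NULL && name != NULL);
    for (size_t i = count; i > 0; --i)
    {
        if (strcmp(records[i - 1].name, name) == 0)
        {
            return &records[i - 1];
        }
    }
    return NULL;
}
```

With several records sharing one name, the two functions can return different layouts, which is the kind of disagreement I was seeing between the two debuggers.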
But that (now trivial to diagnose) bug was no longer the interesting part of the investigation! Instead, I now wanted to know: if PDBs contain tons of old symbol data, just how much time and space are we wasting processing and distributing bloated PDBs[1]?
Delete PDBs Before Building?
After sending the bug explanation to the RAD Debugger devs, I set out to test just how bad this “many-for-one” symbol definition problem was. To start with, I deleted the PDBs for my project and rebuilt it.
To my horror, the PDB shrank to almost half its previous size. Apparently, close to 50% of the space in the PDB was not being used by any actual debug info. Instead (perhaps in keeping with a sparse database storage paradigm), the live debug info was liberally intermixed with stale data that didn’t actually need to be there.
So, new lesson learned: always add a PDB deletion step to the beginning of your build if you care about the size of the resulting PDBs!
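In a batch-driven build this step would normally be a one-line delete command. Purely as an illustration, the same idea can be sketched in C with the standard `remove` function; `delete_stale_pdb` is a hypothetical helper name, not part of any toolchain.

```c
#include <errno.h>
#include <stdio.h>

/* Deletes a PDB left over from an earlier build.
   Returns 0 if the file is gone afterwards (it was deleted, or it never
   existed), and -1 if deletion failed for any other reason. */
int delete_stale_pdb(const char *pdb_path)
{
    if (remove(pdb_path) == 0)
    {
        return 0;
    }

    /* A missing file is fine: there is nothing stale to carry forward. */
    return (errno == ENOENT) ? 0 : -1;
}
```

Treating "file not found" as success keeps the step safe to run on a clean checkout, where no PDB exists yet.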
But I also wanted to be able to reproduce this behavior more directly. To test the hypothesis that MSVC (randomly?) kept old versions of modified structures, I created a ridiculous struct I could use to rapidly do lots of builds with modified versions of an identically-named type:
// NOTE(casey): There is intentionally no MEMBER9, so that there is a "no additional member" iteration
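As a rough illustration of what a struct like this could look like, here is a sketch in which each member is switched on by a `MEMBERn` define from the command line. The member names, types, the always-present `Always` field, and the `incremental_test_size` helper are all assumptions for this sketch, not the original listing.

```c
#include <stddef.h>

/* Hypothetical sketch: each MEMBERn switch is expected to come from the
   compiler command line (e.g. -DMEMBER3=1). Undefined switches evaluate
   to 0 inside #if, so the file also compiles with none of them set. */
struct incremental_test
{
    int Always; /* keeps the struct non-empty when no switch is set */
#if MEMBER0
    char Member0;
#endif
#if MEMBER1
    short Member1;
#endif
#if MEMBER2
    int Member2;
#endif
#if MEMBER3
    long long Member3;
#endif
#if MEMBER4
    float Member4;
#endif
#if MEMBER5
    double Member5;
#endif
#if MEMBER6
    char Member6[3];
#endif
#if MEMBER7
    void *Member7;
#endif
#if MEMBER8
    int Member8[4];
#endif
    /* There is intentionally no Member9: -DMEMBER9=1 adds nothing. */
};

/* Reports the size of whichever layout this translation unit was built with. */
size_t incremental_test_size(void)
{
    return sizeof(struct incremental_test);
}
```

Because the type name never changes while the member set does, every build hands the toolchain a different layout under the same name.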
I put this struct in the simplest possible “main”, then wrote a horrible batch file to sequentially build many permutations:
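rem %flags% is assumed to hold the remaining compiler and linker options
for /l %%m in (0, 1, 9) do (
    for /l %%n in (0, 1, 9) do (
        for /l %%o in (0, 1, 9) do (
            call cl -DMEMBER%%m=1 -DMEMBER%%n=1 -DMEMBER%%o=1 %flags% >nul
        )
        rem snapshot the PDB after every 10th build
        copy incremental.pdb incremental_%%m%%n.pdb >nul
    )
)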
When run, the batch file sequentially builds the program 1000 times, snapshotting the PDB every 10th build for future analysis. Providing strong evidence for the hypothesis, the PDBs do erratically grow in size — and not by a little:
In fact, after 1000 builds, the PDB is mostly stale data — it’s around a 2:1 ratio of old vs. current debug info!
However, it’s also worth noting that it doesn’t appear to always increase. If you look at sequential PDBs, it’s easy to find runs of identical sizes. Here’s an example, with alternating identical-size runs bolded:
So the PDB doesn’t always get bigger. Although I haven’t investigated further, it seems as if there is some reclamation of space used by old data, but only sometimes.
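The two observations above (how much of the file is stale, and how often the size stays flat) are simple to compute from a list of snapshot sizes. The sketch below is one way to do it; the function names are hypothetical, and treating the size of a clean rebuild as the amount of "current" debug info is an assumption rather than something I have verified against the PDB format.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of stale bytes to current bytes, taking the size of a PDB from a
   clean rebuild as the "current" amount (an assumption of this sketch). */
double stale_to_current_ratio(unsigned long long grown_size,
                              unsigned long long clean_size)
{
    assert(clean_size > 0 && grown_size >= clean_size);
    return (double)(grown_size - clean_size) / (double)clean_size;
}

/* Counts consecutive snapshot pairs whose sizes are identical, i.e. the
   builds after which the PDB did not grow. */
size_t count_flat_steps(const unsigned long long *sizes, size_t count)
{
    size_t flat = 0;
    for (size_t i = 1; i < count; ++i)
    {
        if (sizes[i] == sizes[i - 1])
        {
            ++flat;
        }
    }
    return flat;
}
```

Under that assumption, a PDB three times the size of a clean rebuild gives a ratio of 2.0, which matches the roughly 2:1 figure above.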
I have never really looked at the PDB format, or how Microsoft uses it, so I have no idea what they may be doing. It may be that this slow growth of unused data is because of some kind of freed-space storage fragmentation. Or, it may be because of some kind of deduplication scheme, or hash-based storage.
Or, perhaps this behavior wasn’t intentional at all. Perhaps it’s not supposed to be doing this, and nobody at Microsoft noticed? As it turns out, there’s reason to suspect that may be the case…
A Second Bug?
The developer at Epic who built much of the UI for the RAD Debugger was Ryan Fleury (who also happens to write the Hidden Grove Substack, BTW!). He was the one who originally sent me the RAD Debugger to try, and I had been corresponding with him about this bug as I was hunting it down because we both found it so unusual. When I conclusively determined that MSVC was storing old data in the PDBs, I uploaded the incremental build test to github and asked Ryan if he was able to reproduce my results.
He both was and wasn’t. This turned out to be very fortunate! On Ryan’s machine with Visual Studio 2022 installed, he was able to reproduce it. On his machine with Visual Studio 2019 installed, he wasn’t.
Had I actually been hunting down two bugs, not one — the RAD Debugger bug and a bug in Visual Studio’s PDB update code? Was this a new regression that somehow nobody noticed? Have all our reused PDBs grown by 2-3x over the past few years, and we just thought the actual debug info was getting bigger?
That may very well be the case. Checking some of the specific versions of MSVC Ryan tested, he reports that 19.23.28105.4 did not exhibit the infinitely-growing PDB problem, but 19.28.29337 did (as did 19.38.33134, and the version on my machine, 19.29.30138). So it may be the case that sometime between 19.23.28105.4 and 19.28.29337, Microsoft introduced a bug in the PDB code that causes it to grow ad infinitum. Without access to their code, it’s hard to say for sure, but it’s at least a plausible explanation for why nobody noticed this before: it may only have started happening relatively recently!
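For anyone who wants to check their own toolchain against these data points, here is a small sketch that records only the four versions mentioned above. MSVC exposes its version through the predefined `_MSC_FULL_VER` macro, which encodes 19.28.29337 as 192829337; the enum, table, and `lookup_growth_report` names are hypothetical, and any version not in the table is reported as untested rather than guessed at.

```c
#include <stddef.h>

typedef enum growth_report
{
    GROWTH_NOT_TESTED = -1, /* no data point for this compiler version */
    GROWTH_NOT_SEEN   = 0,  /* reported not to show the growing PDB */
    GROWTH_SEEN       = 1   /* reported to show the growing PDB */
} growth_report;

typedef struct tested_compiler
{
    unsigned long full_version; /* same encoding as _MSC_FULL_VER */
    growth_report report;
} tested_compiler;

/* Only the versions named in the text; nothing is interpolated. */
static const tested_compiler tested_compilers[] =
{
    {192328105UL, GROWTH_NOT_SEEN}, /* 19.23.28105.4 */
    {192829337UL, GROWTH_SEEN},     /* 19.28.29337 */
    {192930138UL, GROWTH_SEEN},     /* 19.29.30138 */
    {193833134UL, GROWTH_SEEN},     /* 19.38.33134 */
};

growth_report lookup_growth_report(unsigned long full_version)
{
    size_t count = sizeof(tested_compilers) / sizeof(tested_compilers[0]);
    for (size_t i = 0; i < count; ++i)
    {
        if (tested_compilers[i].full_version == full_version)
        {
            return tested_compilers[i].report;
        }
    }
    return GROWTH_NOT_TESTED;
}
```

When compiling with MSVC, calling `lookup_growth_report(_MSC_FULL_VER)` would say whether the compiler in use is one of the tested versions.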
Of course, to know for sure what’s going on here, we’ll have to wait until someone with access to the MSVC source code looks into this, or someone with more time to spare constructs a more precise set of tests.
Until then, however, I’ll be modifying my builds to delete all PDBs as the very first step!
[1] Many developers keep regular archives of their PDBs in source control, and also distribute them to other developers and end users to aid in crash analysis. The size of the PDB therefore affects both the space these operations take (due to bloat) and their speed (larger files mean more compression work and disk traffic).