Monday Q&A #16 (2023-06-19)
Answers to questions from last week's videos
Each Monday I answer questions from the comments on the prior week’s videos. Transcripts are not available for Q&A videos due to length. I do produce closed captions for them, but Substack still has not enabled closed captions on videos :(
Questions addressed in this video:
[00:09] “Is the frequency of 'clocks' measured by rdtsc constant, or can it change over the time a program is running?”
[06:35] “If a CPU is in a lower frequency mode, the bus and the memory controller still run at their own fixed frequencies, distinct from the CPU's, don't they?
I can imagine a situation where the *CPU* is the bottleneck in low frequency mode, and *memory access* is the bottleneck in boosted mode. Is this something relevant that we should be aware of?
What are the ways we can adjust our benchmarks for changing frequencies?”
[09:35] “How does the x86 memory consistency model handle RDTSC?
If I have RDTSC, ADD, ADD, RDTSC, can an x86 CPU execute it as RDTSC, RDTSC, ADD, ADD?
How do you usually set up your microbenchmarks? Do you issue any fences after/before the RDTSC to "disable" OOO (Out Of Order execution) and make sure your microbenchmarks are accurate?
I can imagine that such a combination of RDTSC with fences could be too intrusive, and we don't want to disable OOO.”
“Since RDTSC doesn't flush the CPU pipeline, do you consider it best practice to pair it with a cpuid instruction or something like that, or do you just not worry about it being executed in parallel with some of the surrounding code?”
[17:25] “Since rdtsc is not a per-process clock, wouldn't OS scheduling/preemption impact the timing results when using it (especially when measuring longer periods of time)?”
“A multitasking operating system can interrupt your code and run any number of things before returning, can’t it? How do you account for this when converting from cycles to seconds? (What happens if the OS interrupts your program and runs something else in between the two calls to RDTSC?)”
[21:31] “Question about answer on alignment: You said that cores communicate via cache line. Which cache? Level I, Level II, Level III?”
[28:58] “If rdtsc is just a 64-bit register counter, how do we protect against overflow?”
[32:20] “I think I'm confused about how operating systems implement QueryPerformanceCounter and equivalents. How can the OS guarantee that a certain amount of time has passed if a particular core is running faster or slower? Would the OS not need to calibrate itself first using something in the CPU?”
[34:36] “Why not use the operating system's timers such as clock_gettime(), etc.?”
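For anyone who wants to play with the numbers behind the [28:58] overflow question before watching, here is a minimal sketch of the arithmetic. It is not taken from the video: the 4 GHz counter frequency is an assumed, illustrative value, and the function names are hypothetical. It estimates how long a 64-bit counter takes to wrap, and shows that subtracting two readings modulo 2^64 (which is what unsigned 64-bit subtraction does in C) still gives the right tick count across a single wrap.

```python
# Worked arithmetic for a 64-bit tick counter such as the one rdtsc reads.
# ASSUMPTION: 4 GHz is an illustrative frequency, not a value from the video.

TSC_BITS = 64
TSC_MODULUS = 1 << TSC_BITS          # number of distinct counter values
ASSUMED_HZ = 4_000_000_000           # hypothetical counter frequency
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_until_wrap(hz: int) -> float:
    """Years for a 64-bit counter ticking at `hz` to go from 0 back to 0."""
    return TSC_MODULUS / hz / SECONDS_PER_YEAR


def elapsed_ticks(start: int, end: int) -> int:
    """Ticks between two readings, using modular (unsigned-style) subtraction.

    This stays correct even if the counter wrapped once between readings.
    """
    return (end - start) % TSC_MODULUS


if __name__ == "__main__":
    print(f"Wraps after about {years_until_wrap(ASSUMED_HZ):.0f} years")
    # A reading taken just before the wrap and one taken just after it.
    print(elapsed_ticks(TSC_MODULUS - 10, 5))
```

At the assumed 4 GHz the wrap time comes out at well over a century, so for ordinary benchmark runs the practical concern is less about overflow and more about using unsigned subtraction for the difference.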