Each Monday I answer questions from the comments on the prior week’s videos. Transcripts are not available for Q&A videos due to length. I do produce closed captions for them, but Substack still has not enabled closed captions on videos :(
Questions addressed in this video:
[0:00:24] “When allocating memory, you can allocate it so that it’s aligned (e.g., by using _aligned_malloc or aligned_alloc). What is this alignment, how does it relate to the registers of the processor, and why would somebody use this? Is it required or desired for SIMD?”
[0:25:56] “Off topic question: is there a way with C to find the amount and levels of cache on your processor? Can we write code that dynamically adapts to these different amounts? I have the same questions for the SIMD instructions available. When you write code, do you assume you know these aspects of the processor, or do you write code that can adapt based on what processor is getting used?”
[0:37:23] “Maybe it's off-topic for the course, but I don't think I fully understand how the answer file works and is being parsed. It contains each 64-bit haversine_distance calculation, correct?
So this line is just printing the last distance that was written out to the file?”
[0:38:42] “Could you talk more about the MinimumJSONPairEncoding? Is it 24 bytes? How did you calculate that?”
[0:42:46] “Is there a reason to prefer `stat` vs `fseek` with `SEEK_END` followed by `ftell` (which returns a `long`)? Does `fseek` mean the OS has to actually read the file multiple times?”
[0:46:34] “I noticed that you always use a pointer to the start of some data and a count, vs. e.g. a pointer to the start and end of a token, or ever having a mutable `char*` stream (these are things that Per Vognsen uses in his Bitwise series, for example). What are the tradeoffs of these approaches?”
[0:49:30] “What is the benefit of explicitly tokenizing and then parsing vs trying to directly parse, either skipping tokenization entirely or expressing the tokenization as functions rather than data?”
[0:54:56] “Is the semicolon token a typo?”
[0:55:14] “JSON strings are specified to be a sequence of Unicode characters. How would that affect the implementation of a standards-compliant parser, if at all?”
[0:56:48] “A standards-compliant JSON parser would need to handle escape sequences in strings, including in key names. For example, `{"\u0041B": 5}`, `{"A\u0042": 5}`, and `{"AB": 5}` are equivalent. How would you handle that requirement when doing key lookups?”
[1:00:07] “In my parsers, I produced `f64` values and unescaped strings at parse time. Is there a reason to prefer doing so in a second pass with e.g. `ConvertElementToF64`? Naively, it seems like it would be preferable to only look at the source text once.”
[1:02:11] “It looks like the `ParseHaversinePairs` function will silently produce invalid numbers if the source file doesn't conform to the expected schema (for example, if one of the "x0" elements is not actually a number). To catch that kind of problem, would you suggest adding an element type to `struct json_element`, and setting that during the parse phase + checking during `ParseHaversinePairs`?”