This seems like it should be so obvious but for some reason has never really occurred to me: splitting tasks into smaller tasks that can be pipelined improves throughput.

I mean, obviously! That's why CPUs have a pipeline for decoding and executing instructions.

I'm pretty sure I studied this during University but somehow it didn't stick with me because I ended up doing web development in Python and Javascript where no one cares about how the system works under the hood.

So now, given this picture, perhaps the job of a system engineer (or at least one aspect of his job) is to come up with a way to split complex tasks in the system into a series of smaller tasks that can be processed in parallel, have a queue for each task, and assign one CPU Core to it.

So if you have a complex task T that needs to be applied to N items, we can split it into 4 steps, T0, T1, T2, T3 and have each core handle one subtask: take an item from the Ti queue, apply Ti, pass the object to T[i+1] queue.

Hmm. I wonder if memory caching issues introduce complications to the dependency chains. Assuming a shared L2 cache, the system should work fairly well assuming each subtask can start and finish in a timely manner with no delays in moving items from one step in the pipeline to the next.

(Sorry for the all over the place mumbling comment; just thinking out loud).

Expand full comment

here is a video that shows a real world example of this pipelining idea, and how to increase throughput by doing many things at the same time: https://www.youtube.com/watch?v=CPaNNiB2H-s

Expand full comment