50 Comments
Aug 11, 2023Liked by Casey Muratori

Difficult to express in words how blown away I am. It was magical to see all those insights and how everything ended up.

Expand full comment
Aug 17, 2023Liked by Casey Muratori

Just wanted to say that I thought this particular problem overview was incredibly insightful. The digital differential analyzer was new to me, and drastically simplified the algorithm. I loved your presentation and willingness to be clear and thorough. Thank you!

Expand full comment
Aug 13, 2023Liked by Casey Muratori

Here's a version that produces an image that you can image with most image viewer: https://paste.sr.ht/~tomleb/0db9f7bf1b650670299ac3de1997dc2b15eb7000.

Run something like this: make main && ./main > circle.ppm && xdg-open circle.ppm

The PPM format is incredibly simple to produce for this kind of thing. For example, the letter J looks like this (taken from wikipedia):

P1

6 10

0 0 0 0 1 0

0 0 0 0 1 0

0 0 0 0 1 0

0 0 0 0 1 0

0 0 0 0 1 0

0 0 0 0 1 0

1 0 0 0 1 0

0 1 1 1 0 0

0 0 0 0 0 0

0 0 0 0 0 0

Expand full comment
Aug 12, 2023Liked by Casey Muratori

Amazing video! I didn't even realize an hour had passed until I reached the end. I was glued to the screen the entire time, except for a few moments when I paused to think for a bit.

Expand full comment

Makes me wonder if the demoscene folks on the amiga did their circles like this! Always wondered how they could do that!

Amazing video as always. That hour went by fast!

Expand full comment
author

Since I have never seen this particular formulation before, I don't know if any demoscene people would have been using it exactly... but they would likely have been using some kind of related DDA-style technique, like one of the more common formulations (Bresenham circle, midpoint circle algorithm, etc.)

- Casey

Expand full comment

That was one of the effects i never figured out :)

Example here: https://youtu.be/DQJa2xsQSaw?t=133

It amazed me with that 68000 at 7,14 Mhz. Got to try it now that i've got the algorithm!

Expand full comment
author

Those are filled circles, though, so it's going to be a slightly different situation. Also it depends how you wanted to draw them - one way is to use the CPU, if that was your goal, and another way is to use the blitter. One could also imagine using both, filling different parts of the circle with each.

Actually, in a demo, I could even imagine using the copper list for such things as well (to fill solid interiors on 8-pixel boundaries), but I haven't programmed an Amiga in 30 years so I would have to go digging through the manuals to relearn the specifics of each of these things :)

- Casey

Expand full comment

Probably adjust the algorithm to only draw one start and one end point on each row and use the blitters line draw mode! (http://amiga.nvg.org/amiga/reference/Hardware_Manual_guide/node0122.html)

The copper had too low x resolution iirc? As you said, it was 30 years ago :)

Expand full comment
author

It has low X resolution, but the CPU and/or blitter are also in play. So my thinking was, perhaps you fill just the edge pixels to cover the 8-byte boundary (or whatever it is), and then you put a copper list entry in to fill the interior. I mean if you wanted a "maximum speed circle", I could image using all three chips... kinda makes me what to try this myself :)

- Casey

Expand full comment

Great video and great write up! Very much enjoying the course. Thanks!

Expand full comment

A bit of unrelated question to the topic. The videos on your yt channel have the comments disabled. Is there a reason for that? I'd assume that a large portion of the comments would be unhelpful if not actively counterproductive, but wouldn't they also invite at least some valuable and helpful discussion? I would be interested to hear your thoughts on that.

Expand full comment
Aug 12, 2023·edited Aug 12, 2023

I really love problems like this where some up-front math on the blackboard can simplify the algorithm a lot, I have never seen or thought about this way of drawing circles before, definitely interesting. Could you comment on how this algorithm would perform today on modern hardware, should I consider going for an approach like this if filling out a bitmap on the CPU-side (not talking about a shader actually doing this)? I guess the explicit computation using sines and cosines and floating point arithmetic is really fast today and would be easier to implement with SIMD and that the branching version of the algorithm you proposed would suffer due to the branch predictor guessing wrong a substantial amount of times?

Expand full comment
author

Scatter-write algorithms like this don't tend to be good choices on modern CPUs which do not have high scatter throughput, so it is unlikely that you would ever want to use DDA-style algorithms for drawing anything on a CPU today.

- Casey

Expand full comment

I was a bit confused by your explanation for case 2 in your insight, but after thinking about it more I think its because you misspoke slightly. I think the point is not that we turn the right side slightly positive so that its even more larger than the left side, but rather that by negating the left side it goes from very negative to very positive and by not touching the right side it remains slightly negative. This means that the above <? comparison is false so we choose the right point.

Expand full comment
author

Yes - I will go back and listen to the video but I may well have said that totally wrong.

- Casey

Expand full comment
Aug 11, 2023·edited Aug 11, 2023

I agree, I think case 2 is wrong. But depending on what problem you're solving, it may not matter. In the end, we see that Casey uses integer coordinates for the grid of pixels, in this case, it looks like we're always in case 1 or 3. To have case 2, we would have to use floating-point coordinates, and we would have to start a step with a coordinate that is somewhere on the right side of a pixel.

Expand full comment
author

I have updated the video. For the hopefully-more-clear-and-convincing explanation, here is a link to the correct timestamp: https://youtu.be/JtgQJT08J1g?t=1858.

- Casey

Expand full comment
author

Actually, I am fairly confident that all the cases work. I have recorded a patch where I don't screw up the explanation, and will replace the video. Hopefully the new explanation will be convincing :)

- Casey

Expand full comment

It does sounds more convincing to me!

Expand full comment
Aug 12, 2023·edited Aug 12, 2023

Thanks a lot, I understand with the new explanation.

That's also what Ivy above explained incidentally, I just didn't get it the first time. The update helped!

Expand full comment

I loved it, your explanation was perfect.

Something pops in my mind though:

I presume a lot of old source code were doing this type of computation, whereby looking at the code, you wouldn't be able to grasp what was in the programmer's mind. My question is,

"If efficiency is mandatory, and you have to make the algorithm reduced to

the delta mindset, like here. How would you make it more readable?"

Is it by writing in a comment the mathematical formula? By telling each steps you've made in this particular problem?

---

Axel

Expand full comment
author

Yes, when I do stuff like this that is _very_ complicated (I don't consider DDAs to be complicated, because they're common in graphics), I typically have a giant multi-page comment at the top of the file that explains the entire algorithm and how it was derived. This is as much for myself as for other people, because subtle things are easy to forget, even if you were the person who figured them out in the first place :)

As an example, the bspline solver I wrote at RAD for the character animation system begins with around 200 lines of comments explaining how it works and how the notation in the file maps to the mathematical representation of the operation.

- Casey

Expand full comment

Great explanation, thank you!

A sidenote, if you don't do analysis of 3 cases and just squre both parts of equation you end up with the same result.

Expand full comment
author

Can you upload that derivation somewhere so I can look at it?

- Casey

Expand full comment
Sep 12, 2023Liked by Casey Muratori

That's how I did it

https://raw.githubusercontent.com/pankkor/pap/master/src/interview1994/draw_circle_derivation.jpeg

We still have to evaluate sign of (1-2x) when we divide by it. Since we are dealing with the first quadrant x > 0, hence 1-2x < 0.

Expand full comment
author

That is cool - I like the idea of using the knowledge that X is 1 or higher to know the effect of the division.

- Casey

Expand full comment

Just for clarification: Is this algorithm pixel accurate? Because I don't see any 0.5 bias / offset in the calculations. For example, a circle with radius 0 (point) would start at x=0.5 and with radius 5 would start at x=5.5. Then to continue with integer calculations, everything is multiplied by 2 or so.

Thanks again, Casey, for showing us this simple algorithm that looked difficult at first. As always, very entertaining and educational to watch your videos. We need more professional programmers like you to teach us *good* and *performance-aware* programming.

Expand full comment
author

Although I have not analyzed it extensively, I believe the algorithm is pixel-accurate. But it treats pixel "centers" as being the whole integer coordinates (as was traditional for fast integer rasterization, I believe). If you would like pixel centers to be at {+0.5, +0.5} instead, you would have to rework it.

- Casey

Expand full comment

Hi Casey, thank you for the really nice derivation. It was very good to follow.

I have played around a bit with your version an compared its performance to the one I use in a toy rasterizer. (Found here: http://members.chello.at/~easyfilter/bresenham.html)

Interestingly enough, your version starts to become slower around a 200 pixel radius. I don't really understand why, given the branching and more arithmetic operations in the other one.

Does it have to do with it plotting only 4 Quadrants while using more iterations (Less memory accesses)?

Keep up the amazing work!

Expand full comment

OK, should have tested better.

I timed the functions first while they were drawing to a buffer which got displayed via ResizeDIBSection in a loop. (With processing window messages, clearing the buffer)

Now I wrote a minimal version in which both versions just draw 10000 circles to a buffer.

In this version your variant is consistently 30 -50 % faster.

I guess I don't really understand the implications of the windows stuff regarding the execution of these functions.

Expand full comment
author

I am happy to look through any of this stuff if you're curious (we can control for caching and analyze the ASM and see what's going on in there, etc.), but, as I mentioned in the video, modern machines aren't designed to operate this way. They really really REALLY hate things like repeated scatter-writes, which is what DDA-style algorithms do. So in general, perf-profiling this sort of thing on a modern machine is not useful, because you will always totally destroy this kind of algorithm by using a tiled batched rasterizer instead (eg., one that works on, say, 64x64 squares and rasterizes all shapes into that square before moving on to the next square).

- Casey

Expand full comment
Aug 16, 2023·edited Aug 16, 2023

Thanks for your comment.

In my programming journey I tried to work through things

in a way tied to the evolution of the technology. For me this makes understanding the inner workings of the system much easier, since the older ones are a lot less complex. (With the 8086 part of the series I felt right at home :) ). So after toying around a lot with asm on dos, I kind of need a "smooth" transition to the complexity of modern, highly parallelized systems. In my understanding that is what you will cover mostly in the future of the series, so no worries there.

I guess it's time to go back to the future...

Expand full comment
author

That's what we're getting into in Part 3, in fact, so the first part of that is what we're doing for the next few weeks.

- Casey

Expand full comment

Wow, this is crazy cool. You did accidentally write a "-4" instead of "4" for F's delta at the very end, haha. How often do you find problems can be simplified by this "DDA" approach?

Expand full comment
author

Not sure what you mean... -4 is correct? It is not 4.

- Casey

Expand full comment

Argh, right you are. I got the subtraction order confused.

Expand full comment

Great video, thank you. I came here because I wanted to ask this one question:

Before you cleared the board after you moved out the constants into C, why didn't you divide by 2?

That would make it -x^2 +x -y^2 + C < 0 with C now r^2+ 1/2

with all the further simplifications -2y-1 and so on?

Expand full comment
author

Since we have to use integer registers, we can never divide our equations unless everything is still an integer afterward. So the 1/2 that you've got there is the thing that stops us.

- Casey

Expand full comment

Wow, this is just the right course, thank you for doing this. I hope you do not think I am playing the wise guy here.

My first thought was does it really matter, couldn't we just drop the one-half? It is really small against the r^2. Taking a small nap and thinking about what values we can have for x and y and that they are actually r*cos(x) and r*sin(y) and I thought since they are squared they are always 1, leading to just replacing the implicit formula:

-(x^2 + y^2) +x + C < 0

-r^2 +x +C < 0

now the r^2 cancel out.

x + 1/2 < 0

or if you don't like the 1/2 2x + 1 < 0.

We do not even need r and y. All depends on x which makes sense since we only calculate the first slice.

What do you think? Did I make a mistake here?

Expand full comment
author

I don't think I'm quite following what you are trying to do here... what is the goal of this simplification?

- Casey

Expand full comment

I am not sure, this seems extremely simple. So I am trying to code this and try if it actually works. Unfortunately I am really bad at C++. I will try in Rust which I am more comfortable with, though this might also be a good opportunity to get familiar with C++ a little. I try to summarise again, provided I understood this right -2x^2 +2x -2y^2 + C < 0 with C = 2r^2 + 1 is what we try to evaluate to determine if we take the right or the left pixel.

Now we know that x^2 + y^2 = r^2 so we can replace this which leads to

2x + 1 < 0

So it should be sufficient to evaluate 2x + 1 < 0 to make a decision.

But, having said the above, I am new here and don’t want to irritate anyone. I will try to code this and see what happens.

One thing that I cannot grasp right now, is, if the simplification is too much because of the assumptions taken on the absolute value. But coding this should settle it, so give me some time to see if it actually works.

Expand full comment
author

But that is only true if you have never moved from the initial point. X only equals R until you change X. Once you change X, that identity won't hold anymore. So I don't think you can do what you're doing here?

Another way to say this is, obviously the decision depends on both X and Y. If it didn't, it couldn't be a circle. So removing Y from the equation entirely means you've definitely done a substitution which only works for one particular Y value.

- Casey

Expand full comment

This is math, so I am rather week here and I might be totally wrong. But my thought was the circle is x^2 + y^2 = r^2. r is a constant and if I fix x y is determined. Well not fully, since y = sqrt(r^2 - x^2) and there a two solutions, but we got this covered via the genius symmetry you apply. I am currently on my mobile, so I cannot code this right now, and it is rather late here in the UK, but once at home I will try to code it, and see if it works.

Expand full comment

> thinking about what values we can have for x and y and that they are actually r*cos(x) and r*sin(y) and I thought since they are squared they are always 1, leading to just replacing the implicit formula

The point of the implicit formula is to *test* if the (x, y) coordinate is on the circle (or is outside or inside of it). Your substitution *assumes* (x, y) is on the circle, so isn't valid here.

Expand full comment

Indeed, the simplification leads to a square, even dividing by two already deviates a lot from the actual circle.

Here is a working code example in Rust. Both ideas were total failures, so at least the code should be useful:

```

fn main() {

let mut bitmap = [[0u8; 64]; 64];

// NOTE(casey): Center and radius of the circle

let cx: i32 = 32;

let cy: i32 = 32;

let r = 20;

// NOTE(casey): Loop that draws the circle

{

let r2 = r * 2;

let mut x: i32 = r;

let mut y: i32 = 0;

let mut dy = -2;

let mut dx = r2 * 2 - 4;

let mut d = r2 - 1;

while y <= x {

bitmap[(cy - y) as usize][(cx - x) as usize] = 1;

bitmap[(cy - y) as usize][(cx + x) as usize] = 1;

bitmap[(cy + y) as usize][(cx - x) as usize] = 1;

bitmap[(cy + y) as usize][(cx + x) as usize] = 1;

bitmap[(cy - x) as usize][(cx - y) as usize] = 1;

bitmap[(cy - x) as usize][(cx + y) as usize] = 1;

bitmap[(cy + x) as usize][(cx - y) as usize] = 1;

bitmap[(cy + x) as usize][(cx + y) as usize] = 1;

d += dy;

dy -= 4;

y += 1;

// NOTE(casey): Branchless version

let mask = d >> 31;

d += dx & mask;

dx -= 4 & mask;

x += mask;

}

}

// NOTE(casey): Output the bitmap using roughly square elements

for row in &bitmap {

for &cell in row {

let (l, r) = if cell == 1 { ('[', ']') } else { (' ', ' ') };

print!("{}{}", l, r);

}

println!();

}

}

```

I kept Casey's comments so comparison is easier.

Expand full comment
Aug 11, 2023·edited Aug 11, 2023

For the case where we go up and left, the calculation of the digital differential analyzer should be D(x - 1, y + 1), no? I wrote it out and it gets the same result as D(x - 1, y) which is 4x - 4 so technically the resulting circle is the same.

Expand full comment
author

Not in this case, because we have already advanced D by the Y step, so we are only accounting for the X step.

- Casey

Expand full comment

Oh, right. Should had read the code before asking, thank you for the answer. Also the result is not the same which makes more sense, it was a mistake on my part.

Expand full comment
author

Right, it will be different because it is trying to do *both* DDA steps in a single step, which of course we could do if we wanted to make the algorithm have an "else" clause, but I don't know that there is an efficiency win there. I haven't played with other formulations yet, since I was even trying to find one in the first place, I just accidentally did :)

- Casey

Expand full comment