Efficient DDA Circle Outlines

When I went back to solve interview question #4, I found that I ended up with a more efficient version of the algorithm than the ones I've seen published.

Aug 11, 2023

Modern PC hardware is very different than the desktop PC hardware that was available when I did my Microsoft internship interview in 1994. In fact, only a few years after that interview, GPUs became increasingly common, and few people still focused on “Bresenham”, “DDA”, or “Midpoint” circle-drawing algorithms. I certainly hadn’t thought about this kind of circle algorithm for at least two decades!

But coming back to it as part of this “intern interview questions revisited” series, I was forced to work through the problem again. Given the benefit of many more years of experience, I found that when I went to solve it, I actually ended up with a more efficient algorithm than the standard “midpoint circle algorithm” that the interviewer had wanted in 1994.

In fact, I can’t seem to find my new version of the algorithm on-line anywhere. So, completely by accident, this post is no longer really about that old interview question. Instead, it’s about a more straightforward and efficient derivation of a DDA circle-outlining algorithm.

Now, why anyone would want such an algorithm these days, I have no idea. It doesn’t seem very useful on its own, since modern graphics hardware can’t really benefit much from DDA (at least not in most circumstances). But the concepts used to arrive at the efficient solution are quite useful to know —both for other graphics problems and for non-graphics problems. So I do think they’re well worth learning, and this circle outlining exercise is a great excuse to learn them if you don’t know them already:

Since the video is quite long, I’m not going to transcribe it here. If you’d like the full derivation, you’ll have to watch the video. However, if you’re just interested in the final algorithm, it’s incredibly simple.

The update for each step is first an ADD, a SUB, and an INC:

D += dY;
dY -= 4;
++Y;

Then optionally, if the D value is negative, another ADD, SUB, and INC:

// NOTE(casey): Branching version
if(D < 0)
{
    D += dX;
    dX -= 4;
    --X;
}

That’s it. The dY and Y updates are also decoupled from the conditional D, dX and X updates, so if necessary the conditional can be moved up to right after D += dY for efficient use of the flags register.

Also, the conditional can be removed if desired. Replicating the sign bit (which can be done with an arithmetic shift or the CDW instruction, etc.) produces the necessary mask to rewrite the X step as a branchless update:

// NOTE(casey): Branchless version
int Mask = (D >> 31);
D += dX & Mask;
dX -= 4 & Mask;
X += Mask;

To the best of my knowledge, this derivation is more efficient than the standard published Bresenham or midpoint circle formulations I found on-line. All of them seem to require more operations. In the branching version, mine requires only 3 to 6 adds — that’s it. The standard algorithms I’ve seen listed tend to require 9 or more.

Also, my version requires only one more add than “Jesko’s Method” — which I had never heard of until I looked at Wikipedia — but my version actually produces a real circle. Jesko’s Method, when I analyzed the actual pixels it plots, does not seem to plot a real circle. It plots something that looks like a circle, but the pixels chosen are not the least-error points on the pixel grid. As far as I can tell, it is more of an “approximate” circle-outlining algorithm. If someone out there believes it is actually plotting a valid circle, please let me know what I’m missing so I can reanalyze it!

Finally, my version does allow you to trade off performance and register pressure. The simplest 6-add version above uses perhaps too many register for certain implementations — but as you saw if you watched the video, you can instead choose to use less registers in exchange for a shift:

D -= (Y<<2) + 2;
++Y;

if(D < 0)
{
    D += (X<<2) - 4;
    --X;
}

This is certainly less efficient, so presumably you would never do it this way unless register pressure required it.

As a final caveat, I should of course note that my entire excursion into this algorithm began and ended yesterday. So I may well have missed something. I did basic testing on the generated points to verify that, for a few circle parameters, my new version fills precisely the same pixels as a reference Bresenham, and that the pixels it selects are the closest to the circle.

But that said, it could definitely stand more scrutiny. Of course, as I mentioned before, I’m not sure why anyone would bother to apply that scrutiny. Nobody needs to draw circles this way anymore anyway! But, if anyone should happen upon a mistake, or a missed opportunity for improving the algorithm, please do mention it in the comments so I can amend this post.

To that end, if you’d like to play around with the algorithm, a complete program to draw a circle to the console follows. It contains both the branching and branchless versions, which you can toggle with the #if:


#include <stdio.h>

int main(void)
{
    char Bitmap[64][64] = {};

    // NOTE(casey): Center and radius of the circle
    int Cx = 32;
    int Cy = 32;
    int R = 20;
    
    // NOTE(casey): Loop that draws the circle
    {
        int R2 = R+R;
        
        int X = R;
        int Y = 0;
        int dY = -2;
        int dX = R2+R2 - 4;
        int D = R2 - 1;
        
        while(Y <= X)
        {
            Bitmap[Cy - Y][Cx - X] = 1;
            Bitmap[Cy - Y][Cx + X] = 1;
            Bitmap[Cy + Y][Cx - X] = 1;
            Bitmap[Cy + Y][Cx + X] = 1;
            Bitmap[Cy - X][Cx - Y] = 1;
            Bitmap[Cy - X][Cx + Y] = 1;
            Bitmap[Cy + X][Cx - Y] = 1;
            Bitmap[Cy + X][Cx + Y] = 1;
            
            D += dY;
            dY -= 4;
            ++Y;
        
#if 0
            // NOTE(casey): Branching version
            if(D < 0)
            {
                D += dX;
                dX -= 4;
                --X;
            }
#else
            // NOTE(casey): Branchless version
            int Mask = (D >> 31);
            D += dX & Mask;
            dX -= 4 & Mask;
            X += Mask;
#endif
        }
    }
    
    // NOTE(casey): Output the bitmap using roughly square elements
    for(int Y = 0; Y < 64; ++Y)
    {
        for(int X = 0; X < 64; ++X)
        {
            char L = ' ';
            char R = ' ';
            if(Bitmap[Y][X] == 1)
            {
                L = '[';
                R = ']';
            }
            printf("%c%c", L, R);
        }
        printf("\n");
    }
}

If you like this sort of detail-oriented exploration, that’s exactly what we do here on Computer Enhance. Next week, we’ll be resuming out Performance-Aware Programming Series, which is designed to teach programmers how to think about, evaluate, and improve software performance. If that sounds interesting to you, you can check out our subscription options right here:

Santiago Cabrera

Difficult to express in words how blown away I am. It was magical to see all those insights and how everything ended up.

Expand full comment

Tim

Aug 17, 2023

Just wanted to say that I thought this particular problem overview was incredibly insightful. The digital differential analyzer was new to me, and drastically simplified the algorithm. I loved your presentation and willingness to be clear and thorough. Thank you!

48 more comments...

Computer, Enhance!

50 Comments