*This is the first video in Part 2 of the Performance-Aware Programming series. Please see the Table of Contents to quickly navigate through the rest of the course as it is updated weekly. The reference haversine function (listing 65) is available on the github.*

Welcome to Part Two of the Performance Aware programming series! Before we begin, a quick reminder: Part Two requires you to be able to read assembly language. If you don't already know how to read assembly language, please go back and do the homework from Part One of the course, which is designed to teach you how to read and understand assembly language.

Throughout Part Two, we’re going to be looking at the Haversine problem I introduced just prior to Part One. In this problem, you have a sphere, and on the sphere you have points given by latitude and longitude coordinates.

I hate “latitude” and “longitude” because I never remember which is which, so I prefer to think of them as X and Y: X goes around the equator, and Y goes from pole to pole. This lets me think of it like a grid wrapped onto the sphere, and I don’t make mistakes about which direction is which.

The definition of the problem as we saw it was that X and Y will be given in degrees. Because X goes around the equator, it can range between -180 and 180 degrees. Y can only go from pole to pole, so it ranges between -90 and 90:

Given that, the Haversine problem statement is pretty simple: take pairs of points on the sphere and compute the average arc length between them. To do that computation, we are given a formula which we are told compute the value correctly.

Because this is a course on hardware performance, not a math course, we are going to *assume* that the mathematicians have done their job and supplied us with an efficient formula. But it is worth noting that, if this were a *mathematical* optimization course, we would also want to consider using different formulas. That would be a form of *algorithmic* optimization, and it’s very useful and very helpful. But it’s beyond the scope of this course, since we would have to become an extensive math course in order to interrogate that side of the problem effectively! Instead, we will restrict ourselves to the implementation of *this* specific formula, and work on implementing it as efficiently as we can.

To begin that work, we need an implementation to analyze. And before we can write an implementation, we need some *data* to feed it. So our very first task here in Part Two is to write a program we can use to generate a variety of inputs to our eventual Haversine distance processor.

If you remember from the original Haversine post, we would like our problem to resemble a real-world scenario. So we’re going to have to read our data from disk or from a network, and we’d like it to be in JSON format. I chose JSON specifically because it is a significant impediment to performance, so it provides a good opportunity for us to look at all the issues you may face in the real world when trying to process data efficiently — you are often given constraints on the inputs or the outputs to your programs, and you have to work around them as best you can.

Ideally, we’d like something that we can use to generate arbitrarily large JSON files filled with these Haversine point pairs. We want to be able to generate small datasets, like 10 or 1000 points, that we can use for basic testing. But we also want to be able to generate huge datasets, like 10,000,000 points or 1,000,000,000 points, for testing how the performance of our code scales with the number of input points.

That’s what I’d like you to do for today’s homework. It’s a pretty straightforward program to write, and you can do it in any language, because it just needs to be able to output a JSON file. But there are some details you may want to keep in mind.

First, we would like the JSON file to comply with the JSON standard. This means that the array of point pairs should have something like this basic format: