My battery life skyrocketed. I went for 11 hours straight once, no outlets to be seen. This was vastly superior to even the advertised battery life! I soon started chanting “Safari on battery, Chrome at home” in my mind. Soon, I was spending so much time on battery that I set Safari as my default browser. Incidentally, it is fascinating how much sticking power the default browser setting has.

I began investigating how to switch to Safari for good. I’m not a huge extensions user, so that was an easy win; all the popular extensions are available for every major browser. Beyond that, I depended on the Chrome Password Manager to sync my logins across devices. 1Password takes care of that for me now. Flash is a mostly dead medium and if I do come across a Flash video, I can just fire up Chrome.

The last holdout might be my heavy use of the Chrome Developer Tools, but even then, I can just fire up Chrome when testing, like I have to do for Firefox anyway. For day-to-day usage, a browser is just a browser. For the most part, Safari gets the job done, and with my recent discovery of `⌘⌥R` for responsive development, mobile testing is much smoother than with the equivalent interfaces in Chrome or Firefox. Now if only I could make everything play nicely under Windows; that would just about put the final nail in the coffin …

Now that I have a (very junior) seat at the table, I look up the food chain and am entirely stumped by the question “so what do you want to do?” What do I want to become? I’m terrified because I have no idea. All I can think of is that the game industry needs more role models. Offhand, there are people like John Carmack, Ken Levine, Sid Meier: *legends* in the game industry. But where are all the people in between? Where are the people who are the lifeblood of a studio, but whom the games media neglects in favor of fawning over the next “indie game developer superstar”?

Hah! And I bet you thought this would be some mindless platitudes about reinforcing my credibility as a “professional” game developer! Guess what? *I’m just as clueless as you armchair game developers.*

To be fair, I have had some successes. Airlines have made flying a surprisingly paperless experience. In the space of four years, I’ve gone from running in sub-freezing temperatures across the quad to pick up my boarding pass, to checking my email for a convenient e-boarding pass that automagically pops up on my phone as I approach the security line.

More and more food trucks seem to be using Square card readers, and the software takes care of emailing me a receipt. I wish Amazon Register did the same. Anyway, as digital payment technology improves, my hope is that in the next few years, those spools of receipt paper will become a thing of the past.

As much as I hate to admit it, I am also starting to rely on iCloud Reminders as my primary to-do list, which sync across all of my devices. I still miss my vibrant shades of Post-it notes though. The lone holdout? Work. It seems writing on whiteboards and snapping a picture with my phone have only partially supplanted graph pads, where hand-drawn diagrams, scribbles, and Dash/Plus reign supreme.

**For the impatient, the data first:**

I used the following devices to generate the above figures.

Apple MacBook Air (6,1)

- Intel Core i5-4250U @ 1.30GHz
- Intel HD 5000

Apple MacBook Pro (11,3)

- Intel Core i7-4980HQ @ 2.80GHz
- Intel Iris Pro 5200
- Nvidia GT 750M

As you can see, the OpenCL implementation absolutely destroys the reference CPU implementation. Granted, not much effort was put into optimizing the reference code. The point is to demonstrate that you can significantly accelerate your code with very little time and effort.

The Mandelbrot Set is a subset of complex numbers defined by an equation that, when plotted on the complex plane, forms a fractal. One simple way of calculating the Mandelbrot Set is to iterate the series *z → z² + c* for each point *c* until either *|z| > 2* (the point escapes) or a maximum iteration count is reached (the point is assumed to be in the set).

To compare and contrast, we implement the Mandelbrot Set to run both on a traditional processor and using Chlorine. To demonstrate, here are both prototypes for `solve_mandelbrot()`. Other than the OpenCL memory qualifiers and the use of pointers, they are very similar.

```
void solve_mandelbrot(std::vector<float> const & real,
                      std::vector<float> const & imag,
                      int                        iterations,
                      std::vector<int>         & result)
```

```
__kernel void solve_mandelbrot(__global float const * real,
                               __global float const * imag,
                               int                    iterations,
                               __global int         * result)
```

The function `solve_mandelbrot()` accepts a vector of real and imaginary points (representing *Mᵢ = xᵢ + yᵢj*), the number of iterations to run the series before assuming the number is in the set, and a vector to store the output. Look at both the kernel and host algorithm implementations and confirm for yourself that they are, in fact, identical.

```
float x = real[i]; // Real Component
float y = imag[i]; // Imaginary Component
int   n = 0;       // Tracks Color Information

// Compute the Mandelbrot Set
while ((x * x + y * y <= 2 * 2) && n < iterations)
{
    float xtemp = x * x - y * y + real[i];
    y = 2 * x * y + imag[i];
    x = xtemp;
    n++;
}

// Write Results to Output Arrays
result[i] = x * x + y * y <= 2 * 2 ? -1 : n;
```

The key difference here is that we must iterate through the entire image using a `for` loop on the host, while in OpenCL we can remove the `for` loop altogether and instead retrieve the index by querying the device.

```
// Host Code (Reference Implementation)
for(unsigned int i = 0; i < real.size(); i++)

// Chlorine Kernel (OpenCL Implementation)
unsigned int i = get_global_id(0);
```
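Putting these fragments together, the complete reference implementation comes out to roughly the following; this is simply the article's kernel body wrapped in the host `for` loop, assembled here for convenience:

```cpp
#include <vector>

// Reference CPU implementation: iterate over every input point.
void solve_mandelbrot(std::vector<float> const & real,
                      std::vector<float> const & imag,
                      int                        iterations,
                      std::vector<int>         & result)
{
    for(unsigned int i = 0; i < real.size(); i++)
    {
        float x = real[i]; // Real Component
        float y = imag[i]; // Imaginary Component
        int   n = 0;       // Tracks Color Information

        // Compute the Mandelbrot Set
        while ((x * x + y * y <= 2 * 2) && n < iterations)
        {
            float xtemp = x * x - y * y + real[i];
            y = 2 * x * y + imag[i];
            x = xtemp;
            n++;
        }

        // Write Results to Output Arrays
        result[i] = x * x + y * y <= 2 * 2 ? -1 : n;
    }
}
```

The OpenCL version is this same body with the loop removed and `i` supplied by `get_global_id(0)`.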

Next we define some settings for the Mandelbrot Set we are going to generate:

- We set the maximum number of iterations to a large value so the result is reasonably accurate.
- Since the Mandelbrot Set is completely contained within the circle of radius 2, we calculate only over the square circumscribing that circle.
- The step value is set so the resulting image resolution is 2000x2000px.

```
// Define Mandelbrot Settings
int iterations = 10000;
float x_min = -2.0f;
float x_max = 2.0f;
float y_min = -2.0f;
float y_max = 2.0f;
float x_step = 0.002f;
float y_step = 0.002f;
```

Next, we turn the points in this range into a single vector of points to test, linearizing the grid so it is easier to pass in. The stride is recorded so the result can later be reshaped back into a grid.

```
// Create Linear Vector of Coordinates
unsigned int stride = (x_max - x_min) / x_step + 2;
std::vector<float> reals;
std::vector<float> imags;
for(float y = y_min; y < y_max; y += y_step)
    for(float x = x_min; x < x_max; x += x_step)
    {
        reals.push_back(x);
        imags.push_back(y);
    }
```

For the CPU, we first make an output vector to pass into the function. We then record the start time, call the function, then record the end time. By recording the start and end times, we can compare the time taken between the CPU and Chlorine, for a rough performance benchmark.

```
// Compute the Mandelbrot Set on the CPU
std::vector<int> cpu_ans(reals.size());
clock_t cpu_begin = clock();
solve_mandelbrot(reals, imags, iterations, cpu_ans);
clock_t cpu_end = clock();
```

We first create a worker that will load the OpenCL runtime and compile the kernel code. Note that calling Chlorine is very similar to calling a traditional function: the kernel name is passed to `call()` instead of the function being invoked directly. Everything else is exactly the same.

```
// Compute the Mandelbrot Set Using Chlorine
std::vector<int> cl_ans(reals.size());
ch::Worker benoit("mandelbrot.cl");
clock_t cl_begin = clock();
benoit.call("solve_mandelbrot", reals, imags, iterations, cl_ans);
clock_t cl_end = clock();
```

As a matter of good practice, we compare the output to ensure that we are really getting the same answer. Note that due to floating point precision limitations, there may be a small degree of error between the host and OpenCL outputs.

```
// Compare the Output Arrays
unsigned int error_count = 0;
for(unsigned int i = 0; i < cpu_ans.size(); i++)
    if (cpu_ans[i] != cl_ans[i])
        error_count++;
```

From here, we write the image to the (very simple) PPM format, which you can display using ImageMagick, Adobe Photoshop, or similar. If all goes well, you should see the same image as the one at the top of this document.

If you were wondering, here are the five lines of code that differ between the reference and OpenCL implementations:

```
#include "chlorine.hpp"             // the include statement
ch::Worker benoit("mandelbrot.cl"); // worker creation
void solve_mandelbrot(...)          // function prototype (moved to kernel)
unsigned int i = get_global_id(0);  // indexing inside a kernel
benoit.call(...);                   // the function call
```

You’re probably thinking *“can I get away with doing a mass find/replace?”* What do you think I did? Yes, *it’s that easy* to use Chlorine. Give it a try today!

*A significant portion of this was written in collaboration with fellow Rensselaer student Christopher Brenon.*

As a student with limited resources, I prefer tackling a subset of embarrassingly parallel problems that I call *trivially parallel*. These are problems that benefit from parallelization without necessarily requiring supercomputing levels of hardware. A simple example might be matrix multiplication, or even more basic, swapping the contents of two arrays.

This is the point where I declare OpenCL a horrible mess. You can read more about the Quest for the Smallest OpenCL Program to get a sense of all the hoops you have to jump through just to do basic math. You could start with the 37-line example, but it is quite basic and difficult to expand upon. Not impossible, but there is an easier way!

I wrote Chlorine as a simpler way to interact with devices. The goal is for you to work with your data, not fight with hardware interfaces. How does it work? The following is a line-by-line explanation of the swap example on the project homepage.

Start by including the Chlorine header.

```
#include "chlorine.hpp"
```

Now we create some dummy data. While this example uses `std::vector` for brevity, you can freely mix and match containers of any type. This can be useful if you need to mix bounded and unbounded array types.

```
// Create Some Data
std::vector<float> spam(10, 3.1415f);
std::vector<float> eggs(10, 2.7182f);
```

Next, we create a Chlorine Worker, using the filename constructor, which takes a path to an OpenCL kernel file.

```
// Initialize a Chlorine Worker
ch::Worker worker("swap.cl");
```

Now that our worker is aware of the kernel functions, we can simply invoke `Worker::call(kernel_function, ...)`, with the first argument being the name of the kernel function you wish to call, followed by the same arguments (in the same order!) as the kernel function.

```
// Call the Swap Function in the Given Kernel
worker.call("swap", spam, eggs);
```
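For reference, the kernel on the other side of this call might look something like the following sketch; the actual `swap.cl` shipped with the project may differ, but Chlorine matches the host arguments to the kernel parameters positionally:

```
// Hypothetical swap.cl: exchange corresponding elements of two arrays
__kernel void swap(__global float * spam, __global float * eggs)
{
    unsigned int i = get_global_id(0);
    float temp = spam[i];
    spam[i] = eggs[i];
    eggs[i] = temp;
}
```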

After this completes, data is automatically written back to the same memory locations allocated by your program.

```
// Host Containers Are Automatically Updated
std::cout << "Spam: " << spam[0] << "\n"; // 2.7182
std::cout << "Eggs: " << eggs[0] << "\n"; // 3.1415
```

Don’t take my word for it though! If you build and run this example, you’ll see that the values in each array have been swapped. In order for this to compile, we need to link with the system installation of OpenCL. We also need to pass `-std=c++11` to the compiler to enable the variadic templating in Chlorine. You should end up with something like this:

```
$ clang++ -std=c++11 swap.cpp -lOpenCL # Compile
$ ./a.out # and Run
```

Timing is built in, so you can effortlessly recover profiling data! Chlorine Workers return the OpenCL event associated with the kernel function call, which lets you query profiling information, such as how much time was spent executing the kernel function.

```
// Store the Returned OpenCL Event Object
auto event = worker.call("swap", spam, eggs);
```

To make things easier, there is a helper function `ch::elapsed()` which accepts an OpenCL event and returns the elapsed time spent on your kernel function. This helper preserves the nanosecond resolution offered by the OpenCL API and is merely a convenience wrapper.

```
// Print Some Profiling Data
std::cout << "Elapsed Time: " << ch::elapsed(event) << "ns\n";
```

Kernel files are written in a variant of the C programming language. While I won’t go into detail about it here, I hope this serves as a valuable demonstration in how Chlorine may be used to easily port code to run in parallel. Up next: Visualizing the Mandelbrot Set.

So what exactly is ray tracing? Consider a lamp hanging from the ceiling. Light is constantly being emitted from the lamp in the form of light rays, which bounce around the room until they hit your eye. Ray tracing follows a similar concept by simulating the path of light through a scene, except in reverse. There is no point in doing the math for light rays you cannot see!

Algorithmically, ray tracing is very elegant. Shoot a light ray from the camera through each pixel on screen. If the ray collides with geometry in the scene, create new rays that perform the same process for both reflection, as in a mirror, and refraction, as in through water. Repeat to your satisfaction.

Having worked extensively with OpenCL in the past, this seemed like a good candidate to port to a parallel runtime on a GPU. Inspired by the smallpt line-by-line explanation, I decided to write a parallel ray tracer with extensive annotations, using only the GLSL fragment shader drawing on a rectangle (i.e. “2D Quad”). I start with a simple ray definition, consisting of an origin point and a direction vector. I also define a directional light to illuminate my scene.

```
struct Ray {
    vec3 origin;
    vec3 direction;
};

struct Light {
    vec3 color;
    vec3 direction;
};
```

In real life, objects have many different material properties. Some objects respond very differently to light than others; consider a sheet of paper versus a polished mirror. The former exhibits a strong *diffuse* response: incoming light is reflected at many angles. The latter is an example of a *specular* response, where incoming light is reflected in a single direction. To model this, I create a basic material definition. Objects in my scene share a single (RGB) color with diffuse and specular weights.

```
struct Material {
    vec3 color;
    float diffuse;
    float specular;
};
```

To render the scene, I need to know where a ray intersects with an object. Since rays have infinite length from an origin, I can model the point of intersection by storing the distance along the ray. I also need to store the surface normal so I know which way to bounce! Once I create a ray, it loses the concept of scene geometry, so one more thing I do is forward the surface material properties.

```
struct Intersect {
    float len;
    vec3 normal;
    Material material;
};
```

The last data structures I create are for objects used to fill my scene. The most basic object I can model is a sphere, which is defined as a radius at some center position, with some material properties. To draw the floor, I also define a simple horizontal plane centered at the origin, with a normal vector pointing upwards.

```
struct Sphere {
    float radius;
    vec3 position;
    Material material;
};

struct Plane {
    vec3 normal;
    Material material;
};
```

At this point, I define some global variables. A more advanced program might pass these values in as uniforms, but for now, this is easier to tinker with. Due to floating point precision errors, when a ray intersects geometry at a surface, the point of intersection could possibly be just below the surface. The subsequent reflection ray would then bounce off the *inside* wall of the surface. This is known as self-intersection. When creating new rays, I initialize them at a slightly offset origin to help mitigate this problem.

```
const float epsilon = 1e-3;
```

The classical ray tracing algorithm is recursive. However, GLSL does not support recursion, so I instead use an iterative approach to control the number of light bounces.

```
const int iterations = 16;
```

Next, I define an exposure time and gamma value. At this point, I also create a basic directional light and define the ambient light color; the color here is mostly a matter of taste. Basically … lighting controls.

```
const float exposure = 1e-2;
const float gamma = 2.2;
const float intensity = 100.0;
const vec3 ambient = vec3(0.6, 0.8, 1.0) * intensity / gamma;

// For a Static Light
Light light = Light(vec3(1.0) * intensity, normalize(vec3(-1.0, 0.75, 1.0)));

// For a Rotating Light (swap in for the static light above)
// Light light = Light(vec3(1.0) * intensity, normalize(
//     vec3(-1.0 + 4.0 * cos(iGlobalTime), 4.75,
//          1.0 + 4.0 * sin(iGlobalTime))));
```

I strongly dislike this line. I needed to know when a ray hits or misses a surface. If it hits geometry, I return the point at the surface. Otherwise, the ray misses all geometry and instead hits the sky box. In a language that supports dynamic return types, I could `return false`, but that is not an option in GLSL. In the interests of making progress, I created an intersect of distance zero to represent a miss and moved on.

```
const Intersect miss = Intersect(0.0, vec3(0.0), Material(vec3(0.0), 0.0, 0.0));
```

As indicated earlier, I implement ray tracing for spheres. I need to compute the point at which a ray intersects with a sphere. Line-Sphere intersection is relatively straightforward. For reflection purposes, a ray either hits or misses, so I need to check for no solutions, or two solutions. In the latter case, I need to determine which solution is “in front” so I can return an intersection of appropriate distance from the ray origin.
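The discriminant in the code below falls out of substituting the ray equation into the sphere equation; here is a quick sketch of the standard derivation, using the same names as the code:

```
Ray:    p(t) = origin + t * direction,  with |direction| = 1
Sphere: |p - position|^2 = radius^2

Let oc = position - origin and l = dot(direction, oc). Substituting and
expanding gives a quadratic in t:

    t^2 - 2*l*t + (dot(oc, oc) - radius^2) = 0
    t = l ± sqrt(l^2 - dot(oc, oc) + radius^2)

The expression under the square root is det: if it is negative there are
no real solutions (a miss); otherwise l - sqrt(det) is the closer of the
two intersection points.
```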

```
Intersect intersect(Ray ray, Sphere sphere) {
    // Check for a Negative Square Root
    vec3 oc = sphere.position - ray.origin;
    float l = dot(ray.direction, oc);
    float det = pow(l, 2.0) - dot(oc, oc) + pow(sphere.radius, 2.0);
    if (det < 0.0) return miss;

    // Find the Closer of Two Solutions
    float len = l - sqrt(det);
    if (len < 0.0) len = l + sqrt(det);
    if (len < 0.0) return miss;
    return Intersect(len, (ray.origin + len * ray.direction - sphere.position) / sphere.radius, sphere.material);
}
```

Since I created a floor plane, I likewise have to handle reflections for planes by implementing Line-Plane intersection. I only care about the intersect for the purposes of reflection, so I only check whether the intersection distance is non-negative.

```
Intersect intersect(Ray ray, Plane plane) {
    float len = -dot(ray.origin, plane.normal) / dot(ray.direction, plane.normal);
    return (len < 0.0) ? miss : Intersect(len, plane.normal, plane.material);
}
```

In a *real* ray tracing renderer, geometry would be passed in from the host as a mesh containing vertices, normals, and texture coordinates, but for the sake of simplicity, I hand-coded the scene-graph. In this function, I take an input ray and iterate through all geometry to determine intersections.

```
Intersect trace(Ray ray) {
    const int num_spheres = 3;
    Sphere spheres[num_spheres];
    ...
```

I initially started with the smallpt scene definition, but soon found performance was abysmal on very large spheres. I kept the general format, modified to fit my data structures.

```
    ...
    spheres[0] = Sphere(2.0, vec3(-4.0, 3.0 + sin(iGlobalTime), 0), Material(vec3(1.0, 0.0, 0.2), 1.0, 0.001));
    spheres[1] = Sphere(3.0, vec3( 4.0 + cos(iGlobalTime), 3.0, 0), Material(vec3(0.0, 0.2, 1.0), 1.0, 0.0));
    spheres[2] = Sphere(1.0, vec3( 0.5, 1.0, 6.0), Material(vec3(1.0, 1.0, 1.0), 0.5, 0.25));
    ...
```

Since my ray tracing approach involves drawing to a 2D quad, I can no longer use the OpenGL Depth and Stencil buffers to control the draw order. Drawing is therefore sensitive to z-indexing, so I first intersect with the plane, then loop through all spheres back-to-front.

```
    ...
    Intersect intersection = miss;
    Intersect plane = intersect(ray, Plane(vec3(0, 1, 0), Material(vec3(1.0, 1.0, 1.0), 1.0, 0.0)));
    if (plane.material.diffuse > 0.0 || plane.material.specular > 0.0) { intersection = plane; }
    for (int i = 0; i < num_spheres; i++) {
        Intersect sphere = intersect(ray, spheres[i]);
        if (sphere.material.diffuse > 0.0 || sphere.material.specular > 0.0)
            intersection = sphere;
    }
    return intersection;
```

This is the critical part of writing a ray tracer. I start with some empty scratch vectors for color data and the Fresnel factor. I trace the scene using the input ray, and continue to fire new rays until the iteration depth is reached, at which point I return the total sum of the color values computed at each bounce.

```
vec3 radiance(Ray ray) {
    vec3 color, fresnel;
    vec3 mask = vec3(1.0);
    for (int i = 0; i <= iterations; ++i) {
        Intersect hit = trace(ray);
        ...
```

This goes back to the dummy “miss” intersect. Basically, if the scene trace returns an intersection with either a diffuse or specular coefficient, then it has encountered a surface of a sphere or plane. Otherwise, the current ray has reached the ambient-colored sky box.

```
if (hit.material.diffuse > 0.0 || hit.material.specular > 0.0) {
```

Here I use the Schlick Approximation to determine the Fresnel specular contribution factor, a measure of how much incoming light is reflected or refracted: *R = R₀ + (1 − R₀)(1 − cos θ)⁵*, where *R₀* is the base specular reflectance and *θ* is the angle between the surface normal and the view direction. I compute the Fresnel term and use a mask to track the fraction of reflected light in the current ray with respect to the original.

```
vec3 r0 = hit.material.color.rgb * hit.material.specular;
float hv = clamp(dot(hit.normal, -ray.direction), 0.0, 1.0);
fresnel = r0 + (1.0 - r0) * pow(1.0 - hv, 5.0);
mask *= fresnel;
```

I handle shadows and diffuse colors next. I condensed this part into one conditional evaluation for brevity. Remember `epsilon`? I use it to trace a ray slightly offset from the point of intersection toward the light source. If the shadow ray does not hit an object, it will be a “miss” as it hits the skybox. This means there are no objects between the point and the light, so I can add the diffuse color to the fragment color since the object is not in shadow.

```
if (trace(Ray(ray.origin + hit.len * ray.direction + epsilon * light.direction, light.direction)) == miss) {
    color += clamp(dot(hit.normal, light.direction), 0.0, 1.0) * light.color
           * hit.material.color.rgb * hit.material.diffuse
           * (1.0 - fresnel) * mask / fresnel;
}
```

After computing diffuse colors, I then generate a new reflection ray and overwrite the original ray that was passed in as an argument to the radiance(…) function. Then I repeat until I reach the iteration depth.

```
vec3 reflection = reflect(ray.direction, hit.normal);
ray = Ray(ray.origin + hit.len * ray.direction + epsilon * reflection, reflection);
```

This is the other half of the tracing branch. If the trace failed to return an intersection with an attached material, then it is safe to assume that the ray points at the sky, or out of bounds of the scene. At this point I realized that real objects have a small sheen to them, so I hard-coded a small spotlight pointing in the same direction as the main light for pseudo-realism.

```
else {
    vec3 spotlight = vec3(1e6) * pow(abs(dot(ray.direction, light.direction)), 250.0);
    color += mask * (ambient + spotlight);
    break;
}
```

The main function primarily deals with organizing data from OpenGL into a format that the ray tracer can use. For ray tracing, I need to fire a ray for each pixel, or more precisely, a ray for every fragment. However, pixels and fragment coordinates do not map to each other on a one-to-one basis, so I need to divide the fragment coordinates by the viewport resolution. I then offset that by a fixed value to re-center the coordinate system.

```
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = fragCoord.xy / iResolution.xy - vec2(0.5);
    uv.x *= iResolution.x / iResolution.y;
    ...
```

For each fragment, I create a ray at a fixed point of origin, directed at the coordinates of that fragment. The last thing before writing the color to the fragment is to post-process the pixel values using tone mapping. In this case, I adjust for exposure and perform linear gamma correction.

```
    ...
    Ray ray = Ray(vec3(0.0, 2.5, 12.0), normalize(vec3(uv.x, uv.y, -1.0)));
    fragColor = vec4(pow(radiance(ray) * exposure, vec3(1.0 / gamma)), 1.0);
```

If all goes well, you should see an animated scene below, assuming your computer isn’t a potato! Alternately, you can check out the complete source code on Shadertoy.

So, to recap, this was my first foray into ray tracing. Originally, I wanted to write this using the OpenGL Compute Shader. That was harder to set up than I originally anticipated, and I spent a fair bit of time mucking around with OpenGL and CMake before deciding to just sit down and start programming.

All things considered, this is a pretty limited ray tracer. Some low hanging fruit might be to add anti-aliasing and soft shadows. The former was not an issue until I ported this from a HiDPI display onto the WebGL canvas. The latter involves finding a quality random number generator.
