As a professional computer programmer, I’ve naturally always been fascinated by cutting-edge technologies that generate 3D worlds from geometry. In reality, however, the vast, vast majority of professionals in this business never get the opportunity to work in 3D. In my experience, most programmers are stuck building databases that amount to little more than designing forms for companies to fill out, and even among the minority who get to do “real work”, most of that work doesn’t touch 3D imaging or graphics. Furthermore, 3D programming is demanding and constantly changing, which has historically made it feasible to conquer only if you were one of the fortunate few who got to program 3D full-time on someone else’s dime. The reality is that you would probably need a whole team of people, including mathematicians, artists, and even building architects, to stay at the head of the pack.
Fortunately, there are some great newer tools out there that take a lot of the mystery out of it all and might give you some inroads to getting things done without having to get into too much nitty-gritty. I highly recommend Unity3D. But if, for whatever reason, you find yourself having to build a 3D engine from scratch, here’s a list of things that every 3D-engine programmer has to fully and truly conquer before that first engine can become a reality. I use the word “conquer” because… yes… you really have to “know” this stuff… not just bits and pieces… but really know and understand it. Reading this blog will in no way make you an expert in these subjects… I intend it to be more or less a road map of things to learn and things to remember about the 3D pipeline.
Starting from nothing
If you have had ZERO exposure to how 3D graphics are generated in computers, a lot of the information here will come as a total shock and surprise. The way the 3D pipeline actually works, compared to how a human might intuitively think it works, is, for lack of a better description, “completely backwards and kinda fucked up”. There are even parts of the 3D pipeline that are grossly half-assed and incomplete, but we let them slide because they only need to be cheap parlor tricks, not necessarily accurate.
I have been programming computers since I was a child, before the internet was really invented. I was only able to learn what I could find in books and had to dream about how various problems might be solved back then. There were lots of questions I had when I finally got around to figuring this stuff out, so I’ll start with the super-basic questions you might have.
I’ll do my best to answer these questions in a way that a total noob might want them answered, but this blog is intended for people who already know how to program computers and never really got the opportunity to dig into 3D at the low level. Even for veteran programmers, this stuff can be quite intimidating.
1) What is a vector?
A vector is simply a collection of coordinates. X, Y, and Z are common. This is often called a “Vector3” type. Computers are good at working with vectors, and the SIMD extensions on modern CPUs (SSE is the usual example on x86) can operate on a whole 4-dimensional float vector (usually expressed X,Y,Z,W) in a single instruction. Sometimes vectors are used to express colors as well and are given R,G,B semantics, or the 4D equivalent: R,G,B,A. The “A” usually expresses a level of transparency. The aforementioned “W” is usually 1 for a point in space and 0 for a pure direction such as a normal; after projection it also carries the depth information that the GPU uses for the perspective divide and for perspective-correct texture mapping.
Get used to using vectors for just about everything you can. Never juggle around miscellaneous x, y, and z variables (same goes for color values r, g, b) when you can perform operations using vector math across all of them simultaneously.
// Avoid writing code like this
SomeUnoptimizedStruct a, b, c;
c.x = a.x + b.x;
c.y = a.y + b.y;
c.z = a.z + b.z;

// Instead write this
Vector a, b, c;
c = a + b;
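For anyone wondering what makes `c = a + b` legal, here is a minimal C++ sketch. The `Vec3` type and the helper names are hypothetical and only for illustration; they are not from any particular library, and a real engine would more likely use a SIMD-backed type from its math library.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal 3-component vector (name and layout are assumptions).
struct Vec3 {
    float x, y, z;
};

// Component-wise addition: this overload is what allows "c = a + b".
inline Vec3 operator+(const Vec3& a, const Vec3& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}

// Component-wise subtraction.
inline Vec3 operator-(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

// Uniform scale by a scalar.
inline Vec3 operator*(const Vec3& v, float s) {
    return {v.x * s, v.y * s, v.z * s};
}

// Dot product: sum of the component-wise products.
inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Euclidean length of the vector.
inline float length(const Vec3& v) {
    return std::sqrt(dot(v, v));
}
```

With this in place, the calling code never touches individual x, y, z fields, which also leaves room to swap in a faster implementation later without changing any callers.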
2) What’s the fastest way to rotate an object?
When I was in high school, my first trigonometry class was highly focused on triangles, tangents, and circles. It was weeks and weeks of studying various applications of the “Pythagorean theorem”. I heard so much of it that I thought that a^2+b^2=c^2 was basically the best and only answer anyone ever had for anything. Bear in mind that 99% of my math class back then was taught on a chalkboard or overhead projector and all problems were computed by hand. So when they started covering this little thing called “linear algebra” and began making us multiply all these “matrices” by hand with pencil and paper, I naturally thought, “What the hell is this?” or “Why do I want to multiply and add 32 numbers together to solve an equation that I can solve in 4 steps using traditional algebra?”
None of those people who would give me hours of homework to slave over every night had an answer to those questions. Not a single one of them seemed to understand that linear algebra was the way computers would be solving all kinds of problems in the future… in fact, I spent a lot of time fixing their computers for them… if they were even sophisticated enough to own them. Educators in the 90’s and before were notoriously computer illiterate.
What they didn’t understand was that computers would evolve to become REALLY good at processing vectors, and REALLY good at processing matrices. So whereas multiplying something 32 times on paper was an awful experience for me as a teenager, it is actually a pretty simple operation for a computer… especially if the computer has a dedicated vector processing unit attached to it. With a vector processor, these 32 multiplications can be boiled down to just a few instructions. If you’ve ever heard the buzzwords “MMX”, “SSE”, or “3DNow!”, essentially what they’re talking about are vector (SIMD) processing extensions to the x86 instruction set. On top of that, a good GPU from NVIDIA or AMD (formerly ATI) is built to run this kind of vector math massively in parallel, and on graphics workloads it can outrun your CPU by orders of magnitude.
So to finally answer the question, the fastest way to rotate an object (in the computer world) is to use a rotation matrix. Matrix math is the first thing you have to REALLY conquer as a 3D programmer. If you’re worried about having to learn about this, I’ll try and offer some advice to make it less intimidating.
Open up your favorite spreadsheet program (Google Sheets, Excel, or whatever) and make yourself a few areas where you throw down some 2×2 matrices. I recommend you start in 2D… just a 2×2 matrix with X and Y coordinates. Give yourself another area of the spreadsheet where you feed the coordinates of a 2D square or other simple 2D shape through the matrix and observe how the coordinates change as you apply stock 2D rotation matrices (there are examples all over the internet). In no time all the magic will start to make sense. Once you conquer the 2D concept, upgrade to 3×3 and 4×4 matrices. The 3D graphics pipeline generally works on 4D vectors and 4×4 matrices. Once you get to the 4×4 matrices you can plug in anything you find on the internet and observe how it all behaves.
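If you would rather do the spreadsheet exercise in code, here is a minimal C++ sketch of the same thing: a stock counter-clockwise 2D rotation matrix applied to the four corners of a square. The type and function names (`Vec2`, `Mat2`, `rotation2D`, `rotateSquare`) are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct Vec2 { float x, y; };

// 2x2 matrix stored row by row:  | m00 m01 |
//                                | m10 m11 |
struct Mat2 { float m00, m01, m10, m11; };

// Stock counter-clockwise rotation matrix for an angle given in radians.
Mat2 rotation2D(float radians) {
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    return {c, -s, s, c};
}

// Feed one point through the matrix: each output coordinate is a
// weighted sum of the input coordinates.
Vec2 transform(const Mat2& m, const Vec2& p) {
    return {m.m00 * p.x + m.m01 * p.y,
            m.m10 * p.x + m.m11 * p.y};
}

// Rotate the corners of a square centered on the origin.
std::array<Vec2, 4> rotateSquare(float radians) {
    const std::array<Vec2, 4> square = {{
        {1.0f, 1.0f}, {-1.0f, 1.0f}, {-1.0f, -1.0f}, {1.0f, -1.0f}}};
    const Mat2 m = rotation2D(radians);
    std::array<Vec2, 4> out{};
    for (std::size_t i = 0; i < square.size(); ++i) {
        out[i] = transform(m, square[i]);
    }
    return out;
}
```

Rotating by a quarter turn should move each corner onto the position of the next one, which is an easy thing to eyeball when you print the results.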
Why Matrices? What are the advantages of this way?
a) A matrix turns the problem of rotating an object into a very simple multiplication problem. All you’re doing when you feed a point through a matrix is adding up a few weighted numbers from each dimension, each of which influences the resulting point. The individual x, y, z, and w columns of the matrix are each allowed an opportunity to contribute a ratio of the input value to the output, based on the numbers that are in the matrix. Just do the Excel thing, it’ll all make sense, trust me. I’ll eventually update this blog with a link to my own spreadsheet.
b) Standard plug-and-play matrices exist not just for rotation around all 3 axes, but for moving (translating) an object through space, and for scaling an object (making things bigger and smaller). Translation is the reason for the 4th dimension: a 3×3 matrix can only rotate and scale, but a 4×4 matrix applied to a point with W=1 can also move it.
c) A super magical awesome aspect of matrices is that they can be combined into one single “composite” matrix. This means that you can build a matrix that rotates an object around X,Y,and Z axes, scales the object, and then moves the object to another part of the world all with just one matrix. This is a big part of the magic that makes things happen fast in your video games. Once this composite matrix is created it can be re-used for all the thousands of points and triangles that might make up your 3D character or spaceship or whatever that is on your game screen performing the job of all those matrices with just a single matrix.
d) Also super magical: a matrix operation is reversible (as long as the matrix doesn’t squash everything flat, for example by scaling by zero). If you apply a rotation matrix to an object, or fold it into a composite matrix, you can undo that operation by computing its “inverse matrix”. Most 3D APIs have a simple function for computing the inverse of a matrix. If you multiply that inverse with your existing matrix, the operation is undone. This is great for turning things from “world space” into “camera space”: typically programmers take the inverse of the matrix that positions the camera in the world and combine it with the “model” matrix that positions the object in the world. In essence you’re taking a “model-to-world” matrix, multiplying it with the inverse of a “camera-to-world” matrix (which is a “world-to-camera” matrix), and the result is a “model-to-camera” matrix.
e) Once constructed, matrix operations don’t involve any division, squares, square roots, or sin, cos, or tan operations. Remember all that BS you put up with in math class about “quadrants”? Not a problem with matrix math. Remember too that squaring things removes the negative sign, which is valuable information lost. Matrix math is really a cheap and easy way of dealing with multi-dimensional ratios and is universally applicable to computers.
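To make points (b) through (d) concrete, here is a minimal C++ sketch of a composite matrix and its inverse. It assumes the column-vector convention (a point is transformed as `M * v`, so the rightmost matrix is applied first); Direct3D samples often use the opposite, row-vector convention. All names here (`Mat4`, `makeModelMatrix`, and so on) are hypothetical. The inverse is built by undoing each step in reverse order, which is a shortcut for this example and not a general-purpose matrix inverse.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // m[row][col]

Mat4 identity() {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

Mat4 translation(float tx, float ty, float tz) {
    Mat4 r = identity();
    r.m[0][3] = tx; r.m[1][3] = ty; r.m[2][3] = tz;
    return r;
}

Mat4 scaling(float sx, float sy, float sz) {
    Mat4 r = identity();
    r.m[0][0] = sx; r.m[1][1] = sy; r.m[2][2] = sz;
    return r;
}

Mat4 rotationZ(float radians) {
    Mat4 r = identity();
    const float c = std::cos(radians), s = std::sin(radians);
    r.m[0][0] = c; r.m[0][1] = -s;
    r.m[1][0] = s; r.m[1][1] = c;
    return r;
}

// Matrix * matrix: this is how several transforms fold into one.
Mat4 operator*(const Mat4& a, const Mat4& b) {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Matrix * point: multiplies and adds only, no trig and no division.
Vec4 operator*(const Mat4& m, const Vec4& v) {
    const float in[4] = {v.x, v.y, v.z, v.w};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            out[i] += m.m[i][k] * in[k];
    return {out[0], out[1], out[2], out[3]};
}

// Scale, then rotate about Z, then translate, all in a single matrix.
Mat4 makeModelMatrix(float scale, float radians, float tx, float ty, float tz) {
    return translation(tx, ty, tz) * rotationZ(radians) * scaling(scale, scale, scale);
}

// Undo the steps above in reverse order (requires a non-zero scale).
Mat4 makeInverseModelMatrix(float scale, float radians, float tx, float ty, float tz) {
    assert(scale != 0.0f);
    const float inv = 1.0f / scale;
    return scaling(inv, inv, inv) * rotationZ(-radians) * translation(-tx, -ty, -tz);
}
```

The composite is built once and then reused for every vertex of the model; note that the order of multiplication matters, so translating before rotating gives a different result than rotating before translating.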
3) What kinds of problems CANNOT be solved with matrix math?
Whereas matrix math itself is pretty sound, there are lots of problems that it simply cannot solve. In fact, there are many problems that are computed on computers using matrices that technically should NOT be solved with matrices… however, we use them anyway because they come up with answers that are close enough to the truth that an estimate is good enough.
Matrix operations always produce what are called “affine” results (you may also see this described in terms of “collinearity”: points that sit on one straight line before the transform still sit on one straight line after it). What this means in practice is that if you have two parallel lines in the universe before you feed them through the matrix, they will STILL be parallel after they’ve been through whatever matrix you feed them through. You can literally put random numbers into a matrix and observe how it changes two parallel lines and you’ll find that they will always, always, always remain parallel. (Although technically two identical lines are not considered parallel, and it is possible for lines to collapse onto each other, but go organize a protest around the definition of “parallel” if you think that is confusing).
I find that understanding the limits of matrices helps you understand their usefulness.
What you cannot do with matrix math:
a) Make two parallel lines intersect (unless you collapse them onto each other so that they overlap)
b) Turn a straight line into a curved line. For example, if you wanted to animate a snake slithering using a sine-wave equation, a matrix alone couldn’t do the bending.
c) Simulate depth perspective on its own. The projection matrix only sets things up; the final divide is the one step of the standard 3D pipeline’s vertex math that isn’t a matrix multiplication.
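The “parallel lines stay parallel” claim is easy to check for yourself. Below is a minimal C++ sketch in 2D with made-up names (`staysParallel` and friends are hypothetical). It transforms the endpoints of two segments with an arbitrary matrix and uses the 2D cross product, which is zero for parallel directions, to compare the results.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Mat2 { float m00, m01, m10, m11; };  // row by row

Vec2 transform(const Mat2& m, const Vec2& p) {
    return {m.m00 * p.x + m.m01 * p.y,
            m.m10 * p.x + m.m11 * p.y};
}

Vec2 operator-(const Vec2& a, const Vec2& b) {
    return {a.x - b.x, a.y - b.y};
}

// 2D cross product: zero when the two directions are parallel.
float cross(const Vec2& a, const Vec2& b) {
    return a.x * b.y - a.y * b.x;
}

// True if segment p0-p1 and segment q0-q1 are parallel AFTER both have
// been fed through the matrix m (within a small tolerance).
bool staysParallel(const Mat2& m, Vec2 p0, Vec2 p1, Vec2 q0, Vec2 q1,
                   float epsilon = 1e-4f) {
    const Vec2 d1 = transform(m, p1) - transform(m, p0);
    const Vec2 d2 = transform(m, q1) - transform(m, q0);
    return std::fabs(cross(d1, d2)) <= epsilon;
}
```

Try it with any numbers you like in the matrix: two segments that start out parallel come back parallel, while two that start out perpendicular generally do not.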
4) How does the computer ensure that things far away are not drawn in front of things that are close without doing a lot of sorting?
Answer: The cheapest, most brute-force way possible: with “Z-Buffering”. Z-Buffering is a pretty simple and beef-headed way to ensure that things in front of other things always end up on top of things in the distance. Think of it like its own color channel. Instead of “R,G,B” for the final output, we output R,G,B,Z. Then, when we’re deciding whether or not to paint a particular pixel a particular color, we simply look at the Z values. If the Z indicates that what’s already on the screen is closer to the camera, we skip over that pixel; otherwise, we draw on top of it and record the new Z. You get a bit of control over how Z-Buffering works. It can be turned off when needed, and you have independent control over when you want to write to it, test against it, clear it, or preserve it, plus some basic control over the rules regarding which values are considered “closer” to the camera (decreasing or increasing Z values).
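Here is a minimal software sketch of that per-pixel test in C++. It is not how you talk to a GPU (the hardware does this for you); the class name `DepthBufferedTarget` is hypothetical, and the sketch assumes the “smaller Z is closer” rule, which is only one of the configurable options mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical software render target: one color plane plus one depth plane.
// Assumed convention: a smaller z value means closer to the camera.
class DepthBufferedTarget {
public:
    DepthBufferedTarget(int width, int height)
        : width_(width), height_(height),
          color_(static_cast<std::size_t>(width) * height, 0u),
          // "Clearing" the Z-buffer means resetting every pixel to "infinitely far".
          depth_(static_cast<std::size_t>(width) * height,
                 std::numeric_limits<float>::infinity()) {}

    // Try to draw one pixel. Returns true if it was written, false if
    // something closer was already there.
    bool plot(int x, int y, float z, std::uint32_t rgba) {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        const std::size_t i = static_cast<std::size_t>(y) * width_ + x;
        if (z >= depth_[i]) return false;  // depth test failed: skip the pixel
        depth_[i] = z;                     // depth write
        color_[i] = rgba;                  // color write
        return true;
    }

    std::uint32_t colorAt(int x, int y) const {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        return color_[static_cast<std::size_t>(y) * width_ + x];
    }

private:
    int width_;
    int height_;
    std::vector<std::uint32_t> color_;
    std::vector<float> depth_;
};
```

Notice that the draw order no longer matters for opaque pixels: whichever fragment is closest wins, no sorting required.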
5) If things are drawn with triangles, how is it that the lighting appears smooth and seamless?
This is accomplished with a thing called the “normal vector”. Typically models are sent to the GPU with more than just X,Y,Z values. You have a lot of freedom in how you want to set it up for your vertex shaders and pixel shaders. You may choose to send multiple points, not just one, multiple colors, multiple texture coordinates, and multiple normal vectors, in addition to any kind of data you might dream up to offload to the GPU, but typically you’ll have at least one point, color, normal vector, and set of texture coordinates. The normal vector is essentially an x,y,z,(w=0) value whose length must equal 1 (a vector of length 1 is called “normalized”). You can picture it as a line that extends from the vertex position in the direction a light would have to be in to have the most direct influence on the resulting color. If you point the normals on all 3 points of a triangle perpendicular to the triangle, then your triangle will look flat; however, if you have two adjacent triangles and, at the vertices along their shared edge, average the two triangles’ normals together, you’ll get smooth lighting across both triangles.
There are lots of examples out there for how this works, but just keep in mind that your models don’t just have points, but they have normal vectors as well which are generally exported from 3D-Studio or Blender or whatever 3D-Modelling tool you use.
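As a rough illustration of the flat-versus-smooth idea, here is a minimal C++ sketch. It computes a flat face normal with a cross product, averages two face normals for a shared vertex, and evaluates a simple diffuse (Lambert) term as the dot product of the normal and the direction to the light. The names are hypothetical, and the diffuse formula is a common textbook model that I am assuming here; your shaders may do something fancier.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator+(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Scale a vector so that its length becomes exactly 1.
Vec3 normalize(const Vec3& v) {
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

// Flat normal of triangle a-b-c (counter-clockwise winding assumed).
Vec3 faceNormal(const Vec3& a, const Vec3& b, const Vec3& c) {
    return normalize(cross(b - a, c - a));
}

// Normal for a vertex shared by two triangles: average, then re-normalize.
Vec3 smoothNormal(const Vec3& n1, const Vec3& n2) {
    return normalize(n1 + n2);
}

// Diffuse brightness in [0, 1]: brightest when the normal points straight
// at the light. toLight must be normalized.
float diffuse(const Vec3& normal, const Vec3& toLight) {
    const float d = dot(normal, toLight);
    return d > 0.0f ? d : 0.0f;
}
```

Because the GPU interpolates the per-vertex normals across each triangle, the averaged normals along a shared edge make the brightness change gradually instead of jumping at the seam.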
6) What’s the best way to make things in the distance appear smaller?
Ultimately what happens is that x and y are divided by z in camera space. But before this happens, a matrix is set up to normalize the camera space so that the left edge of the screen ends up at -1, the right edge at +1, the bottom at -1, the top at +1, and the renderable depth range is squeezed into a fixed interval as well (in OpenGL the nearest renderable depth lands at -1 and the furthest at +1; in Direct3D the range is 0 to 1). This is sometimes also called the “perspective cube”, although it isn’t a cube and perspective has not been applied yet. But with the camera space normalized (which can really just happen with a number of scaling and translation matrices multiplied together), applying the perspective is an easy task.
Modern projection matrices generally have a couple of oddities. Direct3D and OpenGL want to apply the perspective automatically for you, and rely on the W coordinate for guidance: the projection matrix copies the camera-space depth into W, and the GPU then divides X, Y, and Z by W. This is likely the only time you’ll ever see “W” end up as anything other than 1 (or 0, for directions such as normals) in all of your matrix manipulation.
The standard projection matrices documented for DirectX and OpenGL are accurate, so I’d just use them.
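For reference, here is a minimal C++ sketch of one such standard matrix, using the classic OpenGL-style formula (the one `gluPerspective` documents): the camera looks down the negative Z axis and depth ends up in the -1 to +1 range. Direct3D’s conventions differ, so treat this as one example and not the only layout. The function names are hypothetical, and the explicit `perspectiveDivide` step is something the GPU normally does for you.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // m[row][col], column-vector convention

// OpenGL-style perspective matrix. fovY is the full vertical field of view.
Mat4 perspective(float fovYRadians, float aspect, float zNear, float zFar) {
    assert(zNear > 0.0f && zFar > zNear && aspect > 0.0f);
    const float f = 1.0f / std::tan(fovYRadians * 0.5f);
    Mat4 r = {};
    r.m[0][0] = f / aspect;
    r.m[1][1] = f;
    r.m[2][2] = (zFar + zNear) / (zNear - zFar);
    r.m[2][3] = (2.0f * zFar * zNear) / (zNear - zFar);
    r.m[3][2] = -1.0f;  // copies the distance in front of the camera into W
    return r;
}

Vec4 transform(const Mat4& m, const Vec4& v) {
    const float in[4] = {v.x, v.y, v.z, v.w};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            out[i] += m.m[i][k] * in[k];
    return {out[0], out[1], out[2], out[3]};
}

// The one non-matrix step: divide by W. This is what makes distant
// things smaller.
Vec4 perspectiveDivide(const Vec4& clip) {
    assert(clip.w != 0.0f);
    return {clip.x / clip.w, clip.y / clip.w, clip.z / clip.w, 1.0f};
}
```

With a 90-degree field of view, a point one unit to the right at distance 2 lands at x = 0.5 on screen, and the same point at distance 4 lands at x = 0.25: twice as far away, half the size.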
7) How is transparency/semi-transparency (alpha) handled?
There’s a bit more than meets the eye when dealing with objects that are semi-transparent in 3D. The principal problem is that Z-Buffering assumes that all objects are opaque: once a transparent surface has written its depth, anything drawn behind it afterwards gets rejected, even though it should show through.
Typically, therefore, you’ll draw your alpha-enabled objects AFTER you draw your opaque, Z-Buffered scene, so when you’re building your engine, be prepared to flag those objects that need to be drawn last.
Additionally, for most kinds of alpha blending, you’ll have to draw them in sorted order, furthest from the camera first, and you’ll have to do that sorting yourself. For this I use an AVL tree, a self-balancing binary tree that I consider more or less the holy grail for keeping things sorted under random insertion.
If you don’t want to sort your objects, you can get away with using “additive” alpha blending for things like explosions or light blooms. Additive blending comes out the same regardless of whether the objects are sorted or not.
There are lots of different alpha modes… too many really. Most people use just a couple of modes, most commonly “standard alpha”, in which the new color is given (a) weight while the background is given (1-a) weight. With this you’ll get results typical of your cookie-cutter texture-overlay style alpha. Additive blending is used for lighting effects, light maps, explosions, and fireworks. In this case the background weight is kept as-is and the new color values are added (background.rgb + foreground.rgb*foreground.a). The colors will get brighter and brighter as you stack more of them on top of each other, turning into a light bloom of sorts.
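The two blend modes and the sorting requirement can be sketched in a few lines of C++. This is CPU-side illustration only; on a real GPU you select these modes through blend state and don’t write the math yourself. The names are hypothetical, the output alpha is simply forced to 1 (an assumption, as if drawing into an opaque framebuffer), and I use `std::sort` here for brevity in place of the AVL tree mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Color { float r, g, b, a; };

inline float clamp01(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// "Standard alpha": new color weighted by a, background weighted by (1 - a).
Color blendStandard(const Color& background, const Color& foreground) {
    const float a = foreground.a;
    return {foreground.r * a + background.r * (1.0f - a),
            foreground.g * a + background.g * (1.0f - a),
            foreground.b * a + background.b * (1.0f - a),
            1.0f};
}

// Additive: background kept as-is, foreground.rgb * foreground.a added on top.
Color blendAdditive(const Color& background, const Color& foreground) {
    const float a = foreground.a;
    return {clamp01(background.r + foreground.r * a),
            clamp01(background.g + foreground.g * a),
            clamp01(background.b + foreground.b * a),
            1.0f};
}

// Hypothetical record for one transparent thing to draw.
struct TransparentItem {
    float cameraDepth;  // distance from the camera; larger means further away
    Color color;
};

// Standard alpha needs back-to-front order: furthest item first.
void sortBackToFront(std::vector<TransparentItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const TransparentItem& lhs, const TransparentItem& rhs) {
                  return lhs.cameraDepth > rhs.cameraDepth;
              });
}
```

Blending 50% red and then 50% blue over black with standard alpha gives a different color than blue and then red, which is exactly why the sort is needed; with additive blending both orders give the same result.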
8) Where do I start?
I recommend you start by picking your poison: OpenGL or DirectX11? Avoid DirectX9. DirectX9 is thankfully almost 100% obsolete now. This is great because DirectX9 was terrible, yet for a long, long time it was the only version widely supported by most video cards.
If you choose DirectX11, you’ll want to get your hands on the FXC shader compiler and some examples. If you’re squeamish about that, then maybe you’ll want to go with OpenGL, in which the shaders are passed in as text and the OpenGL libraries are responsible for compiling them for you. I started in Delphi/Object Pascal with someone else’s framework, the “Pascal Extended Library”, which covers OpenGL, OpenGL ES, DirectX11, DirectX9, and a few other variants across multiple platforms, and heavily modified it from a 2D into a 3D engine. I’m going to guess most of you will end up starting with some official C++ examples and diverging from there. I also recommend learning Unity3D. It’ll give you some great examples of how your game engine architecture might end up working.
9) What’s my first task?
The “hello world” task of 3D programming generally involves getting to the point where you can render a triangle. If you’re super adventurous you might start with a square or even a cube, but that might be too adventurous to start. If you can get far enough to draw a triangle on the screen, you’ll be off to a good start, but bear in mind that doing this isn’t as simple as writing a few lines of code.
In order to accomplish this task you need:
a) To create the necessary devices, contexts, and surfaces using your chosen API
b) To be able to clear and flip/present the back buffer.
c) To define the vertex format that you’ll be sending to your vertex shader (for DX11 see D3D11_INPUT_ELEMENT_DESC). Here’s how I set mine up in Pascal:
CanvasVertexLayout: array[0..3] of D3D11_INPUT_ELEMENT_DESC = (
  (SemanticName: 'POSITION'; SemanticIndex: 0;
   Format: DXGI_FORMAT_R32G32B32A32_FLOAT;
   InputSlot: 0; AlignedByteOffset: 0;
   InputSlotClass: D3D11_INPUT_PER_VERTEX_DATA; InstanceDataStepRate: 0),
  (SemanticName: 'TEXCOORD'; SemanticIndex: 0;
   Format: DXGI_FORMAT_R32G32_FLOAT;
   InputSlot: 0; AlignedByteOffset: 8+8;
   InputSlotClass: D3D11_INPUT_PER_VERTEX_DATA; InstanceDataStepRate: 0),
  (SemanticName: 'TEXCOORD'; SemanticIndex: 1;
   Format: DXGI_FORMAT_R32G32_FLOAT;
   InputSlot: 0; AlignedByteOffset: 16+8;
   InputSlotClass: D3D11_INPUT_PER_VERTEX_DATA; InstanceDataStepRate: 0),
  (SemanticName: 'COLOR'; SemanticIndex: 0;
   Format: DXGI_FORMAT_R8G8B8A8_UNORM;
   InputSlot: 0; AlignedByteOffset: 24+8;
   InputSlotClass: D3D11_INPUT_PER_VERTEX_DATA; InstanceDataStepRate: 0)
);
d) A vertex shader that accepts the vertex format you choose to send it and defines its own “out” parameters which are passed to the pixel shader
e) A pixel shader that accepts the “out” parameters of the vertex shader. For now you can simply program the pixel shader to set a static output color, like blue or red.
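For step (c), the vertex data you upload has to match the declared layout byte for byte. As a sketch of how you might guard against mismatches if you are working in C++ instead of Pascal, here is a hypothetical `CanvasVertex` struct that mirrors the four elements of the layout shown in step (c), with compile-time checks on the offsets (0, 16, 24, and 32 bytes). The struct, field names, and `makeVertex` helper are my own inventions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side mirror of the four-element vertex layout from step (c).
struct CanvasVertex {
    float position[4];      // POSITION,   R32G32B32A32_FLOAT, byte offset 0
    float texCoord0[2];     // TEXCOORD 0, R32G32_FLOAT,       byte offset 16
    float texCoord1[2];     // TEXCOORD 1, R32G32_FLOAT,       byte offset 24
    std::uint8_t color[4];  // COLOR,      R8G8B8A8_UNORM,     byte offset 32
};

// If someone reorders or resizes a field, the build fails here instead of
// the GPU silently reading garbage.
static_assert(offsetof(CanvasVertex, position) == 0, "POSITION offset mismatch");
static_assert(offsetof(CanvasVertex, texCoord0) == 16, "TEXCOORD0 offset mismatch");
static_assert(offsetof(CanvasVertex, texCoord1) == 24, "TEXCOORD1 offset mismatch");
static_assert(offsetof(CanvasVertex, color) == 32, "COLOR offset mismatch");
static_assert(sizeof(CanvasVertex) == 36, "unexpected vertex stride");

// Build an opaque white vertex; W is set to 1 because this is a position.
CanvasVertex makeVertex(float x, float y, float z, float u, float v) {
    CanvasVertex vtx = {};
    vtx.position[0] = x;
    vtx.position[1] = y;
    vtx.position[2] = z;
    vtx.position[3] = 1.0f;
    vtx.texCoord0[0] = u;
    vtx.texCoord0[1] = v;
    vtx.texCoord1[0] = u;
    vtx.texCoord1[1] = v;
    for (int i = 0; i < 4; ++i) vtx.color[i] = 255;
    return vtx;
}
```

The size of the struct (36 bytes here) is also the stride you would hand to the API when binding the vertex buffer, so three of these in an array is all the data a first triangle needs.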