OpenGL Gems is the authoritative guide on OpenGL programming published by Learning Curve / Education for software developers! This book in the 3D C++ programming series explains, clearly and in easy-to-follow language, the fundamental principles behind programming 3D games using the cross-platform OpenGL framework.
Preorder Now in PDF format for Kindle and get a discount.
OpenGL Gems is coming out on April 15th, 2017.
The paperback version will be available via this Amazon page, please bookmark it!
OpenGL Beginner's Introduction Tutorial (Fundamentals, Perspective, Projection, Camera, 3D Graphics Pipeline)
I really hope you do not just copy and paste source code from tutorials, but actually study 3D graphics fundamentals and commit them to memory. I can't stress enough how much it matters to understand basic principles first. The purpose of this tutorial is to go over such absolute "must-know" ideas common to graphics programming.
 Learning 3D Graphics From Scratch
 First Fundamental Principles
 Get PDF OpenGL Gems To Speed Up Learning Process
 3D Basics Everyone Should Know Before Touching OpenGL
 Perspective and Orthographic Projection
 Viewing Volume (Frustum)
 Viewing Distance
 Clipping Volume
 FOV (Field of View)
 3D Graphics Pipeline
 OpenGL Variable and Function Naming Conventions (obsolete)
Not many people enjoy listings upon listings of source code presented without any context or general explanation of what the heck is going on, especially when it comes to a subject as vast as 3D graphics programming.
If you take the copy-and-paste road anyway, eventually you will get stuck in a pool of mud. I am not worried that you will get stuck. I am worried that when you do, you will be discouraged from continuing to learn OpenGL, which is one of the most powerful and fantastic frameworks for creating interactive 3D graphics and computer games.
Learning 3D Graphics From Scratch
You can't copy and paste OpenGL source code built on layers of complexity, and tell me you can "make it your own," without at least understanding the OpenGL basics and computer graphics creation as a whole. And in the world of OpenGL, that "least" is a lot.
While the rest of OpenGL tutorials on this site are full of examples and source code, in this introduction for absolute newcomers to OpenGL, we'll investigate some of the fundamental principles behind creating 3D graphics in general, in any language, on any platform or Operating System.
This is for those who love 3D graphics, and have lots of inspiration for creating computer games, but can't find those tutorials that, in common language, explain how the whole puzzle fits together.
However, I do understand that this is the Internet. You can come to OpenGL from any background imaginable.
If you already have a good grasp of what perspective projection is and what backface culling is, you will probably want to skip this part and go directly to the OpenGL base code download page. Better yet, you can pull it from my GitHub and fork it. You can follow my GitHub account (or follow me on Twitter) if you want to stay in touch.
I do, however, still suggest reading this intro, regardless of where you're coming from. Not only does it contain 3D fundamentals, it also explains how this series of OpenGL tutorials is structured and why it is here.
More importantly, some really basic OpenGL-related information (like the general naming convention that OpenGL uses for functions and variables) is explained, and if you don't understand it, things will get tough for you. Update (04-03-2017): Immediate Mode has been removed from the core profile of modern OpenGL. But for legacy and educational purposes, I left that section as part of this article.
First Fundamental Principles: 3D Graphics Rendering, Perspective, Polygons (Triangles), Rasterizing, Cartesian Coordinate System, Choosing Between Perspective-Projection and Orthographic-Projection Mode, Looking Down Z-Axis (Default or "Reset" Camera Position), Viewing Volume, Camera Frustum, Viewing Distance, Clipping Planes, Etc.
My goal is not to appear in the top 10 results in Google, but to provide a reliable source for OpenGL tutorials. The reason is simply that I thought there was a lack of detailed, explanation-based tutorials out there, the kind you would see in a book.
Only this time you don't have to buy anything (except, of course, maybe the OpenGL book "OpenGL Gems", which is founded on years of OpenGL research and interest in game development).
I wrote OpenGL Gems in addition to all of the free tutorials on this site! It being my own book, I am a little biased, but I actually do recommend getting it only in one case. That is, if you are passionate about OpenGL and want to actually get to know how to gain control over making things happen in 3D.
I really owe it to years of research out of personal interest. I love making video games. I wrote the book simply because I liked the idea of sharing it with others. You don't have to get it to continue having access to the rest of the tutorials on this site, but if you do, you have my gratitude.
Writing tutorials (or any other documentation or books) is hard work and it is time-consuming. Not everyone has the opportunity and patience to write a few solid tutorials. But with time and patience everything is possible.
I think there's a great deal of demand for complete 3D tutorials that walk you through the process from start to finish. You have a video card and it's such a powerful device. But the knowledge it takes to get anything done with it can be extensive.
With the recent advances in computer graphics hardware that can now draw millions of polygons at 60 FPS, what's your weapon of mass rendering? Nvidia GTX, ATI Radeon R9, 4K graphics, anyone?
3D Basics Everyone Should Know Before Touching OpenGL
In this part I will cover 3D graphics in general, and most of the following topics are not constrained to OpenGL alone. So what exactly is 3D, and how can it be represented to the viewer on the computer screen?
To describe the idea behind rendering 3D objects on the screen it's best for me to use a 3D object. Let's examine the following image of a wireframed 3D cube.
For your brain (that is, if you believe the brain is the main processing mechanism behind perception) 3D objects are so common that by looking at this picture you will instantly recognize a "3D" shape.
Other than that, though, it's nothing more than a collection of 12 (twelve) 2D lines connected to each other at certain angles. Yet it's hard to think of this image as being "flat" or two-dimensional. The result of 3D data rendered on the screen is always a flat picture.
What are the main requirements to render an object, so that you will be able to correctly recognize it as a 3D object? And not just a collection of lines? The idea is to render objects to the screen the way you would see them in real life. And how do we see objects in real life? This is where the idea of perspective view comes from.
In the pre-computer age, artists used the same techniques for painting their masterpieces that today's 3D software uses for creating impressive, animated 3D graphics. The point behind perspective is that objects farther away from the viewer look smaller than objects closer to the viewer. Ultimately they disappear into the "vanishing point".
Luckily for us, OpenGL supports perspective-based rendering out of the box. We just need to supply a list of vertices, or in other words, pack vertex data into an array and send it (or upload it) to the GPU, where it resides. The matrix calculations required to do perspective projection have been abstracted by matrix libraries. Still, it's nice to be aware of how these fundamentals work, as basic as they are.
3D Cartesian Coordinate System
Now let's take a look at the OpenGL coordinate system we will be using. This is the so-called 3D "Cartesian" coordinate system. As you can see, in addition to the x- and y-axis known from 2D graphics we have the z-axis, which extends into negative space from the center of the screen away from the camera (or the "viewer"), and into positive space from the center of the screen towards the viewer.
This image visually mimics this principle:
Perspective and Orthographic Projections
As we take little steps towards growing our wealth of OpenGL knowledge, I think this is the right time to explain 3D camera projection. There are two primary types of camera projection. One is perspective based; the other is often used for making 2D games. They are... you guessed it: Perspective Projection and Orthographic Projection.
First, I want to talk about Perspective Projection, because we've already seen an image describing it. Objects that you're going to render will actually be what we might call "projected" onto the screen. Related terms are "rendered" and "rasterized," although strictly speaking rasterization is the later step of converting the projected shapes into pixels, whether that happens in software or on the graphics card.
What I mean by projection is the actual conversion from 3D coordinates (usually the vertices of objects) to the flat 2D surface of the screen. Since the computer screen has only two dimensions, we somehow have to display 3D objects on the 2D screen. That's precisely what projection does for us. In OpenGL, something called the Projection Matrix is used to perform this operation. (There is a matrix tutorial on this site somewhere, but we'll get to it later.)
Mathematically, Perspective Projection works as follows. Let's take a single point to demonstrate. Imagine we have a point with coordinates (5, 3, 2) on the X, Y and Z-axis respectively, and we want to project it onto a 2D screen area of certain dimensions. We can do it with the following formula. Assume we have a C structure (or class) POINT3D containing the coordinates of the point, initialized with the mentioned values for this example.
// Initialize point
POINT3D point = { 5, 3, 2 };
// Find the right position on the screen in 2D coordinates
float x2d = ScreenWidth * point.x / point.z + (ScreenWidth / 2);
float y2d = ScreenHeight * point.y / point.z + (ScreenHeight / 2);
// Project the 3D point to the screen
Pixel(x2d, y2d);
This formula does not account for the aspect ratio, so it produces a slightly distorted image on non-square screen resolutions. This is not a big problem: it can be adjusted, and the distortion is barely visible on screens whose dimensions don't differ greatly. Still, we must fix this if we care about an accurate image.
If the width is greater than the height, which is the case most of the time, you need to additionally multiply the projected X coordinate by the HEIGHT/WIDTH fraction and leave the Y coordinate alone. If the height is greater than the width, do the reverse and multiply the projected Y coordinate by the WIDTH/HEIGHT ratio.
Aspect ratio adjustments:
// Compute the aspect ratio correction (the casts avoid integer division)
float aspect = (float)ScreenHeight / (float)ScreenWidth;
// Find the right position on the screen in 2D coordinates
float x2d = aspect * ScreenWidth * point.x / point.z + (ScreenWidth / 2);
float y2d = ScreenHeight * point.y / point.z + (ScreenHeight / 2);
Let's take the rest of the formula apart. As you already know, 2D screen coordinates are usually based on the 4th quadrant of the 2D Cartesian coordinate system, which means that (0, 0) is at the upper left corner of the screen. That is why the formula adds half of the screen width and height to the result: it moves the projected origin to the center of the screen. In 3D graphics, we want our view, or the camera to be exact (the camera is explained a little further into this tutorial), to be located as in the following image, so that we're always looking straight down the negative space of the z-axis.
A Brief Introduction To Matrices
If you got this far, it's a great idea to at least briefly mention matrices, and how they work. No, I will not go into gritty 3D matrix details, and how to use them. It's just, in 3D graphics we never really will be dealing with the formula I demonstrated just above in its raw form. It is represented as a matrix. Don't worry if you are not yet familiar with matrices, that's okay. You will gradually gain a deeper understanding of them.
What follows below is an excerpt from my WebGL Gems book, which is available on the sister website (WebGL is like OpenGL for the web browser, be sure to check it out!). I just took part of a chapter from WebGL Gems and adapted it to match this tutorial's format.
Now, as I said, our matrix representation of this formula will be a little different. We can't just plug in these values and have a projection matrix. This is due to the fact that the actual camera projection is determined by a slightly more complex set of trigonometric functions.
In fact, when we created the starfield (check the JavaScript source code by right-clicking and going to "View Source"), we cheated a lot by not worrying about near and far clipping planes, the field of view angle (or the angle of view) and the camera scaling factor. In a real OpenGL program we cannot do that, because the near and far clipping planes are required arguments for the function that creates a perspective-projection 3D camera.
Also recall that we had to multiply the x coordinate by the screen ratio (height / width) to make the objects in the view proportional to the screen without an odd skewing effect. This effect was created by the fact that our screen is not square. Had it been, we could avoid this calculation altogether.
Tangent Metamorphosis Into Camera's View Matrix
We will now, for the first time, factor in the FOV (field of view) into our camera projection calculation. We are not changing anything. We're looking at the same thing from a different angle. The tangent angle.
In order to do this we need to introduce the tangent formula into the equation. This is just a trigonometric way of rewriting our starfield projection calculation.
Tangents work with triangles that have a 90 degree angle. But our camera viewing angle can be anything we want. How do we deal with this problem?
By simply dividing the viewing angle by 2 we can ensure that it will be part of a 90-degree triangle. Still don't believe me? Take a look at this diagram I created:
By the way, the blue areas are invisible to the camera, even if there are objects there. These objects lie outside of the clipping plane. They are discarded by the 3D volume clipping algorithm.
Defining our camera perspective using the tangent of the FOV angle is important because instead of taking a wild guess, we can now choose the viewing angle of our camera lens. In normal circumstances it can range from 45 to 90 degrees.
The approximate field of view of an individual human eye (measured from the fixation point, i.e., the point at which one's gaze is directed) varies by facial anatomy, but is typically 30 degrees up (limited by the eye brow bone) and about 45 degrees in both left and right directions combined. When choosing this angle for your own games or applications, just think intuitively. How wide should the viewing angle be in degrees? Anything above 90 degrees will start to look unnatural.
Instead of relying on the screen width and height we will now rely on the camera's viewing angle or FOV. It's just an intuitive way to base our camera projection on.
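Having said this, let me introduce the camera lens calculation that our camera's view is built on:

$$\frac{1}{\tan(\mathrm{fov} / 2)}$$

Literally coming from this point of view we can construct the following 4x4 projection matrix:

$$
\begin{pmatrix}
\dfrac{1 / \tan(\mathrm{fov}/2)}{a} & 0 & 0 & 0 \\
0 & 1 / \tan(\mathrm{fov}/2) & 0 & 0 \\
0 & 0 & -\dfrac{zp}{zm} & -\dfrac{2 \cdot zfar \cdot znear}{zm} \\
0 & 0 & -1 & 0
\end{pmatrix}
$$

The negative signs in the third and fourth rows come from the camera looking down the negative z-axis.
Where the following is true:
fov = camera's viewing angle or "field of view"
znear = distance from camera lens to near clipping plane
zfar = distance from camera lens to far clipping plane
zp = zfar + znear
zm = zfar - znear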
Again, I don't want you to worry much if you don't understand some of these calculations, or what the dinosauric 4x4 matrix represents. Matrices are just grid-based sets of data. Each coordinate of a vertex (X, Y, Z) is multiplied by the values of the matrix and summed up, once for each axis. And fov (field of view), znear, zfar, zp and zm are the minimum parameters required to construct our 3D camera view.
And of course... there is "one more thing" that went into this calculation that we haven't talked about yet. The aspect ratio. Perspective projection construction functions usually require the following parameters to be supplied in order to create standard camera view:
Aspect Ratio:
The width / height of the screen represented by a single number.
Field of View:
The field of view angle. Common examples: 45, 90.
Near Clipping Plane:
Distance from camera lens to near clipping plane.
Far Clipping Plane:
Distance from camera lens to far clipping plane.
This way we are no longer tied to the limitation of our simplified math from earlier in the tutorial. This is a solid representation of our 3D camera that gives us various levels of control over its field of view and aspect ratio.
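To make the four parameters above a bit more tangible, here is a rough sketch of a function that fills in such a matrix. The name makePerspective and the Mat4 type are my own for this example, not part of OpenGL. I am assuming the usual OpenGL sign convention (camera looking down the negative z-axis) and a row-major array, so the result would need to be transposed before being handed to OpenGL, which expects column-major data.

```cpp
#include <array>
#include <cmath>

// Hypothetical 4x4 matrix type for this sketch, stored row by row.
using Mat4 = std::array<float, 16>;

// Builds a perspective projection matrix from the field of view (degrees),
// the aspect ratio (width / height) and the two clipping plane distances.
// Assumes 0 < znear < zfar and 0 < fovDegrees < 180.
Mat4 makePerspective(float fovDegrees, float aspect, float znear, float zfar)
{
    const float pi = 3.14159265358979f;
    const float f  = 1.0f / std::tan(fovDegrees * pi / 180.0f / 2.0f);
    const float zp = zfar + znear;
    const float zm = zfar - znear;

    Mat4 m{};                                // every element starts at zero
    m[0]  = f / aspect;                      // row 0, column 0: X scale
    m[5]  = f;                               // row 1, column 1: Y scale
    m[10] = -zp / zm;                        // row 2, column 2: depth scale
    m[11] = -(2.0f * zfar * znear) / zm;     // row 2, column 3: depth offset
    m[14] = -1.0f;                           // row 3, column 2: divide by -z
    return m;
}
```

For example, makePerspective(90.0f, 1.0f, 1.0f, 3.0f) gives X and Y scale factors of about 1, because the tangent of 45 degrees is 1, and depth terms of -2 and -3.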
And finally, here is what our camera model looks like from above looking down the Y axis.
Diagram courtesy of wiki.lwjgl.org (I modified it a little)
I hope that the preceding explanations have cleared the fog and provided enough material to start experimenting with being a cameraman in games or applications you write yourself.
As we continue moving forward, in further examples presented throughout tutorials on this site, we will take a look at how this knowledge can be used in practice to do some interesting things. I hope these basic principles have sunk in deep enough and provided enough mental ingredients to cook and boil on slow fire for a while.
In fact, the principles and math behind them are so simple. It's just a matter of focusing on the right thing. If our patience is tested by them, then how much more impatient would we be when we study advanced subjects that require far more mental endurance than this?
Viewing Volume and Viewing Distance
As you recall, objects that appear farther from the viewer are smaller. This is the exact relationship between the 2D points and the perspective, and it is achieved by dividing both the horizontal and vertical coordinates by how far away the object is. However, there is a problem. By merely dividing the x and y coordinates by depth (the z coordinate) we only get the ratio between the depth and the vertical/horizontal position of the pixel. What we need is how they are actually related to the Viewing Distance and the Viewing Volume. These two terms are explained below.
The Viewing Volume is the space between the near clipping plane (or the viewing plane) and the far clipping plane, as seen on the second picture below. So, back to our equation for a second: we simply multiply x and y by ViewingDistance to get the right relationship between the viewing volume and the X and Y coordinates. Simple as that.
Viewing Distance is closely related to the viewing volume. The longer the viewing distance, the narrower the line of sight and therefore the smaller the viewing volume. The good news is that we don't have to worry about all of this in OpenGL, since everything is done behind the scenes.
However, you still need to understand these terms to understand why images appear the way they appear on the screen, and I just wanted to explain the basics of perspective projection. The above formula could be used in a software 3D renderer, but we're not interested in that at this moment.
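Just to show how the viewing distance could plug into the earlier formula, here is a small sketch. I am making two assumptions that the text above does not spell out: the viewing distance is measured in pixels, and it is derived from a horizontal field of view as half the screen width divided by tan(fov / 2). The function names are hypothetical.

```cpp
#include <cmath>

// Hypothetical helper types for this sketch only.
struct Point3D { float x, y, z; };
struct Point2D { float x, y; };

// Distance from the eye to the viewing plane, in pixels, for a given
// horizontal field of view. A narrower FOV gives a longer viewing distance.
float viewingDistance(float screenWidth, float fovDegrees)
{
    const float pi = 3.14159265358979f;
    return (screenWidth / 2.0f) / std::tan(fovDegrees * pi / 360.0f);
}

// Perspective projection where x and y are multiplied by the viewing
// distance before being divided by depth. Assumes p.z is non-zero.
Point2D projectWithDistance(const Point3D& p, float distance,
                            float screenWidth, float screenHeight)
{
    Point2D out;
    out.x = distance * p.x / p.z + screenWidth / 2.0f;
    out.y = distance * p.y / p.z + screenHeight / 2.0f;
    return out;
}
```

On an 800 pixel wide screen, a 90 degree field of view gives a viewing distance of about 400, while 60 degrees gives roughly 693. That matches the statement above: the longer the viewing distance, the narrower the line of sight.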
Final Camera Projection
In summary, here's how a few objects (as opposed to the single point in the previous example) would be projected onto the screen solely from their 3-dimensional vertex data. At the center and a bit to the right of the camera, depicted in the next diagram, we're seeing two spheres in two colors: orange and blue. One is slightly behind the other.
Both spheres exist in a "world coordinate system" and are simultaneously being projected onto a flat 2D screen. I tried to make the projected version of the spheres as close as possible to what it would look like on the screen. It's a close approximation.
Just keep in mind that the whole object is projected on the flat screen pixel by pixel (and polygon by polygon, of course). Spheres are made up of hundreds of triangles. Regardless of what the object we're rendering is (cube, sphere, cone) we're still just rendering triangles, or discarding ones that are facing away from the camera using a backface culling algorithm.
I talked about viewing volume and how it is related to the perspective projection equation. But what is viewing volume? It is also known as the "clipping volume" or more commonly "frustum."
In the diagram above, it is represented as the cone with the top sliced off. It has a pyramid-like structure. This is the visible volume of the currently rendered scene. Anything outside of it will not be visible on the screen.
There are two planes, the viewing plane and the far clipping plane. The viewing plane is actually the screen, and the far plane indicates how far you can "see": whatever is behind the far clipping plane will not be visible.
The viewing volume is the space between those two planes. The viewing volume is sometimes called clipping volume because you usually want to clip your polygons against it.
Orthographic Projection
There is another type of camera projection called "orthographic". The math behind it is a little less complicated. We're completely ignoring the scaling by Z. Objects that are farther from the camera will not appear smaller on the screen. They appear at their actual size no matter how far or close they are.
This type of projection is never used in games that require a first-person view, since it ignores the Z-axis coordinate completely when sizing objects. Orthographic projection can still be used with success for making isometric games and games with an unusual or skewed camera perspective, and it is absolutely the best choice for making 2D games.
In other words, if you draw a bunch of trees close and far away from the view, they will all appear the same size which is a good set up for platformers that use 2D sprites, always facing the camera.
3D Camera
At this point I should explain what the camera is. The camera is always located at the origin of the virtual "view". However, it is not necessarily located at the origin of the Coordinate System (x=0, y=0, z=0), since you can move the camera around and transform it to anywhere in world space.
The camera and the view are representative of the same thing. The camera is only mentioned to represent a "virtual" viewing point, but there is actually no physical camera anywhere around. There is usually some space between the origin of the camera and the viewing plane (the screen onto which objects are rendered, or the near clipping plane), as you saw in the previous diagrams. That space is the viewing distance.
If you look straight ahead through your 3D camera, you are considered to be looking down the Cartesian coordinate system's z-axis into the "negative Z space". Camera rotation is possible around all 3 axes, as long as we are rotating around only one axis at a time. Complex rotations are achieved by combining rotations around multiple axes. Camera rotation is responsible for moving the view. That's what happens when you move your virtual head around with the mouse or arrow keys. Let's examine the camera a little closer.
The camera, like any other object in space, has 2 coordinate systems: the Local Coordinate System and the World Coordinate System. The local coordinates are the camera's rotation in degrees around each of its LOCAL x-, y- and z-axes, plus any displacement from the local origin (but usually all objects are designed around their origin, or center, at 0,0,0).
The world coordinates specify the camera's position in the world. For example, when you walk around in a first-person view game, you are actually moving the camera's coordinates in the World coordinate system, and when you look around you change the camera's local axis rotation. Combined, they simulate the illusion of walking and looking around simultaneously.
It is possible to use the local camera coordinates for moving as well, by translating them to the new location, but only after the rotation transform has been performed. Rotation is done in local coordinates around (0,0,0) to keep the math simple. If you move the camera before rotating it, say to a position like [0, 5, 0], it will not rotate correctly, because the displacement from the center is taken into account during rotation. This usually results in a "wobbly" rotation effect.
Remember this rule: always rotate around the local center (0,0,0) first, and only then translate the model to some world coordinate. This is the proper order of doing basic 3D transforms. There are cases when you need to translate first and then rotate, but they are not as common and are usually reserved for more complex rotations. If this sounds confusing, don't worry. It will all settle down the more you study and actually code a few basic rotations and transforms in OpenGL, if you haven't already. Here's how the camera's coordinates are transformed.
If you understand this so far, that's good. Now, let's move on to object rotation basics. This is exactly the same as demonstrated in the camera rotation part of the above image. The only difference is that we're not viewing the world FROM that object, but are in fact OBSERVING that object from the current camera position. This is the way an object is rotated around all of the 3 possible axes. When we get down to actually doing it in the following tutorials, I will make it more clear, so don't worry if you don't get something at this moment.
Just the same way it is with the camera, the objects also have two coordinate systems and as you might have guessed already, the objects are positioned according to the LOCAL and WORLD coordinate systems. The local coordinates are usually used for rotating the object and the world coordinates are used for positioning the object in the world or, say, in a 3D level.
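The rotate-first, translate-second rule is easier to believe once you see the numbers. Below is a minimal sketch that rotates a point around the Y axis and moves it into the world in both orders. Vec3, rotateY and translate are hypothetical helpers written for this example only.

```cpp
#include <cmath>

// Hypothetical vector type for this sketch only.
struct Vec3 { float x, y, z; };

// Rotates v around the Y axis by the given angle in degrees.
Vec3 rotateY(const Vec3& v, float degrees)
{
    const float pi = 3.14159265358979f;
    const float c = std::cos(degrees * pi / 180.0f);
    const float s = std::sin(degrees * pi / 180.0f);
    return Vec3{ c * v.x + s * v.z, v.y, -s * v.x + c * v.z };
}

// Moves v by the offset t.
Vec3 translate(const Vec3& v, const Vec3& t)
{
    return Vec3{ v.x + t.x, v.y + t.y, v.z + t.z };
}

// Correct order: rotate around the local center, then place in the world.
Vec3 rotateThenTranslate(const Vec3& local, float degrees, const Vec3& worldPos)
{
    return translate(rotateY(local, degrees), worldPos);
}

// Wrong order for this purpose: the object swings around the world origin.
Vec3 translateThenRotate(const Vec3& local, float degrees, const Vec3& worldPos)
{
    return rotateY(translate(local, worldPos), degrees);
}
```

Take a vertex at (1, 0, 0), a rotation of 90 degrees and a world position of (10, 0, 0). Rotating first gives roughly (10, 0, -1): the object has turned in place. Translating first gives roughly (0, 0, -11): the whole object has swung around the world origin instead.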
Backface Culling
As you add objects and static polygons (e.g. walls, terrain, etc.) to your 3D world you want to clip all of the polygons that are not located in the camera's viewing volume. You also want to clip off parts of the polygons that are on the edge of the view volume against the bounding box of the screen. The former is provided for us by OpenGL.
Another issue associated with drawing polygons is that you don't want to draw polygons whose back faces (or sides) are facing the camera.
Imagine a textured polygon which is rotated by 180 degrees so its "back" is facing us. Let's also assume that that polygon is a part of a bigger structure, a wall for example. Usually you will never want to see what's "behind" the wall. Have you ever wanted to see what's behind your room's wallpaper? I surely hope not.
The point is, if you rotate a textured polygon, its coordinates are reversed relative to the camera view. You never want to see that side anyway, and that space is usually covered by another side of the wall, so why bother drawing it? That's right, there is no reason to, and a technique called Backface Culling comes to our help.
Backface culling works this way: it calculates the normal of the polygon (a normal is a vector perpendicular to the polygon, pointing straight out of its front side at a 90 degree angle, and it is very common in 3D graphics calculations). If the normal points roughly in the same direction the camera is looking, that is, at an angle of 90 degrees or more away from the direction towards the camera, the surface of that polygon is not rendered, as illustrated in this image.
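As a rough sketch of the normal-based test described above, the function below computes a triangle's normal with a cross product and compares it with the direction towards the camera using a dot product. All names here are hypothetical, and I am assuming that the front side of a triangle is the one where its vertices appear in counter-clockwise order.

```cpp
// Hypothetical vector type and helpers for this sketch only.
struct Vec3 { float x, y, z; };

Vec3 subtract(const Vec3& a, const Vec3& b)
{
    return Vec3{ a.x - b.x, a.y - b.y, a.z - b.z };
}

Vec3 cross(const Vec3& a, const Vec3& b)
{
    return Vec3{ a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
}

float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns true when the triangle (a, b, c) shows its back to the camera.
// The normal comes from the counter-clockwise vertex order; a dot product
// of zero or less means the angle between the normal and the direction
// towards the camera is 90 degrees or more.
bool isBackFacing(const Vec3& a, const Vec3& b, const Vec3& c,
                  const Vec3& cameraPos)
{
    const Vec3 normal   = cross(subtract(b, a), subtract(c, a));
    const Vec3 toCamera = subtract(cameraPos, a);
    return dot(normal, toCamera) <= 0.0f;
}
```

For the triangle (0,0,0), (1,0,0), (0,1,0) the normal points along positive Z. A camera at (0, 0, 5) sees its front, so the triangle is drawn. A camera at (0, 0, -5) sees its back, so the triangle can be skipped.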
OpenGL Variable and Function Naming Conventions
In conclusion I want to say a few words on this topic. OpenGL was made for use with various environments, not just Windows. You can always find more information in the numerous OpenGL books that are reasonably affordable, for a technical book, considering the amount of knowledge you will have gained by the time you finish one. In this section I explain naming conventions for both OpenGL functions and variables.
Although you don't have to use OpenGL-defined types, I still feel obligated to describe them here so that anyone who wants their software to be platform-independent understands what this all means. Well, let's see. OpenGL has a number of predefined types. If you never plan on being platform-independent, it might be simplest to use native C types such as int, float and double. However, if that's not the case, OpenGL has definitions that will work on the current system, whatever the system is. All you have to do is add GL in front of the standard C types. For example, if you want to use a floating point type, use GLfloat instead of C's float, and if you want to use an int, use GLint. That works for the rest of the normal C types as well.
If you want to use an unsigned value, just add a "u" between GL and the type, like so: GLuint is an unsigned integer. There is also GLboolean, which plays the role of bool but is defined as a small integer type holding GL_TRUE or GL_FALSE. GLbitfield is used to define binary fields. A little less obvious are the clamp types, GLclampf and GLclampd, for single and double precision floating point values respectively. "Clamp" means the value is clamped to the range from 0.0 to 1.0, which is why these types show up in older function signatures that deal with colors. There are no special types for pointers. Pointers are defined the usual way. For instance, this is an array of pointers to GLint: GLint *i[16];
Each OpenGL function has a neat naming convention and its format is:
<library><function name><number of arguments><type of arguments>
To demonstrate this on a real name function I will use the glVertex3f function.
glVertex3f(0.0f, 0.0f, 0.0f);
| |     ||
| |     |+-- f means all parameters are floats
| |     |
| |     +-- 3 is the number of parameters
| |
| +-- Vertex is the name of the function that renders a 3D point (or a vertex)
|
+-- gl specifies the OpenGL library
The last two parts of the name (the number and type of arguments) are mostly encountered in the functions that are responsible for drawing primitives. Many other functions are usually used in this form:
<library><function name>
Final words
Well, what can I say, this has been a long read, but it isn't even close to the full picture. I tried, however, to cover the most general topics that came to my mind. This should definitely make it easier for beginners to read the rest of the tutorials.
Hope the illustrations helped you in some way to understand the described topics better. Now, sit tight and wait for the next tutorials which will actually put what's been said in here to action! Feedback and suggestions are welcome.