Traditionally geometry instancing has been a rather tame concept: Load a model once into memory, render it multiple times in a single frame with different translations.

The idea of geometry instancing is to speed up the rendering of large numbers of identically structured meshes. The weather renderer in my game engine uses instancing to display lots of snowflakes simultaneously:

(keep in mind this is running in debug hence the low framerate)

As time has gone on we’ve slowly discovered the dreaded performance bottleneck of actually submitting triangles to the GPU. Simply put, submitting geometry to the GPU over and over again, regardless of whether it’s the same geometry or not, forces the GPU into multiple translation sets, possible state changes and probably some more things I’m not smart enough to be talking to you about.

So to get around the problem caused by submitting the same piece of geometry to the GPU multiple times, smart programmers decided: why not just submit it all in one big chunk?

This is what geometry instancing is today: if you have one thing and you want to render it lots of times, bunch all the copies up into the same vertex buffer and submit them to render in one go.

To prepare a model for instancing what I do is,

1) Copy all its vertex positions, normals and UV data from the model’s vertex buffer into a vector

2) Create a new vertex declaration with the same structure as the model’s vertex format plus an index value, and create a new vertex buffer from it that holds sixty times as many vertices as the original.

In other words, if the vertex buffer I’m going to be instancing uses a struct like the following:

struct MESHVERT
{
float x, y, z;      // Position
float nx, ny, nz;   // Normal
float tu, tv;       // Texcoord
};

The instanced version will be:

struct MESHVERTInstanced
{
float x, y, z;      // Position
float nx, ny, nz;   // Normal
float tu, tv;       // Texcoord
float idx;          // index of the instance this vertex belongs to!
};

By giving each vertex an index value I can tell which instance I’m rendering in the vertex shader. This is important so I can apply different transformations to different instances.

3) Copy the vertex positions, normals and UV data into a larger vertex buffer which is created using the new vertex declaration, and set the correct index values for them.

So if I instance a model with 40 vertices 60 times, the new vertex buffer will have 40*60 = 2400 vertices. The index value (float idx) will be 0 for vertices 0 to 39, 1 for vertices 40 to 79, 2 for vertices 80 to 119, and so on and so forth.
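As a rough sketch of step 3, the replication could look something like this on the CPU side. The helper name BuildInstancedVertices is hypothetical, and the resulting vector would still have to be copied into a real vertex buffer created with the new declaration.

```cpp
#include <cstddef>
#include <vector>

struct MESHVERT
{
    float x, y, z;      // Position
    float nx, ny, nz;   // Normal
    float tu, tv;       // Texcoord
};

struct MESHVERTInstanced
{
    float x, y, z;      // Position
    float nx, ny, nz;   // Normal
    float tu, tv;       // Texcoord
    float idx;          // Which instance this vertex belongs to
};

// Hypothetical helper: repeats the source vertices instanceCount times and
// stamps every copy with its instance number.
std::vector<MESHVERTInstanced> BuildInstancedVertices(
    const std::vector<MESHVERT>& source, std::size_t instanceCount)
{
    std::vector<MESHVERTInstanced> out;
    out.reserve(source.size() * instanceCount);

    for (std::size_t instance = 0; instance < instanceCount; ++instance)
    {
        for (const MESHVERT& v : source)
        {
            MESHVERTInstanced iv = { v.x, v.y, v.z,
                                     v.nx, v.ny, v.nz,
                                     v.tu, v.tv,
                                     static_cast<float>(instance) };
            out.push_back(iv);
        }
    }
    return out;
}
```

If the model is indexed, the index buffer needs the same treatment, with each copy’s indices offset by the instance number times the model’s vertex count.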

Now that we’ve created a nice large vertex buffer which is capable of rendering 60 instances of the same model in one go, how do we go about rendering it?

In the vertex shader we define a matrix array of size 60, and on the CPU side we fill that array with the transforms. The buffer is submitted once, and all sixty copies are drawn in that one call, each with its own transform. In other words:

// In the shader:
float4x4 g_vInstanceTransforms[60];

// In the C++:
std::vector<D3DXMATRIX> vcTransforms;

/*
fill up vcTransforms with 60 values, anything past the first 60 is a waste
*/

m_pEffect->SetMatrixArray("g_vInstanceTransforms", &vcTransforms[0], 60);

// back in the shader we have,
struct VertexShaderInput
{
    float4 Position : POSITION0;
    float3 Normal : NORMAL0;
    float2 TexCoord : TEXCOORD0;
    float idx : TEXCOORD1; // the instance index!
};

VertexShaderOutput VertexShaderFunction(VertexShaderInput input)
{
    // idx is stored as a float, so cast it before indexing the array
    float4 worldPosition = mul(input.Position, g_vInstanceTransforms[(int)input.idx]);
    // ... apply view/projection to worldPosition and fill in the output as usual ...
}

And there we have it.

Of course we can change the number of triangles we draw from the vertex buffer to render fewer than 60 instances at a time, and we can render the same instance buffer multiple times if we so choose.
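To draw more than 60 instances, one option is to walk the list of transforms in chunks of 60 and draw the instance buffer once per chunk. The sketch below only works out the chunk boundaries; Batch and SplitIntoBatches are hypothetical names, and the actual matrix upload and draw call are left as a comment.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one draw of the instance buffer.
struct Batch
{
    std::size_t first;  // index of the first transform used by this batch
    std::size_t count;  // number of instances to draw, at most maxPerBatch
};

// Splits transformCount transforms into consecutive batches of at most
// maxPerBatch entries each. The last batch may be smaller than the rest.
std::vector<Batch> SplitIntoBatches(std::size_t transformCount,
                                    std::size_t maxPerBatch)
{
    std::vector<Batch> batches;
    if (maxPerBatch == 0)
        return batches;

    for (std::size_t first = 0; first < transformCount; first += maxPerBatch)
    {
        Batch b = { first, std::min(maxPerBatch, transformCount - first) };
        batches.push_back(b);
    }
    return batches;
}

// For each batch one would upload `count` matrices starting at `first`
// and draw count * trianglesPerModel triangles from the instance buffer.
```

With 150 transforms and a shader array of 60 this gives three draws of 60, 60 and 30 instances.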

In terms of performance I have found this method of instancing to be way fast. From dodgy performance tests with Fraps, which took about 30 seconds and not much thought, I found that rendering 60 instances of the same mesh through the instance buffer was just two times slower than rendering the mesh once. Rendering the mesh sixty times uninstanced, with the same transforms, was about four times slower than rendering the instance buffer once.

And to close off, here’s a video of some early gameplay, note every projectile is instanced so there isn’t a huge performance hit if one at all,

The engine I am developing for my games uses deferred rendering, so I’m posting about the basics of a deferred renderer, which is actually a lot more straightforward than many people think.

A deferred renderer is basically a method for rendering graphics where the effects are not applied to the scene during the stage in which we draw polygons and map textures to them, instead all effects are applied to the resulting image of the scene. This provides many advantages, from a cleaner rendering pipeline to the decoupling of geometry complexity from effects. Many modern game engines use deferred rendering including Unreal Engine 3 and CryEngine 3.0.

To render many different types of effects using a deferred renderer we typically want some crucial information about the scene, most deferred render effects require three things: The colour, the normals and the depth. With these three combined we can implement many 3D effects on the scene using only 2D images and in doing so we avoid having to deal with polygons beyond the initial rendering stages.

At the core of most deferred renderers we have the ‘g-buffer’ which is three separate textures listed below.

The Diffuse Map

The diffuse map is the scene rendered with textures and nothing else, no effects or anything, just polygons and textures. It is typically the base image onto which we apply effects such as motion blur, bloom, lighting amongst many others.

The Depth Map

The depth map is typically a single floating point texture, typically of format D3DFMT_R16F (16 bit floating point precision) or D3DFMT_R32F (32 bit floating point precision) in which each pixel is a value between 0 and 1 where 0 is the point at the near clip of the projection matrix and 1 is the far clip of the projection matrix.

We can extract the x/y/z world coordinates of any given pixel from the depth map, for further information on achieving this visit this page.

The Normal Map

From the perspective of the camera, the normal map is a texture where the r/g/b value of each pixel corresponds to the x/y/z coordinates of the unit normal of the surface visible at that pixel. Unsurprisingly the colours in a normal map vary with the normal: normals with large x values produce redder colours while normals with larger y values produce greener colours.

In the above screenshot of the scene’s normal map you can see the ground on which the soldier is standing is green while towards her front it’s red. You should be able to easily derive from this that she is standing on the ground and facing in the ‘x’ direction.

Notice the gun and the fan blade in the background have much more detail in their normals. This is because they use normal textures, which complement the vertex normals by chiselling further ‘detail’ into them. During the lighting stage this adds no processing overhead, since the deferred renderer doesn’t care where the normals came from: it works pixel by pixel on the finished normal map, while a forward renderer would have to keep sampling the normal map texture and transforming vertices when calculating lighting.

The normal map can exist in world space or camera space; in the screenshot above it is displayed in world space. In camera space the normal colours will keep changing relative to the camera’s view matrix.
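A common way to store a unit normal in a colour texture is to remap each component from [-1, 1] to [0, 1]. Whether a particular renderer does exactly this depends on the texture format it uses, so treat the remap below as an assumption. Vec3f, Rgb, EncodeNormal and DecodeNormal are stand-in names made up for this sketch.

```cpp
#include <cmath>

struct Vec3f { float x, y, z; };  // stand-in vector type for this sketch
struct Rgb   { float r, g, b; };  // colour channels in the range [0, 1]

// Assumed encoding: remap each normal component from [-1, 1] to [0, 1].
Rgb EncodeNormal(const Vec3f& n)
{
    Rgb c = { n.x * 0.5f + 0.5f, n.y * 0.5f + 0.5f, n.z * 0.5f + 0.5f };
    return c;
}

// Inverse of EncodeNormal. The result is renormalised because storing the
// colour at limited precision can shorten or lengthen the vector slightly.
Vec3f DecodeNormal(const Rgb& c)
{
    Vec3f n = { c.r * 2.0f - 1.0f, c.g * 2.0f - 1.0f, c.b * 2.0f - 1.0f };
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len > 0.0f)
    {
        n.x /= len;
        n.y /= len;
        n.z /= len;
    }
    return n;
}
```

Under this remap a normal pointing straight up, (0, 1, 0), has its green channel at full strength and the other two at half, which is why flat ground reads as green.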

When I started working on my deferred renderer I used Catalin Zima’s XNA code. If you’re interested in an in-depth explanation of deferred renderers as well as code, I highly recommend you visit his site.

Depth of field is one of the cooler things we see in games and like bloom if abused is noticed in a bad way but if used properly is not noticed in a good way. It’s one of those things you don’t want the gamer to notice too much of because if they do it means it’s getting in the way though it adds a whole lot to the personality of any given scene when done right.

My implementation of depth of field like many of my other effects is an entirely post process effect and is done in the following steps (with accompanying screen shots from my upcoming game to aid in explaining). Please note that if you have any trouble understanding any of the concepts or ‘buzzwords’ I’m using let me know and I’ll do future posts to follow up on them as I’m trying to gauge the experience of readers to this blog, this is after all my first technical post.

Render the scene to a texture with depth using MRT (multiple render targets)

Here is the scene I am rendering without depth of field:

Depth maps are basically textures where each ‘pixel’ is a single floating point value from 0 to 1 where 1 is the projection matrix far clip and 0 is the projection matrix near clip. We can use the depth map to figure out what position each pixel being rendered is in world space. The depth map for the scene is shown below,

As you can see in the screenshot nearer to the camera the image is darker while further out it’s whiter. We’re using a single floating point value representing depth for the RGB hence the image is whiter in areas further away from the camera.

Copy the texture and blur the copy using Gaussian blur

Since we have rendered the scene to a texture we can make copies of it and modify them, so we simply copy the texture and apply a Gaussian blur to it,

Determine how much of the blur texture vs. the ’sharp’ texture should be used for every pixel on screen

To do this I define focal points as values between 0 and 1 which map directly on top of the depth map values between 0 and 1.

For example we can say that for each pixel in the depth map where the value is from 0.8 to 1.0 we want to use the corresponding pixel from the blurred texture. Another thing we can do is blend the blurred and sharp textures together, for example if we want an 80% blurred image we can say (0.8f*colourOfBlurredPixel) + (0.2f*colourOfSharpPixel).

This is all fine and dandy however it would mean very hard transitions between blur and sharp which creates the effect that we are sitting in a glass sphere and looking out at the world.

To get around this we use linear interpolation: we map the depth range 0.8 to 1.0 onto a blur amount of 0.0 to 1.0, so at depth 0.8 the blur value is 0 and at depth 1.0 the blur value is 1.0. The blur value is then used to blend the sharp and blurred pixels, which can be done with the lerp function in HLSL.

In my game I use a near and far blur with four points. Up to point 1 the image is fully blurred, from point 1 to point 2 the blur fades out, between point 2 and point 3 the image is sharp, and from point 3 it becomes blurry again until it reaches maximum blur at point 4.

The following two screenshots demonstrate near and far blur,

I have a lot of respect for the folks at Wolfire. They’re working on what looks to be quite a cool game with some innovative concepts, and their regular blog updates detailing the various elements of their engine are always insightful and interesting.

However a post by David titled ‘Why you should use OpenGL and not DirectX’ bugged me for a few reasons. It read like a fanboy post and didn’t give any actual reason as to why one should use OpenGL instead of DirectX; instead it talked about Microsoft using hyperbole in marketing and the ‘vicious cycle’ of developer feedback this has created.

David structured his post by listing his reasons and explaining the rationale behind each, and this post follows the same structure. Of course this is simply a rebuttal article; I’m not trying to sell DirectX but rather to highlight the ways in which David’s arguments fail by being either irrelevant or misleading.

Why does everyone use DirectX?, actually, not everyone uses DirectX

To answer this we have to acknowledge the question has a major flaw in it: not everyone uses DirectX, in fact you would be hard pressed to find DirectX being used in the Computer Aided Design (CAD) field. The ARB, which is the group that defines the OpenGL specifications, has very little game development influence; it is mostly controlled by companies interested in developing CAD software.

Network effects and vicious cycles, or why DirectX is such a better alternative to OpenGL for games development

So the reason DirectX is so much more popular than OpenGL is that DirectX caught on early and as such has created a ‘vicious cycle’ between vendors and developers where DirectX is continually improved while OpenGL lags behind. This is a moot point; I would like to know in what ways, if any, DirectX has held game development back. Microsoft has spent more on developing DirectX as the premier graphics solution for game development than any OpenGL developer ever has; of course the biggest OpenGL developers are not in the games industry. Microsoft has lost money in this area and regardless of what any fanboy would like you to believe most of that money did not go into marketing DirectX but into R&D.

FUD about OpenGL and Vista So what?

David claimed that Microsoft spent a lot of time spreading fear and doubt as to the future of OpenGL during its marketing for Windows Vista and DirectX 10. Whether Microsoft spread malicious rumours about OpenGL is debatable; whether OpenGL has given game developers the cold shoulder with a botched 3.0 specification is not, as the thread comments indicate. Whatever damage Microsoft may have done to OpenGL’s reputation through deceptive advertising could not hurt OpenGL as much as it hurt itself when it shot itself in the foot with the 3.0 specification.

Misleading marketing campaigns ^ as above

The other points brought up are about Microsoft’s shady marketing tactics. Shady or not, marketing does not actually affect the API itself, and regardless of what Microsoft says DirectX can do, as a developer I would rather concentrate on seeing whether DirectX or OpenGL is more suited to what I want to do. Let’s face it, if we’re going to avoid products which were backed by marketing using hyperbole we’d still be using the abacus.

David is happy to quote people like John Carmack saying he doesn’t see a need to jump straight into DirectX 10, a valid argument by all accounts given that most high-end games released these days are multi-platform console titles. What he doesn’t say is what Carmack thinks of DirectX in general: “Microsoft has done a great job with all this stuff. I mean, I honestly think that DX9 with how it’s implemented on the 360 is a clearer and more open API than OpenGL is.”

In other words: just because we’re not using DirectX does not mean we don’t think it’s a great API, and when we have to use it we don’t mind it too much. More to the point it highlights the deceptions David uses: he attempts to inflate Carmack’s hesitation to move to DirectX 10 into a general dissatisfaction with DirectX as a whole.

OpenGL is more powerful than DirectX, are you a software engineer or an OpenGL fanboy?

This is where I start wondering whether David is coming up with these opinions because he actually believes DirectX is a weaker API or because he simply dislikes Microsoft. First of all, what is power? OpenGL and DirectX are both specifications; in terms of power they’re little more than ideas with a software layer backing them up. The question is, which API has better support in consumer hardware and will therefore run better? Regardless of the power OpenGL may have, buggy drivers and a poorly conceived API will still bring it down. I don’t know about David but I would gladly lose 20% performance for 30% more clarity.

Now here comes the hypocrisy. While David is happy to attack Microsoft for FUD he does the same thing throughout his post, but here in particular: he claims the tessellation feature in DirectX 11 has been an OpenGL extension for three years. I think he will be very, very shocked to find out that the Xbox 360’s DirectX based GPU has a tessellation unit built in, something which the Playstation 3 RSX lacks. An OpenGL extension is only as good as the hardware and developer support behind it. ATi put tessellation support into their first batch of DirectX 10 graphics cards (the ATI 2900s) as well as the Xbox 360, giving developers these capabilities almost five years ago, two more than OpenGL’s three. Of course who put what feature into graphics technology first doesn’t really matter; I’m merely highlighting the flawed argument David is using when he refers to OpenGL extensions. Anyone can extend a specification.

Now for the final nail to shut this debating point down, since David is so fond of mis-quoting Carmack to sell his own point let’s see why Carmack doesn’t like the idea of ‘power’ saying in 1997: “In any case, the D3D/OpenGL argument hasn’t been about speed, but about usability, robustness, and portability”. He was right back then and he is right today, of course those of you who have been reading rather than skimming up to this point will know Carmack believes DirectX has evolved into a much better API than it was back in 1997.

OpenGL is cross-platform +1 OpenGL

This is a key advantage OpenGL has over DirectX, but hardly a reason in and of itself to use OpenGL over DirectX, especially if you’re fine with developing exclusively for Windows (which a lot of people are). On top of that, just because OpenGL is cross-platform does not mean you can’t make a cross-platform engine which uses DirectX on one platform and a different API on another. CryEngine 3 and UE3 are two good examples of this.

OpenGL is better for the future of games, actually not really, not at all in fact

As I have stated many times previously, the core OpenGL specifications are controlled by the ARB, which is a consortium of various developers and hardware manufacturers. While both AMD and nVidia are on the board they are far outnumbered by groups interested in CAD development. The OpenGL 3.0 specs were met with a lot of criticism from game developers (even by the fuhrer) for compromising too much for the benefit of CAD developers afraid of losing backwards compatibility on five year old hardware.

Finally David concludes by saying we need competition and freedom to ‘drive down prices and drive up quality’. Honestly that is little more than rhetoric you’ll hear at a Republican national convention, if Sarah Palin was a programmer she’d argue her point like that. Rhetoric is just that, the truth is however that DirectX is free, it is really easy to use and tuned for game developers.

So can OpenGL recover? Yes, of course it can, but to do that it must break off from the main OpenGL branch. OpenGL ES is a good thing, a separate set of standards meant to benefit game developers. However it has a long way to go and won’t be fully mature until it has re-invented itself as a graphics API for game developers, not a graphics API for CAD developers modified to be more tolerable for game development.

The best way to solve the browser wars..

The video was taken from the Silverlight 3 Presentation at Mix 09 and as was promised the code was released.

Thanks to adolfojp from Reddit for the links =)


The C++ switch statement doesn’t actually work on strings, for example the following code would generate a compiler error (VC++ C2450):

std::string s = "bleh";
 
switch( s )
{
case "dude":
	(...);
	break;
case "hello":
	(...);
	break;
case "bleh":
	(...);
	break;
}

Well I say, that’s a bit of a sticky wicket. So why not just resort to an if/else chain? Because I wanted to use the switch statement damnit! As always it was the STL to the rescue with the always lovely map class. Unlike vectors and arrays, maps let you choose the type used to index into them, for example you could use floats rather than integers or even strings (which is what I wanted).

To make it work with strings here’s what I did:

[sourcecode language='cpp']//quick'n'dirty hack to make the C++ switch statement work more
//like C#'s. The following instance of the map class (switcheroo)
//holds the enum 'Vals' and uses strings to identify index locations
#include <map>
#include <string>

enum Vals{val1,val2,val3,val4};
static std::map<std::string, Vals> switcheroo;

//these assignments have to run once, inside a function, before the switch is used
switcheroo[ "string1" ] = val1;
switcheroo[ "string2" ] = val2;
switcheroo[ "string3" ] = val3;
switcheroo[ "string4" ] = val4;
[/sourcecode]

Now to actually use it we simply substitute switcheroo into the switch statement and use the enum values as the cases

[sourcecode language='cpp']switch ( switcheroo[ "string3" ] )
{
case val1:
(…);
break;
case val2:
(…);
break;
case val3:
(…);
break;
case val4:
(…);
break;
}[/sourcecode]

This is obviously a little more hands on than the C# version. Perhaps a class which takes in a string and returns a unique integer value would work better, still, I’ll leave that for another day.
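One catch with the map approach is that operator[] inserts a default value when the key is missing, so an unknown string would quietly land in the val1 case. Below is a minimal sketch of the ‘takes in a string and returns an integer’ idea, assuming an extra valUnknown enumerator and two hypothetical helpers, ToVal and Describe, neither of which is part of the hack above.

```cpp
#include <map>
#include <string>

// valUnknown is an assumed extra enumerator used for unregistered strings.
enum Vals { val1, val2, val3, val4, valUnknown };

// Hypothetical helper: looks the string up with find() so nothing is
// inserted into the table, and falls back to valUnknown when it is missing.
Vals ToVal(const std::map<std::string, Vals>& table, const std::string& key)
{
    std::map<std::string, Vals>::const_iterator it = table.find(key);
    return it != table.end() ? it->second : valUnknown;
}

// Example of switching on the looked-up value.
const char* Describe(const std::map<std::string, Vals>& table,
                     const std::string& key)
{
    switch (ToVal(table, key))
    {
    case val1: return "first";
    case val2: return "second";
    case val3: return "third";
    case val4: return "fourth";
    default:   return "unknown";
    }
}
```

Because ToVal takes the table by const reference it cannot grow the map by accident, and the default case gives unknown strings somewhere sensible to go.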

Yes I do realize this trick has been used by many people before me. In hindsight I should have Googled the idea of using the map class to get around the int-only limitation in C++, would have saved me a lot of time.

Having used PhysX for so long I wanted to spread my wings and take a look at the littler physics engines out there. But why stop at one physics engine? Why not a whole bunch of them? Wouldn’t it be awesome to have a set of classes set up which abstractificate the various elements of a physics engine and give the programmer the ability to simply choose whatever physics engine they want and run it?

Well I give you the answer to that dream, or at least the not-so-finished header file of said dream,

#include "../d3dutil.h"
#include "../math/emmath.h"
class EmGeom;
 
class EmPhysicsShell
{
	friend class EmGeom;
public:
	virtual void updatephysics(crfloat timeDelta) = 0;
	virtual void initalizephysics() = 0;
 
	virtual EmGeom *CreateConvexActor(	const std::vector& vertices,
									const std::vector& indices
									) = 0;
 
	virtual EmGeom* CreateBox(const Vec3& boxDim, const Vec3& boxPos, bool dynamic = true) = 0;
	virtual EmGeom* CreateSphere(crfloat radius, const Vec3& boxPos, bool dynamic = true) = 0;
};
 
//An EmGeom is an actor in the physics sim, it can be a primitive type such as a box or sphere
//or a more complex type such as a convex or triangle mesh. In any case it MUST fill out the basic
//aspects of a geom which are the purely virtual functions you see before you
class EmGeom
{
public:
	virtual D3DXVECTOR3 getGlobalPosition() = 0;
	virtual D3DXMATRIX	getGlobalPose() = 0;
	virtual Vec3 getLinearVelocity(){Vec3 v; return v;}
 
	//must return true if the geometric object is valid, otherwise false. kthxbye
	virtual bool isValid() = 0;
};

Yes I know, I’m missing like a bajillion features here and the function names ‘getGlobalPosition()’ and ‘getGlobalPose()’ sound PhysX-ee.

Well for the time being it will have to do, I plan on learning how to fully develop this by implementing the Newton physics engine.

I wrote the matrix class a while ago, it really simplifies a lot of the generic coding you’d have to otherwise do to convert the matrix classes between PhysX and DirectX.

The basic idea is to inherit from the PhysX matrix/vector classes (NxMat34/NxVec3) and add a few constructors and overloaded operators to make it play nice with its DirectX counterparts.
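To show what the conversion boils down to, here is a sketch using plain stand-in types rather than the real PhysX or DirectX classes. Pose34 and ToColumnMajor44 are hypothetical names; the layout is my understanding of what a 3x4 pose written out as a column-major 4x4 looks like.

```cpp
// Stand-in for a 3x4 pose: a 3x3 rotation indexed as rot[row][column]
// plus a translation vector.
struct Pose34
{
    float rot[3][3];
    float t[3];
};

// Writes the pose as 16 floats in column-major order: three rotation
// columns padded with 0, followed by the translation padded with 1.
void ToColumnMajor44(const Pose34& p, float out[16])
{
    for (int col = 0; col < 3; ++col)
    {
        for (int row = 0; row < 3; ++row)
            out[col * 4 + row] = p.rot[row][col];
        out[col * 4 + 3] = 0.0f;
    }
    out[12] = p.t[0];
    out[13] = p.t[1];
    out[14] = p.t[2];
    out[15] = 1.0f;
}
```

The translation ends up in elements 12 to 14, which is where a row-major, row-vector matrix keeps it too, so the same 16 floats can be read back that way without a transpose.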

Header file for EmMatrix:

[sourcecode language='cpp']#pragma once
 
#ifndef NX_CALL_CONV
#define WIN32
#endif
 
#define NOMINMAX
 
#include "NxMat34.h"
#include <d3dx9.h>
#include "NxPhysics.h"
class EmMatrix : public NxMat34
{
public:
	EmMatrix();
	EmMatrix(const D3DXMATRIX& m);
	EmMatrix(const NxMat34& m);
 
        /*takes in an actor and gets the globalPose*/
	EmMatrix(const NxActor& act);
	D3DXMATRIX D3DMat() const;
 
	//operator overloads
	D3DXMATRIX operator*(const EmMatrix& m);
	D3DXMATRIX operator*(const D3DXMATRIX& m);
 
	EmMatrix operator*=(const EmMatrix& m);
	D3DXMATRIX operator*=(const D3DXMATRIX& m);
};[/sourcecode]
 
The implementation is fairly straightforward,
 
[sourcecode language='cpp']#include "EmMatrix.h"
 
EmMatrix::EmMatrix()
{
	t.zero();
	M.id();
}
 
EmMatrix::EmMatrix(const D3DXMATRIX& m)
{
	//D3DXMATRIX is a struct with some functions, it has an overloaded typecast
	//operator which returns the matrix in a float array of size 16. PhysX's NxMat34 can be
	//set through an NxReal array of size 16, float typecasts to NxReal and there we go.
	NxMat34::setColumnMajor44((const float*)m);
}
 
EmMatrix::EmMatrix(const NxActor& act)
{
	//This function is a little weird, it takes in an NxActor and gets its matrix (through 
	//NxActor::getGlobalPose)
	*this = act.getGlobalPose();
 
	//One generic unit in DirectX is 0.5 units in PhysX, to compensate for this we multiply the 
	//translation vector by 2.
	this->t *= 2;
}
 
//gets the D3DXMATRIX equivalent of the matrix class
D3DXMATRIX EmMatrix::D3DMat() const
{
	D3DXMATRIX d3dmat;
	//(float*)d3dmat Makes use of an overloaded operator FLOAT* ();
	getColumnMajor44((float*)d3dmat);
 
	return d3dmat;
}
 
//Given the above the remaining functions should be self explanatory...
 
//operator overloads (note it's better to return D3DXMATRIX in case we are dealing with d3d matrices,
//since EmMatrix is derived from NxMat34 we are not presented with any conversion issues when using them
//with physx.
D3DXMATRIX EmMatrix::operator*(const EmMatrix& m)
{
	D3DXMATRIX multiple = this->D3DMat() * m.D3DMat();
 
	return multiple;
}
 
D3DXMATRIX EmMatrix::operator*(const D3DXMATRIX& m)
{
	D3DXMATRIX curr = this->D3DMat();
	D3DXMatrixMultiply(&curr, &curr, &m);
 
	return curr;
}
 
EmMatrix EmMatrix::operator *=(const EmMatrix& m)
{
	EmMatrix multiple = *this * m;
	*this = multiple;
 
	return multiple;
}
 
D3DXMATRIX EmMatrix::operator *=(const D3DXMATRIX& m)
{
	D3DXMATRIX curr = this->D3DMat();
	D3DXMatrixMultiply(&curr, &curr, &m);
 
	this->setColumnMajor44((float*)curr);
 
	return curr;
}[/sourcecode]

Download Code

Just uploaded a DirectX/PhysX library, currently the best feature it has is a function which takes in an ID3DXMesh* pointer and converts it into an NxActor. Be sure to contact me if you find any bugs or flaws in the code or just have suggestions.

Get it here.