Wednesday, 29 May 2013

Graph Collector

After releasing the new DirectX11 pack for vvvv and beta 30, time to go back into dev.

I added a new Segment node, which allows Z extrusion. It's pretty nice: playing with Phase/Cycles/Resolution you can do some nice designs. Here are a couple of shots below (no fancy shading yet).

So now the issue with designing it: I have a heap of shader nodes, layers and groups, so it's a big patch to maintain.

Exporting it as a file would be handy, so I could just design my bits and use a file loader: fast switch, clean patch, mmmmhhh :)

So first thought was Collada, but it's such a crap format, way too big for what it does.

I'd have to store vertices for every bit + transform, projected file size: 10 megs at least, bleh...

So second thought was to create a module for every node, and have it output the relevant data as a string as well as render it. So cumbersome, bleh again...

Then last thought: oh, I've got the dx11 source code (since I wrote it), so I should be able to traverse the graph and store data instead of rendering (and then serialize to any format like JSON, no XML please ;)


  • I don't want to store vertices/indices, it takes too much space. I already know what geometry I use, so it would be better to just tell how to generate it again.
  • Most 3d formats suck for batching. In my case I can have the same geometry sent 10 times to a shader, and most 3d software would replicate the geometry (come on, please start to do a proper format one day). Instead it would be much easier to store geometry -> transform list.

So for the moment, once geometry is created I've got no way to know how it was built.

So I simply added this to the geometry provider:

Code Snippet
    string PrimitiveType { get; set; }

    object Tag { get; set; }

Then for each type of geometry I create a descriptor, here is the base:

Code Snippet
    public abstract class AbstractPrimitiveDescriptor
    {
        public abstract string PrimitiveType { get; }
        public abstract void Initialize(Dictionary<string, object> properties);
        public abstract IDX11Geometry GetGeometry(DX11RenderContext context);
    }

Then here is the sphere:

Code Snippet
    public class Sphere : AbstractPrimitiveDescriptor
    {
        public float Radius { get; set; }
        public int ResX { get; set; }
        public int ResY { get; set; }
        public float CyclesX { get; set; }
        public float CyclesY { get; set; }

        public override string PrimitiveType { get { return "Sphere"; } }

        public override void Initialize(Dictionary<string, object> properties)
        {
            this.Radius = (float)properties["Radius"];
            this.ResX = (int)properties["ResX"];
            this.ResY = (int)properties["ResY"];
            this.CyclesX = (float)properties["CyclesX"];
            this.CyclesY = (float)properties["CyclesY"];
        }

        public override IDX11Geometry GetGeometry(DX11RenderContext context)
        {
            return context.Primitives.Sphere(this);
        }
    }

That's it: instead of storing 1000 vertices I can just store 5 parameters, how cool is that?

When I create geometry, I just assign the descriptor to the geometry, so I can know at any time how things were created.
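To make the idea concrete outside of the DX11 code base, here is a minimal Python sketch of the same descriptor pattern. `SphereDescriptor`, `REGISTRY` and `from_properties` are illustrative names, not part of the actual pack, and the vertex layout used for the size estimate (position + normal + uv stored as 32-bit floats) is an assumption.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SphereDescriptor:
    """Hypothetical stand-in for the C# Sphere descriptor."""
    Radius: float
    ResX: int
    ResY: int
    CyclesX: float
    CyclesY: float

    # Plain class attribute (no annotation), so it is not a dataclass field.
    primitive_type = "Sphere"

    @classmethod
    def from_properties(cls, properties):
        # Mirrors Initialize(): pull each parameter out of a property bag.
        return cls(
            Radius=float(properties["Radius"]),
            ResX=int(properties["ResX"]),
            ResY=int(properties["ResY"]),
            CyclesX=float(properties["CyclesX"]),
            CyclesY=float(properties["CyclesY"]),
        )


# Assumed lookup from type name to descriptor class (illustrative only).
REGISTRY = {SphereDescriptor.primitive_type: SphereDescriptor}

desc = REGISTRY["Sphere"].from_properties(
    {"Radius": 0.5, "ResX": 40, "ResY": 25, "CyclesX": 1.0, "CyclesY": 1.0}
)

# Size of the description versus the vertices it stands for.
descriptor_bytes = len(json.dumps(asdict(desc)).encode("utf-8"))
floats_per_vertex = 3 + 3 + 2  # assumed layout: position, normal, uv
vertex_bytes = desc.ResX * desc.ResY * floats_per_vertex * 4
```

With a 40 x 25 resolution this stands for 1000 vertices, and the serialized description stays far smaller than the vertex data alone (index data would come on top of that).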

Now that's cool, but I need to grab the geometry, so here is how it works.
Layers have RenderSettings, which carry some info on how to render, so I simply added Collector as a render hint.

Code Snippet
    public enum eRenderHint { Forward, MRT, Shadow, Overlay, Collector }

Then settings carry:

Code Snippet
    public class DX11ObjectGroup
    {
        public DX11ObjectGroup()
        {
            this.RenderObjects = new List<DX11RenderObject>();
        }

        public string ShaderName { get; set; }

        public List<DX11RenderObject> RenderObjects { get; set; }
    }

    public class DX11RenderObject
    {
        public string ObjectType { get; set; }

        public object Descriptor { get; set; }

        public Matrix[] Transforms { get; set; }
    }

So the renderer only has to set its flag to Collector (which tells the shader that we don't want to render but collect information).

The shader can then just retrieve the descriptor, and push it with the list of associated transforms (huhu, batch friendly ;)

Little snippet for the shader node: if the render hint is set to collect, it just appends object/transforms instead of rendering.

Code Snippet
    IDX11Geometry g = this.FGeometry[0][context];
    if (g.Tag != null)
    {
        // Describe the geometry instead of drawing it
        DX11RenderObject o = new DX11RenderObject();
        o.ObjectType = g.PrimitiveType;
        o.Descriptor = g.Tag;

        // One transform per instance, wrapping around the world matrix list
        o.Transforms = new Matrix[this.spmax];
        for (int i = 0; i < this.spmax; i++)
        {
            o.Transforms[i] = this.mworld[i % this.mworldcount];
        }
        group.RenderObjects.Add(o);

        settings.ObjectCollector.Add(group);
    }
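The collect step is mostly bookkeeping, so it can be sketched in a few lines of Python. The class and function names below (`RenderObject`, `ObjectGroup`, `collect`) are illustrative stand-ins for the C# types, and the shader name is made up; the part worth noticing is the modulo wrap, which repeats the world matrices when there are fewer of them than instances.

```python
from dataclasses import dataclass, field

IDENTITY = (1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0)


@dataclass
class RenderObject:
    object_type: str
    descriptor: dict
    transforms: list = field(default_factory=list)


@dataclass
class ObjectGroup:
    shader_name: str
    render_objects: list = field(default_factory=list)


def collect(group, primitive_type, tag, world_matrices, spread_max):
    """Append one object description to the group instead of drawing it."""
    if tag is None or not world_matrices:
        return None  # geometry without a descriptor cannot be rebuilt, skip it
    count = len(world_matrices)
    obj = RenderObject(primitive_type, tag)
    # One transform per instance, wrapping around the matrix list.
    obj.transforms = [world_matrices[i % count] for i in range(spread_max)]
    group.render_objects.append(obj)
    return obj


group = ObjectGroup("ExampleShader")  # hypothetical shader name
world = [IDENTITY, tuple(2.0 * m for m in IDENTITY)]
obj = collect(group, "SegmentZ", {"Cycles": 0.25}, world, 5)
```

Here five instances share two matrices, so the stored list alternates between them; the geometry itself is described only once.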

The object list is then serialized as JSON, here is one part of it:

Code Snippet
    {
      "ObjectType": "SegmentZ",
      "Descriptor": {
        "Phase": 0.0,
        "Cycles": 0.25,
        "InnerRadius": 0.94,
        "Z": 0.03,
        "Resolution": 100
      },
      "Transforms": [
        "-1.43, 1.751187E-16, 0, 0, -1.751187E-16, -1.43, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1",
        "1.43, 0, 0, 0, 0, 1.43, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1"
      ]
    },

Now I can simply reload the whole scene from the file (for the example above, the whole scene fits in 33kb indented, 22kb compressed, pretty nice ;)

So next was just doing a Loader node for testing. Pretty simple: retrieve the descriptor, create the geometry, associate it with its transforms and render. Please note that I could do an importer for anything, it's not only 4v related (technically I've got a use case for it, top secret ;)
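As an illustration of how little an importer needs, here is a rough Python sketch that reads the object format shown earlier. It assumes the file is a plain JSON list of such objects and that each transform string holds 16 values in row order; the real file layout (groups, shader names) is not shown in this post, so treat both as assumptions. `parse_matrix` and `load_scene` are hypothetical helper names.

```python
import json

SCENE_JSON = """
[
  {
    "ObjectType": "SegmentZ",
    "Descriptor": {"Phase": 0.0, "Cycles": 0.25, "InnerRadius": 0.94,
                   "Z": 0.03, "Resolution": 100},
    "Transforms": [
      "-1.43, 1.751187E-16, 0, 0, -1.751187E-16, -1.43, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1",
      "1.43, 0, 0, 0, 0, 1.43, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1"
    ]
  }
]
"""


def parse_matrix(text):
    """Turn a comma separated string of 16 numbers into a 4x4 list of rows."""
    values = [float(v) for v in text.split(",")]
    if len(values) != 16:
        raise ValueError("expected 16 matrix values, got %d" % len(values))
    return [values[r * 4:(r + 1) * 4] for r in range(4)]


def load_scene(text):
    """Return (object_type, descriptor, matrices) tuples, one per stored object."""
    scene = []
    for entry in json.loads(text):
        matrices = [parse_matrix(t) for t in entry["Transforms"]]
        scene.append((entry["ObjectType"], entry["Descriptor"], matrices))
    return scene


scene = load_scene(SCENE_JSON)
```

From there a loader only has to map `ObjectType` to a generator, feed it the descriptor values, and draw the result once per matrix.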

The obvious great thing is that it's totally transparent: I only need to enable collect and I can retrieve the whole scene, no need for any fancy stuff, and it works for any part I have already done as well, no mod needed.

Here are a few examples of it:

That's it. Now of course it will need improvements, but I'm pretty excited about the possibilities. Having the source code for the render system is invaluable; without it I would have done it the dirty module way, brrr....scary!!

Sunday, 26 May 2013

Deferred Lights

As many know I use deferred rendering a lot.

Most of post processing in DirectX11 can now be done in Forward (HBAO / Depth Of Field use depth only), but of course as soon as you include shading into the mix, you need Normals and a bit of Extra Data.

So you need some MRT setup. For my lights I use only 2 targets:

Code Snippet
    struct PS_OUT
    {
        float4 albedo : SV_Target0;
        float4 ns : SV_Target1;
    };

albedo.rgb is color, and the alpha channel is used for roughness. ns.rgb is normals (view space), ns.a is reflectivity.

I could add a 3rd target for specular albedo, but never had the use for it, maybe I'll add it one day (could use the 4th target for some shadow bits as well)

Ok so now that we have our shiny normal target filled up with fancy geometry, we need to add some shading to it.

Directional light is full screen, not many ways to cull that (except using some stencil culling if possible)

For point lights you have plenty of techniques; I remember doing light volumes in DX9 a few years ago, it's pretty easy.

Of course now you have a shiny new technique, using compute tile lights (I based my implementation on Andrew Lauritzen's, which also has some nice techniques using the Geometry Shader if your device is not SM5.0 capable)

So on the list of techniques:

  • Compute Tiles : Send a structured buffer, cull lights against tiles with a mini frustum and only process lights that pass the test.
  • Clip Quad : Send a structured buffer, and draw n points (single call). In the Vertex Shader convert the point light (sphere) into a screen space quad, expand your point into a quad in the GS, process the light in the PS (use additive blending).
  • Clip Quad (low) : Some devices don't support raw/structured buffers, so instead you copy your lights into a Vertex Buffer and do more or less the same (you change the input structure instead of doing a lookup).

Here is how the VS input looks for Clip Quad:

Code Snippet
    StructuredBuffer<PointLight> LightBuffer : LIGHTBUFFER;

    struct vsInput
    {
        uint iv : SV_VertexID;
    };

And the Vertex Buffer version:

Code Snippet
    struct vsInput
    {
        float3 lightpos : LIGHTPOSITION;
        float attenuationBegin : LIGHTATTENUATIONBEGIN;
        float3 color : LIGHTCOLOR;
        float attenuationEnd : LIGHTATTENUATIONEND;
    };

So both techniques are mostly equivalent, but the Structured Buffer one is quite nice in case you have the 10.1 feature set (since you can use a Compute Shader to move your lights, for example).

In Case of VertexBuffer, you can still use Stream Output to animate lights in GPU, which is also fine, but adds a bit of extra coding around.

Please of course note that Stencil Test can also be used for GS techniques.
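To give a feel for what the Compute Tiles option from the list above does, here is a heavily simplified CPU sketch in Python. It is not the actual compute shader: a real implementation tests each light against a per-tile frustum (with depth bounds) in view space, while this sketch assumes lights were already projected to a screen-space circle `(cx, cy, radius)` in pixels and only does a 2D circle versus tile rectangle test. The function names and the 16 pixel tile size are assumptions.

```python
def circle_hits_rect(cx, cy, radius, x0, y0, x1, y1):
    """True if the circle overlaps the axis aligned rectangle."""
    # Closest point on the rectangle to the circle centre.
    nx = min(max(cx, x0), x1)
    ny = min(max(cy, y0), y1)
    dx, dy = cx - nx, cy - ny
    return dx * dx + dy * dy <= radius * radius


def bin_lights(lights, width, height, tile=16):
    """Return {(tile_x, tile_y): [light indices]} for tiles touched by a light."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            x0, y0 = tx * tile, ty * tile
            x1, y1 = min(x0 + tile, width), min(y0 + tile, height)
            hits = [i for i, (cx, cy, r) in enumerate(lights)
                    if circle_hits_rect(cx, cy, r, x0, y0, x1, y1)]
            if hits:
                bins[(tx, ty)] = hits
    return bins


# Two lights on a 64x32 target: a small one and a larger one.
lights = [(8.0, 8.0, 4.0), (40.0, 8.0, 10.0)]
bins = bin_lights(lights, 64, 32)
```

Each tile then only shades with the lights in its own list, which is where the saving comes from when lights are small compared to the screen.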

Now we have a decently efficient way to draw multiple lights, there's still one part missing, light equations ))

So I'm now using a few BRDFs (depending on my scene):
  • Phong (never using it)
  • Oren Nayar (I really like it for splines)
  • Cook Torrance (for more or less all the rest)
  • Ward (used it before, but now I mostly use the 2 above)

So one option is to use Techniques, but in that case it implies a lot of code duplication, i.e. not too nice.

That leaves 2 other options:

  • Shader Linkage
  • Shader Macros

Unless you have Feature Level 11, you're stuck with macros, since you can't use dynamic linkage.

It's pretty simple: you use defines and #ifdef / #if, then you specify the defines when you compile the shader, so basically you compile one shader instance per BRDF.

Code Snippet
    #define LIGHT_PHONG 0
    #define LIGHT_WARD 1

    #define LIGHT_OREN_SIMPLE 3
    #define LIGHT_OREN_COMPLEX 4

    //Override macros on compile
    #ifndef LIGHT_TYPE

    #endif

    /* Here we can't use interfaces in case we haven't got sm5 support, so default to 0 */
    #ifndef USE_INTERFACE
    #define USE_INTERFACE 0
    #endif

    #if USE_INTERFACE == 1
    #include "Brdf.fxh"
    #else
    #include "Brdf_Funcs.fxh"
    #endif

So when you compile shader you also send the define LIGHT_TYPE, and then you switch shader depending on equation.
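On the host side this boils down to building a macro list per BRDF and keeping one compiled variant for each. Below is a small Python sketch of that bookkeeping; `build_defines`, `PermutationCache` and the fake compiler are hypothetical, and the real compile call (whatever API you use) would replace `fake_compile`. Only the defines visible in the listing are included.

```python
# Values copied from the #define list; only entries visible there are used.
LIGHT_TYPES = {
    "LIGHT_PHONG": 0,
    "LIGHT_WARD": 1,
    "LIGHT_OREN_SIMPLE": 3,
    "LIGHT_OREN_COMPLEX": 4,
}


def build_defines(light_type, use_interface=False):
    """Macro list to hand to the shader compiler for one permutation."""
    if light_type not in LIGHT_TYPES:
        raise KeyError("unknown light type: " + light_type)
    return [("LIGHT_TYPE", str(LIGHT_TYPES[light_type])),
            ("USE_INTERFACE", "1" if use_interface else "0")]


class PermutationCache:
    """Compiles each BRDF variant once and hands back the cached result."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._variants = {}

    def get(self, light_type):
        if light_type not in self._variants:
            self._variants[light_type] = self._compile(build_defines(light_type))
        return self._variants[light_type]


compiled = []  # records every call made to the stand-in compiler


def fake_compile(defines):
    # Stand-in for a real shader compile call; just formats the defines.
    compiled.append(defines)
    return "shader<" + ",".join(k + "=" + v for k, v in defines) + ">"


cache = PermutationCache(fake_compile)
ward = cache.get("LIGHT_WARD")
ward_again = cache.get("LIGHT_WARD")   # served from the cache, no recompile
oren = cache.get("LIGHT_OREN_SIMPLE")
```

Switching equation then means switching which cached shader gets bound, at the price of one compile per BRDF.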

It's of course handy since you have a single code base; any improvement impacts all BRDFs at once.

Now with shader model 5 you also have a nifty feature called Shader Linkage, so now our code looks like:

Code Snippet
    interface iBRDF
    {
        void Accumulate
        (
            in float3 n,
            in float3 l,
            in float3 v,
            in float3 lightContrib,
            in float fRoughness,
            in float fSpec,
            in float atten,
            inout float3 result
        );
    };

Next we declare the BRDFs (one class per equation):

Code Snippet
    class cWard : iBRDF
    {
        void Accumulate
        (
            in float3 n,
            in float3 l,
            in float3 v,
            in float3 lightContrib,
            in float fRoughness,
            in float fSpec,
            in float atten,
            inout float3 result
        )
        {
            //Do your light cooking here
        }
    };

And to finish we instantiate all that lot:

Code Snippet
    cPhong phong;
    cWard ward;
    cCookTorrance cooktorrance;
    cOrenSimple orensimple;
    cOrenComplex orencomplex;

    iBRDF brdf <string linkclass="phong,ward,cooktorrance,orensimple,orencomplex";>;

The annotation is just used for reflection in my case, so it builds an Enum automatically.

What is nice with this technique is that you only need one shader instance, and can switch light equation on the fly very easily:

Code Snippet
    public void BindClass(string classname, string interfacename)
    {
        this.Effect.GetVariableByName(interfacename).AsInterface().ClassInstance =
            this.Effect.GetVariableByName(classname).AsClassInstance();
    }

Pretty cool no? And with shader reflection there's even no need to change any code, all is done automatically ))
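If the interface/class mechanism looks unfamiliar, it behaves much like ordinary interface dispatch on the CPU. The Python sketch below mimics the idea: one "effect" holds named class instances and an interface slot that can be rebound at runtime. The `Effect` class, its `bind_class` method and the two toy BRDFs (`Lambert`, `Unlit`) are placeholders for illustration; they are not the equations or the API used in the pack.

```python
from abc import ABC, abstractmethod


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


class BRDF(ABC):
    @abstractmethod
    def accumulate(self, n, l, v, light_contrib, roughness, spec, atten, result):
        """Return result plus this light's contribution."""


class Lambert(BRDF):
    # Placeholder diffuse term: N.L times light colour times attenuation.
    def accumulate(self, n, l, v, light_contrib, roughness, spec, atten, result):
        ndotl = max(dot(n, l), 0.0)
        return tuple(r + c * ndotl * atten for r, c in zip(result, light_contrib))


class Unlit(BRDF):
    # Placeholder that ignores the surface orientation entirely.
    def accumulate(self, n, l, v, light_contrib, roughness, spec, atten, result):
        return tuple(r + c * atten for r, c in zip(result, light_contrib))


class Effect:
    def __init__(self, instances):
        self.instances = instances   # class instance name -> BRDF object
        self.interfaces = {}         # interface name -> currently bound object

    def bind_class(self, classname, interfacename):
        self.interfaces[interfacename] = self.instances[classname]

    def shade(self, n, l, v, light_contrib, atten):
        brdf = self.interfaces["brdf"]
        return brdf.accumulate(n, l, v, light_contrib, 0.5, 0.0, atten, (0.0, 0.0, 0.0))


effect = Effect({"lambert": Lambert(), "unlit": Unlit()})
n, l, v = (0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

effect.bind_class("lambert", "brdf")
lit = effect.shade(n, l, v, (1.0, 0.5, 0.25), 0.5)

effect.bind_class("unlit", "brdf")   # same effect, different equation
flat = effect.shade(n, l, v, (1.0, 0.5, 0.25), 0.5)
```

The calling code never changes; only the bound instance does, which is the flexibility the single shader instance buys you.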

Now of course one important part is: what about performance?

So please note that using shader linkage in that case implies a performance hit, which is far from negligible if you're not on high end hardware (I noticed it was quite significant on a laptop, not as bad on a desktop).

So shader linkage adds flexibility (it makes it very easy to avoid a lot of permutations, or compiling a lot of shaders to handle all possibilities), but that comes at a cost. It's then up to you to abuse timestamp queries and decide if the hit is worth taking ;)

That's it for now, next post... I don't know ;)

Wednesday, 8 May 2013

Geometry Generation (Fast)

Ok so first post here, let's start with something fun :)

One common thing needed in computer graphics is some form of geometry.

Some can come from 3d software and you use an importer (like Assimp, or your own), some geometries can be generated on the fly (Grid/Spheres/Box/Torus...)

Geometry generation is generally done as follows:

  • Build a bulk of vertices in CPU (some for loop)
  • Build an IndexBuffer in CPU (some for loop)
  • Copy your buffers in GPU
  • Use it!

Pretty simple, uh?

The only issue is that if you want to modify your geometry parameters in real time (and you have decently high resolution geometry), you need to:

  • Rebuild the bulk of vertices
  • Copy again
  • Eventually do the same for the index buffer

That can get pretty slow, so let's look at how to pimp that a tad.

Our friend DirectX 10 brings a feature called Stream Output, which allows you to render to buffers. This buffer can be either an Index Buffer or a Vertex Buffer, oh, that sounds quite convenient.

So let's say we want to build a grid (pretty simple example, but grids are decently used around as bezier/surfaces/heightfields...), resolution being 512*512

Here is our vertex shader input:

Code Snippet
    struct vsInput
    {
        uint iv : SV_VertexID;
    };

And our desired output:

Code Snippet
    struct vsOutput
    {
        float3 pos : POSITION;
        float3 norm : NORMAL;
        float2 uv : TEXCOORD0;
    };

Since we don't provide any input geometry, we set the IA stage to null and assign the VertexBuffer as the StreamOutput target.

Then the following lines of code generate vertices:

Code Snippet
    vsOutput VS_Vertices(vsInput input)
    {
        vsOutput o;
        int colindex = input.iv % colcount;
        int rowindex = input.iv / colcount;
        float2 uv = float2(colindex, rowindex) * invgridsize;
        float2 pos = uv;

        uv.y = 1.0f - uv.y;
        pos.xy = (pos.xy - 0.5f) * size;
        o.pos = float3(pos, 0.0f);
        o.norm = float3(0, 0, -1);
        o.uv = uv;
        return o;
    }

Pretty simple:

  • From the vertex id (simple GPU counter), we retrieve the cell position.
  • To normalize it (i.e. fit in 0/1), we simply multiply by the inverse resolution
  • We flip the Y axis (for uv only)
  • Recenter the position (so our grid center is at 0,0 not at 0.5,0.5)
  • Done, that was so hard... :)
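For checking the maths without a GPU, the steps in the list above can be mirrored on the CPU. This Python sketch follows the same order as the shader; `grid_vertex` is an illustrative name, and it assumes the inverse grid size is `1 / (count - 1)` so that the last row/column lands exactly on 1.0 (the shader constant may be set up differently).

```python
def grid_vertex(iv, colcount, rowcount, size=1.0):
    """CPU mirror of the vertex generation for one vertex id."""
    col = iv % colcount
    row = iv // colcount
    # Assumption: invgridsize = 1 / (count - 1).
    u = col / (colcount - 1)
    v = row / (rowcount - 1)
    pos = ((u - 0.5) * size, (v - 0.5) * size, 0.0)  # recentred position
    normal = (0.0, 0.0, -1.0)
    uv = (u, 1.0 - v)                                # flip Y for uv only
    return pos, normal, uv


# A tiny 3x3 grid of size 2: ids run row by row.
vertices = [grid_vertex(iv, 3, 3, size=2.0) for iv in range(3 * 3)]
```

The first vertex ends up in one corner, the middle id in the centre, and the last id in the opposite corner, which is an easy sanity check before moving the code to the GPU.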
Now we need to generate our indices (since some vertices are shared).
We have a few options here:
  • Set our Stream Output as triangle: This is not convenient at all, since grid geometry is quad based, so we'll need to replicate a lot of calculations, and flip the triangle depending on vertex id, which is not very friendly.
  • Set our Stream Output as Quad: Instead of generating one triangle at a time in the Vertex Shader, we generate 2, which fits a cell expansion perfectly, and reuse calculations.

So obvious choice: quads.

Here is our Vertex Shader output:

Code Snippet
    struct vsOutputIndices
    {
        int3 t1 : TRIANGLE0;
        int3 t2 : TRIANGLE1;
    };

And the (extremely) hardcore Vertex Shader:

Code Snippet
    vsOutputIndices VS_Indices(vsInput input)
    {
        vsOutputIndices o;
        int j = input.iv / (colcount-1);
        int i = input.iv % (colcount-1);
        int rowlow = j * (colcount);
        int rowup = (j+1) * (colcount);

        o.t1 = int3(rowlow + i, rowup + i, rowlow + i + 1);
        o.t2 = int3(rowlow + i + 1, rowup + i, rowup + i + 1);
        return o;
    }

Here we do more or less the same as when generating positions, except that we need to take care of the row/column index stride.
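The stride handling is easier to see on a small grid. This Python sketch runs the same index arithmetic per cell; `grid_indices` is an illustrative name, and the number of cells, `(colcount - 1) * (rowcount - 1)`, is my assumption for how many ids the draw call would cover.

```python
def grid_indices(colcount, rowcount):
    """CPU mirror of the index generation: two triangles per grid cell."""
    indices = []
    cells_per_row = colcount - 1
    for cell in range(cells_per_row * (rowcount - 1)):
        j = cell // cells_per_row      # cell row
        i = cell % cells_per_row       # cell column
        rowlow = j * colcount          # first vertex of the lower row
        rowup = (j + 1) * colcount     # first vertex of the row above
        indices.append((rowlow + i, rowup + i, rowlow + i + 1))
        indices.append((rowlow + i + 1, rowup + i, rowup + i + 1))
    return indices


# 3x3 vertices -> 2x2 cells -> 8 triangles.
tris = grid_indices(3, 3)
```

Note that cells are counted with `colcount - 1` per row while vertex rows advance by `colcount`; mixing the two up is the classic way to get a sheared grid.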

Here we are, generating a Grid without CPU.

The nice thing about it is that, since other geometries can also be interpreted as parametric surfaces, we can easily modify the shader to generate them (only the shader generating vertices needs modifications, the index buffer builder is the same).

Here is how to build a sphere:

Code Snippet
    float3 sphere(float u, float v)
    {
        float3 p;
        u = (u + 0.5) * PI * 2.0f;
        v = (v + 0.5) * PI * 2.0f;
        float su = sin(u); float sv = sin(v); float cu = cos(u); float cv = cos(v);
        p.x = su*sv;
        p.y = cu*sv;
        p.z = cv;
        return p;
    }

    vsOutput VS_Vertices(vsInput input)
    {
        vsOutput o;
        int colindex = input.iv % colcount;
        int rowindex = input.iv / colcount;
        float2 uv = float2(colindex, rowindex) * invgridsize;
        float2 pos = uv;
        uv.y = 1.0f - uv.y;
        o.pos = sphere(uv.x * CyclesU, uv.y * CyclesV) * radius;
        o.norm = normalize(o.pos);
        o.uv = uv;
        return o;
    }

Many other surfaces can be built in the same style (ideally, if you can have partial derivatives for normals, that helps).
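When a surface has no handy closed form for its normal, the partial derivatives can be approximated numerically. The Python sketch below takes the cross product of central-difference derivatives along u and v and checks it against the sphere, where the normal is known. `surface_normal` and the step size `eps` are my own choices, not something from the shader code, and the method breaks down where the derivatives vanish (the poles of the sphere, for instance).

```python
import math


def sphere(u, v):
    """Same parametrisation as the HLSL sphere(), unit radius."""
    u = (u + 0.5) * math.pi * 2.0
    v = (v + 0.5) * math.pi * 2.0
    return (math.sin(u) * math.sin(v), math.cos(u) * math.sin(v), math.cos(v))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    length = math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2])
    return (a[0] / length, a[1] / length, a[2] / length)


def surface_normal(surface, u, v, eps=1e-4):
    """Normal from the cross product of central-difference partial derivatives."""
    du = tuple((a - b) / (2.0 * eps)
               for a, b in zip(surface(u + eps, v), surface(u - eps, v)))
    dv = tuple((a - b) / (2.0 * eps)
               for a, b in zip(surface(u, v + eps), surface(u, v - eps)))
    return normalize(cross(du, dv))


point = sphere(0.1, -0.3)
normal = surface_normal(sphere, 0.1, -0.3)
```

At this sample point the numerical normal matches the position on the unit sphere; on the other half of the v range the cross product points inward instead, so the sign may need flipping depending on the winding you want.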

Grid deformation is used quite a lot in vvvv (deforming in the vertex shader and then shading it); the obvious win of using StreamOut is that the geometry is also reusable (shadow/forward...).

That's it for now, stay tuned.

First Post

Hello, here we are again, after blogging privately about DirectX 11 development for vvvv, I felt I would open a new blog, since now I can speak openly about it.

Mostly (as usual), I'll speak about real time graphics and shader coding (really?), but also commenting on other activities on the front (vuo/realtime studio/vvvv....).

So here is a taste of what's to come:

  • Shader Techniques
  • VVVV Stuff
  • Realtime Studio (Outracks) beta overview
  • Vuo presentation overview
  • DirectX 11.1 for High End machines (UAV at every stage !!!)
  • DirectX 11/11.1 for Low feature sets (tablets with no buffer support, phones with 9.3 feature level)
  • OpenGL
  • CUDA
  • Random mumbling :)
  • Flare (you'll hear about it)