Sunday, 26 May 2013

Deferred Lights

As many know, I use deferred rendering a lot.

Most post-processing in DirectX 11 can now be done in forward (HBAO and Depth Of Field use depth only), but as soon as you include shading in the mix, you need normals and a bit of extra data.

So you need some MRT setup. For my lights I use only 2 targets:

Code Snippet
    struct PS_OUT
    {
        float4 albedo : SV_Target0;
        float4 ns : SV_Target1;
    };

The color channels of albedo hold the color, and its alpha channel is used for roughness. The first three channels of ns hold the normals (view space), and ns.a is reflectivity.

I could add a 3rd target for specular albedo, but I've never had a use for it; maybe I'll add it one day (I could use a 4th target for some shadow bits as well).

OK, so now that we have our shiny normal target filled up with fancy geometry, we need to add some shading to it.

A directional light is full screen, so there are not many ways to cull it (except some stencil culling where possible).

For point lights you have plenty of techniques. I remember doing light volumes in DX9 a few years ago; it's pretty easy.

Of course you now have a shiny new technique, compute tile lights (I based my implementation on Andrew Lauritzen's, which also has some nice techniques using the Geometry Shader if your device is not SM5.0 capable).

So on the list of techniques:

  • Compute Tiles : Send a structured buffer, cull lights against tiles with a mini frustum, and only process lights that pass the test.
  • Clip Quad : Send a structured buffer and draw n points (single call). In the Vertex Shader convert the point light (sphere) into a screen space quad, expand your point into a quad in the GS, then process the light in the PS (use additive blending).
  • Clip Quad (low) : Some devices don't support raw/structured buffers, so instead you copy your lights into a Vertex Buffer and do more or less the same (you change the input structure instead of doing a lookup).
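
The culling step of the Compute Tiles technique boils down to a sphere-versus-planes test per tile. Below is a minimal CPU-side sketch in C++ (not the compute shader itself), assuming each tile's mini frustum is given as inward-facing unit-length planes. The names `Plane`, `Sphere`, `sphereTouchesFrustum` and `cullLights` are made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// A plane in the form n.p + d = 0, with unit-length n pointing inside the frustum.
struct Plane  { float nx, ny, nz, d; };

// A point light bounded by its attenuation radius.
struct Sphere { float x, y, z, radius; };

// Returns false as soon as the sphere lies entirely behind one plane.
// This test is conservative: a sphere near a frustum corner may pass
// even though it does not actually touch the tile.
inline bool sphereTouchesFrustum(const Sphere& s, const std::vector<Plane>& planes)
{
    for (const Plane& p : planes)
    {
        const float signedDistance = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (signedDistance < -s.radius)
            return false;
    }
    return true;
}

// Collects the indices of the lights that survive culling for one tile.
inline std::vector<std::size_t> cullLights(const std::vector<Sphere>& lights,
                                           const std::vector<Plane>& tilePlanes)
{
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < lights.size(); ++i)
    {
        if (sphereTouchesFrustum(lights[i], tilePlanes))
            visible.push_back(i);
    }
    return visible;
}
```

In a compute shader the same loop would typically be spread across the threads of a tile, with the surviving indices appended to a per-tile list in group shared memory.
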
Here is how the VS input looks for Clip Quad:

Code Snippet
    StructuredBuffer<PointLight> LightBuffer : LIGHTBUFFER;

    struct vsInput
    {
        uint iv : SV_VertexID;
    };

And the Vertex Buffer version:

Code Snippet
    struct vsInput
    {
        float3 lightpos : LIGHTPOSITION;
        float attenuationBegin : LIGHTATTENUATIONBEGIN;
        float3 color : LIGHTCOLOR;
        float attenuationEnd : LIGHTATTENUATIONEND;
    };
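
The layout above interleaves each float3 with one float, so a light fits in exactly two 16-byte slots. Here is a possible CPU-side mirror of that vertex in C++. The post does not show its attenuation formula, so the linear falloff between attenuationBegin and attenuationEnd is an assumption, and the names `PointLightVertex` and `linearAttenuation` are invented for this sketch.

```cpp
#include <algorithm>
#include <cstddef>

// CPU-side mirror of the vsInput layout (hypothetical name).
struct PointLightVertex
{
    float position[3];       // LIGHTPOSITION
    float attenuationBegin;  // LIGHTATTENUATIONBEGIN
    float color[3];          // LIGHTCOLOR
    float attenuationEnd;    // LIGHTATTENUATIONEND
};

// Each float3 + float pair fills one 16-byte slot, 32 bytes per light.
static_assert(sizeof(PointLightVertex) == 32, "light must pack into two float4 slots");
static_assert(offsetof(PointLightVertex, color) == 16, "color must start the second slot");

// Assumed falloff: full intensity up to attenuationBegin, zero from
// attenuationEnd onwards, linear in between.
inline float linearAttenuation(const PointLightVertex& light, float distance)
{
    const float range = light.attenuationEnd - light.attenuationBegin;
    if (range <= 0.0f)
        return distance <= light.attenuationBegin ? 1.0f : 0.0f;

    const float t = (light.attenuationEnd - distance) / range;
    return std::min(1.0f, std::max(0.0f, t));
}
```

The same struct can be uploaded as is to either a Vertex Buffer or a structured buffer with a 32-byte stride, which keeps both Clip Quad variants on one CPU-side data path.
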

So both techniques are mostly equivalent, but the Structured Buffer one is quite nice in case you have a 10.1 feature set (since you can use a Compute Shader to move your lights, for example).

In the case of the Vertex Buffer, you can still use Stream Output to animate lights on the GPU, which is also fine, but adds a bit of extra coding around it.

Please note that the stencil test can also be used with the GS techniques.

Now that we have a decently efficient way to draw multiple lights, there's still one part missing: light equations ))

So I'm now using a few BRDFs (depending on my scene):
  • Phong (never using it)
  • Oren Nayar (I really like it for splines)
  • Cook Torrance (for more or less all the rest)
  • Ward (used it before, but now I mostly use the 2 above)
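
To give an idea of what one of these equations involves, here is a CPU-side C++ sketch of the widely used qualitative Oren Nayar approximation. This is the textbook form, not necessarily the variant used in this post. `Vec3` and `orenNayarDiffuse` are names invented for the example, all vectors are assumed normalized, and `sigma` is the surface roughness (standard deviation of the microfacet slope angle, in radians).

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 sub(const Vec3& a, const Vec3& b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 scale(const Vec3& a, float s) { return Vec3{a.x * s, a.y * s, a.z * s}; }

// Diffuse term for unit albedo, including the cosine and the 1/pi factor.
// n = surface normal, l = direction to the light, v = direction to the viewer.
inline float orenNayarDiffuse(const Vec3& n, const Vec3& l, const Vec3& v, float sigma)
{
    const float kPi = 3.14159265f;

    const float nDotL = std::min(1.0f, dot(n, l));
    const float nDotV = std::min(1.0f, std::max(0.0f, dot(n, v)));
    if (nDotL <= 0.0f)
        return 0.0f;  // light is behind the surface

    const float s2 = sigma * sigma;
    const float a = 1.0f - 0.5f * s2 / (s2 + 0.33f);
    const float b = 0.45f * s2 / (s2 + 0.09f);

    const float thetaI = std::acos(nDotL);
    const float thetaR = std::acos(nDotV);
    const float alpha = std::max(thetaI, thetaR);
    const float beta = std::min(thetaI, thetaR);

    // Cosine of the azimuth difference, from l and v projected onto the surface plane.
    const Vec3 lt = sub(l, scale(n, nDotL));
    const Vec3 vt = sub(v, scale(n, nDotV));
    const float lengths = std::sqrt(dot(lt, lt) * dot(vt, vt));
    const float cosPhi = lengths > 1e-6f ? std::max(0.0f, dot(lt, vt) / lengths) : 0.0f;

    return (nDotL / kPi) * (a + b * cosPhi * std::sin(alpha) * std::tan(beta));
}
```

With `sigma` set to 0 the A term becomes 1 and the B term vanishes, so the function falls back to plain Lambert shading, which makes for an easy sanity check.
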

One option is to use Techniques, but that implies a lot of code duplication, i.e. not too nice.

That leaves 2 other options:

  • Shader Linkage
  • Shader Macros

Unless you have Feature Level 11, you are stuck with macros, since you can't use dynamic linkage.

It's pretty simple: you use defines with #ifdef / #if, then specify the defines when you compile the shader, so basically you compile one shader instance per BRDF.

Code Snippet
    #define LIGHT_PHONG 0
    #define LIGHT_WARD 1
    #define LIGHT_COOKTORRANCE 2 // line lost from the listing; name assumed
    #define LIGHT_OREN_SIMPLE 3
    #define LIGHT_OREN_COMPLEX 4

    // Override macros on compile
    #ifndef LIGHT_TYPE
    #define LIGHT_TYPE LIGHT_PHONG // line lost from the listing; default assumed
    #endif

    /* Here we can't use interfaces in case we haven't got sm5 support, so default to 0 */
    #ifndef USE_INTERFACE
    #define USE_INTERFACE 0
    #endif

    #if USE_INTERFACE == 1
    #include "Brdf.fxh"
    #else
    #include "Brdf_Funcs.fxh"
    #endif

So when you compile the shader you also send the LIGHT_TYPE define, and the shader switches equation depending on its value.

It's of course handy since you have a single code base: any improvement impacts all BRDFs at once.
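
The HLSL preprocessor follows the C conventions, so the same permutation pattern can be tried on the CPU. The C++ sketch below is an illustration only: `activeBrdfName` is an invented function, the Phong default is an assumption, and only two equations are listed to keep it short.

```cpp
#define LIGHT_PHONG 0
#define LIGHT_WARD 1

// Can be overridden from the compiler command line, e.g. -DLIGHT_TYPE=1.
// Phong is only the default assumed for this sketch.
#ifndef LIGHT_TYPE
#define LIGHT_TYPE LIGHT_PHONG
#endif

// Only one branch survives preprocessing, so each compiled permutation
// contains a single light equation and no runtime switch.
inline const char* activeBrdfName()
{
#if LIGHT_TYPE == LIGHT_PHONG
    return "phong";
#elif LIGHT_TYPE == LIGHT_WARD
    return "ward";
#else
#error "Unknown LIGHT_TYPE"
#endif
}
```

On the Direct3D side the equivalent of the command line flag is the list of name/value macros handed to the shader compiler (the `pDefines` array of `D3DCompile` in the native API). The price of this approach is one compiled shader per value of LIGHT_TYPE.
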

Now with shader model 5 you also have a nifty feature called Shader Linkage, so now our code looks like:

Code Snippet
    interface iBRDF
    {
        void Accumulate
        (
            in float3 n,
            in float3 l,
            in float3 v,
            in float3 lightContrib,
            in float fRoughness,
            in float fSpec,
            in float atten,
            inout float3 result
        );
    };

Next we declare the BRDFs (one class per equation):

Code Snippet
    class cWard : iBRDF
    {
        void Accumulate
        (
            in float3 n,
            in float3 l,
            in float3 v,
            in float3 lightContrib,
            in float fRoughness,
            in float fSpec,
            in float atten,
            inout float3 result
        )
        {
            // Do your light cooking here
        }
    };

And to finish we instantiate all that lot:

Code Snippet
    cPhong phong;
    cWard ward;
    cCookTorrance cooktorrance;
    cOrenSimple orensimple;
    cOrenComplex orencomplex;

    iBRDF brdf <string linkclass="phong,ward,cooktorrance,orensimple,orencomplex";>;

The annotation is just used for reflection in my case, so it builds an enum automatically.

What is nice with this technique is that you only need one shader instance, and you can switch light equation on the fly very easily:

Code Snippet
    public void BindClass(string classname, string interfacename)
    {
        this.Effect.GetVariableByName(interfacename).AsInterface().ClassInstance = this.Effect.GetVariableByName(classname).AsClassInstance();
    }

Pretty cool, no? And with shader reflection there's no need to change any code at all; everything is done automatically ))
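
For readers more at home on the CPU side, dynamic linkage behaves a lot like virtual dispatch. The C++ analogy below is not the Direct3D API: `IBrdf`, `BrdfSlot` and the two toy equations (`Lambert`, `HalfLambert`) are invented stand-ins for the iBRDF interface, the effect variable, and the real light equations.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Plays the role of the HLSL interface.
struct IBrdf
{
    virtual ~IBrdf() = default;
    virtual float evaluate(float nDotL) const = 0;
};

// Two toy equations standing in for the real BRDF classes.
struct Lambert : IBrdf
{
    float evaluate(float nDotL) const override { return nDotL > 0.0f ? nDotL : 0.0f; }
};

struct HalfLambert : IBrdf
{
    float evaluate(float nDotL) const override
    {
        const float h = 0.5f * nDotL + 0.5f;
        return h * h;
    }
};

// Plays the role of the interface variable: instances are registered by
// name once, then the active one is swapped at runtime without recompiling.
class BrdfSlot
{
public:
    void registerClass(const std::string& name, const IBrdf* instance) { classes_[name] = instance; }

    void bindClass(const std::string& name)
    {
        const auto it = classes_.find(name);
        if (it == classes_.end())
            throw std::invalid_argument("unknown BRDF class: " + name);
        bound_ = it->second;
    }

    float evaluate(float nDotL) const
    {
        if (bound_ == nullptr)
            throw std::logic_error("no BRDF class bound");
        return bound_->evaluate(nDotL);  // indirect call, resolved at runtime
    }

private:
    std::map<std::string, const IBrdf*> classes_;
    const IBrdf* bound_ = nullptr;
};
```

The analogy carries over to the cost side as well: every evaluation goes through an indirection that a macro permutation would have resolved at compile time.
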

Now of course one important question is: what about performance?

Please note that using shader linkage in this case implies a performance hit, which is far from negligible if you are not on high-end hardware (I noticed it was quite significant on a laptop, not as bad on a desktop).

So shader linkage adds flexibility (it makes it very easy to avoid a lot of permutations, or compiling shaders to handle all possibilities), but that comes at a cost. It's then up to you to abuse timestamp queries and decide if the hit is worth taking ;)
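
Once the timestamp queries have been read back, turning them into a duration is simple arithmetic. The C++ sketch below covers only that conversion and makes no API calls. `DisjointInfo` is a hypothetical struct mirroring the two fields of D3D11_QUERY_DATA_TIMESTAMP_DISJOINT, and `gpuMilliseconds` is an invented helper name.

```cpp
#include <cstdint>

// Mirrors the fields of D3D11_QUERY_DATA_TIMESTAMP_DISJOINT (hypothetical name).
struct DisjointInfo
{
    std::uint64_t frequency;  // GPU timestamp ticks per second
    bool disjoint;            // true when the timestamps of this frame are unreliable
};

// Converts a begin/end timestamp pair into milliseconds.
// Returns false (leaving outMs untouched) when the measurement should be discarded.
inline bool gpuMilliseconds(std::uint64_t beginTicks, std::uint64_t endTicks,
                            const DisjointInfo& info, double& outMs)
{
    if (info.disjoint || info.frequency == 0 || endTicks < beginTicks)
        return false;

    outMs = static_cast<double>(endTicks - beginTicks) * 1000.0
          / static_cast<double>(info.frequency);
    return true;
}
```

Timing the light pass this way once with the interface version and once with a macro permutation, averaged over a number of frames, gives a reasonable picture of what the linkage costs on a given machine.
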

That's it for now, next post... I don't know ;)
