Friday 5 June 2015

Particle revamp (Part 1)

I have not posted for a little while, as I have been attending several events in the meantime:
  • Kinecthack London: Worked on a networked Kinect stream and real-time point cloud alignment
  • Revision: Did my first production at a demoparty (ranked 8th in the demo category)
  • Node15: Workshops and a quick sneaky DX12 presentation (note: I'm not part of the Early Access Program, I just figured out the API myself and managed to get a render system up and running while fighting with drivers, so I'm not under NDA ;)


Ok, all those events were great fun. I should explain some of the technical parts behind them, and will do at some point, but for now let's go into some other technical parts.


I had wanted to revamp my particle system for a while. At some point I thought a rewrite from scratch would be fine since it's not an insane code base, but as usual my sensible side came back and I decided to just improve and refactor, which is always a better decision ;)


So I have a lot of nice features already: tons of emitters (Position, Distance Field, Texture, Mesh, Kinect2, Point clouds...), some nice interaction parts, including advanced effectors like SPH (accelerated with spatial grids), plenty of force fields, and a collider system (mostly distance field based; after all, a plane collider is just a distance field check with a specific function).
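To illustrate the remark about plane colliders, here is a minimal CPU-side sketch (not the engine's actual code, which runs in compute shaders). It assumes a collider is any function returning a signed distance, estimates the surface normal by central differences, and uses a simple reflect-with-restitution response. The names DistanceFieldCollider, Plane, Sphere and Collide are hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: a collider is any function returning a signed distance
// (negative inside the obstacle, positive outside).
public static class DistanceFieldCollider
{
    // Plane defined by dot(unitNormal, x) = offset.
    public static Func<Vector3, float> Plane(Vector3 unitNormal, float offset)
    {
        return p => Vector3.Dot(unitNormal, p) - offset;
    }

    // Sphere obstacle, to show that only the distance function changes.
    public static Func<Vector3, float> Sphere(Vector3 center, float radius)
    {
        return p => Vector3.Distance(p, center) - radius;
    }

    // Surface normal estimated from the gradient of the field (central differences).
    // Assumes the gradient is non-zero at p.
    public static Vector3 Normal(Func<Vector3, float> field, Vector3 p)
    {
        const float e = 0.001f;
        Vector3 g = new Vector3(
            field(p + new Vector3(e, 0, 0)) - field(p - new Vector3(e, 0, 0)),
            field(p + new Vector3(0, e, 0)) - field(p - new Vector3(0, e, 0)),
            field(p + new Vector3(0, 0, e)) - field(p - new Vector3(0, 0, e)));
        return Vector3.Normalize(g);
    }

    // If the particle is inside the obstacle, push it back to the surface and
    // reflect its velocity. Returns true when a collision was resolved.
    public static bool Collide(Func<Vector3, float> field, ref Vector3 position,
        ref Vector3 velocity, float restitution)
    {
        float d = field(position);
        if (d >= 0.0f)
            return false;

        Vector3 n = Normal(field, position);
        position -= n * d; // d is negative, so this moves along +n
        if (Vector3.Dot(velocity, n) < 0.0f)
            velocity = Vector3.Reflect(velocity, n) * restitution;
        return true;
    }
}
```

With this shape, swapping Plane(Vector3.UnitY, 0) for Sphere(center, radius) changes the collider without touching the response code, which is the point made above.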


Many effectors can also be controlled via "micro curves" (a micro curve is basically a 1D texture rendered from a track in my timeline, and sampled using the particle age).
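As a rough CPU-side picture of what sampling such a curve means, here is a small sketch. It assumes the curve is a plain float array, that the lookup coordinate is age divided by lifetime, and that addressing is clamped with linear filtering. It ignores the half-texel offset of real texture sampling, and the name MicroCurve is hypothetical.

```csharp
using System;

// Hypothetical CPU stand-in for sampling a 1D "micro curve" texture by particle age.
public static class MicroCurve
{
    // Returns the linearly interpolated curve value at age / lifetime (clamped to [0, 1]).
    public static float Sample(float[] curve, float age, float lifetime)
    {
        if (curve == null || curve.Length == 0)
            throw new ArgumentException("curve must contain at least one value", "curve");

        float t = lifetime > 0.0f ? age / lifetime : 0.0f;
        t = Math.Min(Math.Max(t, 0.0f), 1.0f);

        float x = t * (curve.Length - 1);
        int i0 = (int)x;
        int i1 = Math.Min(i0 + 1, curve.Length - 1);
        float frac = x - i0;

        return curve[i0] + (curve[i1] - curve[i0]) * frac;
    }
}
```

An effector would then scale its strength by something like MicroCurve.Sample(curve, age, lifetime), so a force can fade in and out over the life of each particle.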


The whole simulation part is managed entirely on the GPU (compute shaders) and is pretty fast, so I did not feel that part needed a major rewrite (but it will get improvements, that's for the next post).


The particle counter is basically a ring buffer and is currently managed on the CPU. That is not a big deal for effectors, but it is a real problem for emitters, and since most of my machines are now fully DX11.1 enabled (i.e. I have access to the UAV-at-every-stage feature), this becomes quite a blocker (removing it opens a lot of possibilities that I'll explain later).


So I decided to revamp this part (read: improve, not rewrite), but first let's explain the problem.


So we have a small structure that maintains the emission counters, like:

Code Snippet
  [StructLayout(LayoutKind.Sequential)]
  public struct EmitData
  {
      public uint EmitCount;
      public uint MaxParticles;
      public uint EmitOffset;
      public uint ThisFrameCount;
      public uint ThisFrameStartOffset;
      // Padding: brings the struct to 32 bytes to match the constant buffer layout
      private Vector3 dummy;
  }


As you can see it's just an evil mutable struct (it's only used to copy to a constant buffer, don't worry ;)
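The padding matters because Direct3D 11 constant buffers must have a size that is a multiple of 16 bytes. As a quick sanity check, here is a small sketch that measures the struct with Marshal. It is an illustration only: Float3 is a hypothetical stand-in for SharpDX's Vector3 (three 32-bit floats) so that the code has no dependencies, and EmitDataLayout is a made-up helper name.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for SharpDX's Vector3 (three 32-bit floats).
[StructLayout(LayoutKind.Sequential)]
public struct Float3
{
    public float X, Y, Z;
}

[StructLayout(LayoutKind.Sequential)]
public struct EmitData
{
    public uint EmitCount;
    public uint MaxParticles;
    public uint EmitOffset;
    public uint ThisFrameCount;
    public uint ThisFrameStartOffset;
#pragma warning disable 0169 // padding field is intentionally never read
    private Float3 dummy;
#pragma warning restore 0169
}

public static class EmitDataLayout
{
    // Marshalled size of the struct: 5 * 4 bytes of counters + 12 bytes of padding.
    public static int SizeInBytes()
    {
        return Marshal.SizeOf(typeof(EmitData));
    }

    // Byte offset of a field, useful when comparing against the HLSL side.
    public static int OffsetOf(string fieldName)
    {
        return (int)Marshal.OffsetOf(typeof(EmitData), fieldName);
    }
}
```

Here SizeInBytes() should give 32, and ThisFrameStartOffset should sit at byte 16, which is the start of the second 16-byte register on the HLSL side.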

Now an emitter just implements a simple interface:

Code Snippet
  public interface IParticleEmitterObject : IParticleEffectObject
  {
      // Returns the number of particles emitted this frame
      int Emit(RenderContext context, ParticleSystem particles);
  }

As you can see, an emitter returns how many particles it emitted. After each emitter has been processed, the particle system updates the counter structure so the next emitter starts at the right location.
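To make the bookkeeping concrete, here is a guess at what that CPU-side update could look like. This is a sketch under assumptions, not the engine's code: the field meanings are inferred from their names, and RingCounter, BeginFrame and Advance are hypothetical.

```csharp
// Same counters as EmitData, without the padding (which only matters for the GPU copy).
public struct EmitCounters
{
    public uint EmitCount;            // assumed: particles emitted by the last emitter
    public uint MaxParticles;         // capacity of the particle ring buffer
    public uint EmitOffset;           // assumed: where the next emitter starts writing
    public uint ThisFrameCount;       // assumed: total emitted so far this frame
    public uint ThisFrameStartOffset; // assumed: value of EmitOffset at frame start
}

public static class RingCounter
{
    // Called once per frame, before any emitter runs.
    public static void BeginFrame(ref EmitCounters c)
    {
        c.ThisFrameStartOffset = c.EmitOffset;
        c.ThisFrameCount = 0;
    }

    // Called after each emitter, with the count it returned.
    public static void Advance(ref EmitCounters c, uint emitted)
    {
        c.EmitCount = emitted;
        c.ThisFrameCount += emitted;
        // Wrap around: the oldest particles get overwritten first.
        c.EmitOffset = (c.EmitOffset + emitted) % c.MaxParticles;
    }
}
```

For example, with MaxParticles = 100 and EmitOffset = 90, an emitter returning 32 leaves EmitOffset at 22: particles 90..99 and 0..21 were written.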

This works really well for simple emitters (32 particles randomly placed), but now I also have emitters that take their data from the GPU.

So let's take another example, emit from texture.
This is done in the following steps:

  • Create a buffer (pos+color), with an Append flag
  • Dispatch a compute shader that will go read each pixel
  • If pixel satisfies a condition (luminance for example, but can be anything else), append the pixel position + color in the Append buffer. 
  • Copy buffer counter.
  • Dispatch N particles (where N is set on the CPU); each one picks a random pixel that satisfied the condition and pushes it to the particle buffers.

Now, as you can clearly see, that creates a problem: the number of particles to emit is set on the CPU, but we have no idea how many pixels satisfied the condition. So we can hit 3 edge cases (one of which can be nasty). Consider that our emit count is 32:

  • We can have 20000 pixels that pass the test in frame 1 and emit 32 particles from them, then in the next frame emit 32 particles from a 200 pixel buffer. This means we have poor coverage control; I'd like to emit more particles if more pixels pass the test.
  • Fewer than 32 pixels passed the test (say 10), so some elements get emitted several times.
  • Worst case of all, no pixels at all pass the test, so the result can be... unpredictable.
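The three cases are easy to reproduce on the CPU. The sketch below mimics the two passes with plain collections, assuming a luminance threshold as the condition. A List stands in for the Append buffer, and the names TextureEmitter, CollectPassing and PickRandom are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class TextureEmitter
{
    // Pass 1: gather the indices of the pixels that satisfy the condition
    // (stand-in for the compute shader writing to the Append buffer).
    public static List<int> CollectPassing(float[] luminance, float threshold)
    {
        var passing = new List<int>();
        for (int i = 0; i < luminance.Length; i++)
        {
            if (luminance[i] > threshold)
                passing.Add(i);
        }
        return passing;
    }

    // Pass 2: emit a fixed, CPU-chosen number of particles from random entries.
    public static int[] PickRandom(List<int> passing, int emitCount, Random rng)
    {
        // Case 3: nothing passed. Without this guard the CPU version would throw,
        // and a shader doing "random % count" with count = 0 has no defined result.
        if (passing.Count == 0)
            return new int[0];

        var picked = new int[emitCount];
        for (int i = 0; i < emitCount; i++)
        {
            // Cases 1 and 2: emitCount ignores passing.Count, so coverage is
            // poor when many pixels pass and duplicates appear when few do.
            picked[i] = passing[rng.Next(passing.Count)];
        }
        return picked;
    }
}
```

With only 2 passing pixels and an emit count of 32, every picked entry is one of those 2 pixels, which is exactly the duplication described in the second case.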

So to handle those cases, we have several options:
  • In the compute shader emitter, if the thread ID is >= the number of elements that passed the test, push a "degenerate particle" (like position = float.MaxValue). This is of course ugly and needs to be repeated in every shader that takes a GPU counter, but at least it works (even though it could cause problems at simulation level).
  • Get the counter back on the CPU: this is simple (just copy the counter into a small staging buffer, read it back on the CPU, then choose what to do), but it creates a stall, as we need to flush the command buffer and wait for it to be fully executed before getting back our 4 precious bytes. So again, not ideal.
  • Do it properly :) Move the counter data to the GPU, move the code that maintains that data to a compute shader, and profit :)

So first thing: since the counter data is in a constant buffer, let's not change all the shaders and the logic. Instead we use a small structured buffer which contains an exact copy of the data, so we process in the structured buffer and then use CopyResource to copy from the structured buffer to the constant buffer.

Code Snippet
  //Matches the cbuffer layout, so we can use CopyResource
  struct sParticleEmitInfo
  {
      uint emitCount;
      uint maxParticles;
      uint emitOffset;
      uint thisFrameCount;
      uint thisFrameStartOffset;
      float3 dummy; //to match cbuffer padding
  };
   
  StructuredBuffer<sParticleEmitInfo> ParticleEmitBuffer : PARTICLEEMITBUFFER;
  RWStructuredBuffer<sParticleEmitInfo> RWParticleEmitBuffer : RWPARTICLEEMITBUFFER;


Now we need to modify our emitter, so the number of particles emitted can be provided either as an int (like before), or as a location in GPU memory.

So for this we model a union type. I would gladly do that in F#, but for now I don't want to move all my codebase to it :)

Code Snippet
  public class ParticleEmitterResult
  {
      private StaticResult staticResult;
      private DidNotRunResult didNotRunResult;
      private UnorderedAccessViewResult uavResult;
      private BufferResult bufferResult;
      private ResultType resultType;
   
      // Private constructor: instances are only created through the factory methods below
      private ParticleEmitterResult()
      {
   
      }
   
      public static ParticleEmitterResult DidNotRun()
      {
          ParticleEmitterResult result = new ParticleEmitterResult();
          result.didNotRunResult = new DidNotRunResult();
          result.resultType = ResultType.DidNotRun;
          return result;
      }
   
      public static ParticleEmitterResult Static(uint elementCount)
      {
          ParticleEmitterResult result = new ParticleEmitterResult();
          result.staticResult = new StaticResult(elementCount);
          result.resultType = ResultType.Static;
          return result;
      }
   
      public static ParticleEmitterResult UnorderedView(UnorderedAccessView view)
      {
          if (view == null)
              throw new ArgumentNullException("view");
   
          ParticleEmitterResult result = new ParticleEmitterResult();
          result.uavResult = new UnorderedAccessViewResult(view);
          result.resultType = ResultType.Uav;
          return result;
      }
   
      public static ParticleEmitterResult Buffer(SharpDX.Direct3D11.Buffer buffer, int offset)
      {
          if (buffer == null)
              throw new ArgumentNullException("buffer");
          if (offset < 0)
              throw new ArgumentOutOfRangeException("offset", "offset must be greater than or equal to 0");
   
          ParticleEmitterResult result = new ParticleEmitterResult();
          result.bufferResult = new BufferResult(buffer, offset);
          result.resultType = ResultType.Buffer;
          return result;
      }
   
      // Dispatches to the handler method that matches the stored case
      internal void Handle(RenderContext context, IParticleEmitterResultHandler handler)
      {
          switch(this.resultType)
          {
              case ResultType.DidNotRun:
                  handler.HandleDidNotRun(context);
                  break;
              case ResultType.Static:
                  handler.HandleStaticResult(context, this.staticResult.ElementCount);
                  break;
              case ResultType.Uav:
                  handler.HandleUavResult(context, this.uavResult.UnorderedView);
                  break;
              case ResultType.Buffer:
                  handler.HandleBuffer(context, this.bufferResult.Buffer, this.bufferResult.Offset);
                  break;
          }
      }
   
      public enum ResultType { DidNotRun, Static, Uav, Buffer }
   
      public sealed class DidNotRunResult
      {
   
      }
   
      public sealed class StaticResult
      {
          private readonly uint elementCount;
   
          public uint ElementCount
          {
              get { return this.elementCount; }
          }
   
          public StaticResult(uint elementCount)
          {
              this.elementCount = elementCount;
          }
      }
   
      public sealed class UnorderedAccessViewResult
      {
          private readonly UnorderedAccessView view;
          
          public UnorderedAccessView UnorderedView
          {
              get { return this.view; }
          }
   
          public UnorderedAccessViewResult(UnorderedAccessView view)
          {
              this.view = view;
          }
      }
   
      public sealed class BufferResult
      {
          private readonly SharpDX.Direct3D11.Buffer buffer;
          private readonly int offset;
   
          public SharpDX.Direct3D11.Buffer Buffer
          {
              get { return this.buffer; }
          }
   
          public int Offset
          {
              get { return this.offset; }
          }
   
          public BufferResult(SharpDX.Direct3D11.Buffer buffer, int offset)
          {
              this.buffer = buffer;
              this.offset = offset;
          }
      }
  }


This is a bit cumbersome, but it's pretty safe to use.
As a side note, in F# this would look like this:


Code Snippet
  open System
  // Opened last so that Buffer resolves to the Direct3D11 type rather than System.Buffer
  open SharpDX.Direct3D11
   
  type ParticleSystemArgs(maxElements:int) =
      member x.maxElements = maxElements
   
  type StaticEmitResult(elementCount:int) =
      member x.elementCount = if elementCount < 0 then raise(ArgumentOutOfRangeException("elementCount","Must be greater than or equal to 0")) else elementCount
   
  type EmitUnorderedViewResult(view:UnorderedAccessView) =
      member x.view = if view = null then raise(ArgumentNullException("view")) else view
   
  type EmitBufferResult(buffer:Buffer,offset:int) =
      member x.buffer = if buffer = null then raise(ArgumentNullException("buffer")) else buffer
      member x.offset = if offset < 0 then raise(ArgumentOutOfRangeException("offset","Offset should be greater than or equal to 0")) else offset
   
  type ParticleEmitResult =
      | DidNotEmit of unit
      | StaticResult of StaticEmitResult
      | UavResult of EmitUnorderedViewResult
      | BufferResult of EmitBufferResult
   
  type IParticleEmitter =
     // abstract method
     abstract member Emit: ParticleSystemArgs -> ParticleEmitResult
   
  type IParticleEmitHandler =  
      abstract member HandleNoEmit : unit -> unit
      abstract member HandleStatic : int -> unit
      abstract member HandleUav : UnorderedAccessView -> unit
      abstract member HandleBuffer : EmitBufferResult -> unit
   
   
  module ParticleFunctions =
   
      let ApplyHandler (x:ParticleEmitResult, handler: IParticleEmitHandler)=
          match x with
              | DidNotEmit d -> handler.HandleNoEmit()
              | StaticResult sr -> handler.HandleStatic(sr.elementCount)
              | UavResult ur -> handler.HandleUav(ur.view)
              | BufferResult br -> handler.HandleBuffer(br)
   
      let DoEmit(emitter:IParticleEmitter,args:ParticleSystemArgs, handler : IParticleEmitHandler) =
          ApplyHandler(emitter.Emit(args), handler)

Much more concise, and the pattern matching is much safer in there, but whatever :)

Back in C#, the emitter interface now returns this result type:

Code Snippet
  public interface IParticleEmitterObject : IParticleEffectObject
  {
      ParticleEmitterResult Emit(RenderContext context, ParticleSystem particles);
  }


Pretty simple: the emitter can now return different result types, each carrying its own data.

And now we have another interface to handle the result:


Code Snippet
  public interface IParticleEmitterResultHandler
  {
      void HandleDidNotRun(RenderContext context);
      void HandleStaticResult(RenderContext context, uint elementCount);
      void HandleUavResult(RenderContext context, UnorderedAccessView uav);
      void HandleBuffer(RenderContext context, SharpDX.Direct3D11.Buffer buffer, int offset);
  }


As you can see, there are 4 cases, so let's first explain those:

  • Did not run: The emitter did not run at all. I decided to have it as a separate case instead of returning 0 (so you can also explain why it did not run).
  • Static: This is the same as before, the count is known on the CPU.
  • UnorderedView: The counter is located in a UAV, which is the case when we use Append/Counter buffers to push particles. So for example if we want to emit every pixel that passed the test in our previous example, we do an indirect dispatch and return the view (which contains the counter).
  • Buffer: The counter is in a GPU buffer (we also need to provide its location in that case). This is very useful for coverage based emitters (for example, we could say: emit 50% of the elements that passed the test every frame; in that case we need to process the counter in a small compute shader to generate a custom dispatch call).

So from there, we can easily update our structured buffer above (using a compute shader, although I kept the readback version for debug purposes).
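For the coverage based case, the arithmetic that the small compute shader has to perform can be sketched on the CPU as follows. This is only an illustration of the math, assuming a one-dimensional dispatch and a thread group size chosen by the emitter shader. CoverageDispatch and BuildDispatchArgs are hypothetical names.

```csharp
using System;

public static class CoverageDispatch
{
    // Builds the three thread group counts consumed by an indirect dispatch, so that
    // a fraction (coverage) of the elements that passed the test gets emitted.
    // emitCount is what each thread would compare its ID against.
    public static uint[] BuildDispatchArgs(uint passedCount, float coverage,
        uint threadsPerGroup, out uint emitCount)
    {
        if (threadsPerGroup == 0)
            throw new ArgumentOutOfRangeException("threadsPerGroup");

        float c = Math.Min(Math.Max(coverage, 0.0f), 1.0f);
        emitCount = (uint)(passedCount * c); // rounds down

        // Round up so a partially filled last group still runs.
        uint groups = (emitCount + threadsPerGroup - 1) / threadsPerGroup;
        return new uint[] { groups, 1, 1 };
    }
}
```

With 50% coverage and 64 threads per group, 20000 passing pixels give 10000 particles in 157 groups, while 200 passing pixels give 100 particles in 2 groups, so the emission rate follows the content instead of a fixed CPU value.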

Now the only small difference is when processing effectors: we don't know the particle count on the CPU anymore, so instead of using Dispatch we use DispatchIndirect (which is trivial to implement). We use indirect buffers for drawing as well: DrawInstancedIndirect to draw as sprites, and DrawIndexedInstancedIndirect to render particles as geometry.
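For reference, the argument buffers read by these three calls are just tightly packed 32-bit values, in the same order as the parameters of Dispatch, DrawInstanced and DrawIndexedInstanced. The sketch below mirrors them as C# structs. The struct names and the two helper methods are hypothetical (Direct3D 11 does not define named types for these), and how sprites map to vertices is an assumption.

```csharp
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct DispatchIndirectArgs // 12 bytes
{
    public uint ThreadGroupCountX;
    public uint ThreadGroupCountY;
    public uint ThreadGroupCountZ;
}

[StructLayout(LayoutKind.Sequential)]
public struct DrawInstancedIndirectArgs // 16 bytes
{
    public uint VertexCountPerInstance;
    public uint InstanceCount;
    public uint StartVertexLocation;
    public uint StartInstanceLocation;
}

[StructLayout(LayoutKind.Sequential)]
public struct DrawIndexedInstancedIndirectArgs // 20 bytes
{
    public uint IndexCountPerInstance;
    public uint InstanceCount;
    public uint StartIndexLocation;
    public int BaseVertexLocation;
    public uint StartInstanceLocation;
}

public static class IndirectArgs
{
    // Assumption: sprites are drawn as one point per particle (expanded later in the pipeline).
    public static DrawInstancedIndirectArgs ForSprites(uint particleCount)
    {
        return new DrawInstancedIndirectArgs { VertexCountPerInstance = particleCount, InstanceCount = 1 };
    }

    // One instance of the mesh per particle.
    public static DrawIndexedInstancedIndirectArgs ForGeometry(uint indexCount, uint particleCount)
    {
        return new DrawIndexedInstancedIndirectArgs { IndexCountPerInstance = indexCount, InstanceCount = particleCount };
    }
}
```

On the GPU the same values would be written by a compute shader from the particle counter, so the CPU never needs to know the count.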

So here we go: from there we have a fully fledged counter system on the GPU, which also means that for any type of emitter where we want to push every element that passes the test, we can now do it in a single pass (no more need for an intermediate buffer: use a counter buffer or InterlockedAdd).
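The single pass idea can be mimicked on the CPU with an atomic counter: every thread that passes the test reserves its own slot and writes there directly. The sketch below uses Parallel.For and Interlocked.Increment as stand-ins for compute threads and the GPU-side atomic. The name SinglePassEmit is hypothetical and the luminance test is just an example condition.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SinglePassEmit
{
    // Writes the index of every passing pixel straight into the particle array.
    // Returns the number of particles emitted.
    public static int Emit(float[] luminance, float threshold, int[] particles)
    {
        int counter = 0;

        Parallel.For(0, luminance.Length, i =>
        {
            if (luminance[i] > threshold)
            {
                // Atomically reserve a unique slot (stand-in for
                // IncrementCounter / InterlockedAdd on the GPU).
                int slot = Interlocked.Increment(ref counter) - 1;

                // Ring buffer behaviour: wrap if more elements pass than fit.
                particles[slot % particles.Length] = i;
            }
        });

        return counter;
    }
}
```

Each passing element ends up in exactly one slot, in no particular order, and the final counter value is the emit count that the next stages read from the GPU.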

And now, since we have access to UAVs at every stage, it's possible to load balance using the tessellation/domain shader stages.

Here is an example of a hybrid particle emitter (which doesn't draw anything on screen, but just pushes an adaptive number of particles depending on triangle size).

Declaration:


Code Snippet
  cbuffer cbemitParams : register(b0)
  {
      float MinSize = 0.1f;
      float MaxSize = 20.0f;
      float MinimumTessel = 1.0f;
      float MaximumTessel = 12.0f;
      float VelocityScale = 1.0f;
  };
   
  struct vsInput
  {
      float3 p : POSITION;
      float3 n : NORMAL;
  };
   
  struct hsConstOutput
  {
      float edges[3]        : SV_TessFactor;
      float inside[1]       : SV_InsideTessFactor;
  };


Our hardcore vertex and hull shaders:


Code Snippet
  // Pass-through vertex shader
  vsInput VS(vsInput input)
  {
      return input;
  }
   
  // Pass-through hull shader: control points are forwarded unchanged
  [domain("tri")]
  [partitioning("fractional_even")]
  [outputtopology("triangle_cw")]
  [outputcontrolpoints(3)]
  [patchconstantfunc("HSConst")]
  vsInput HS(InputPatch<vsInput, 3> input, uint id : SV_OutputControlPointID)
  {
      return input[id];
  }


Now the hull constant function, which defines the tessellation factor based on triangle size:

Code Snippet
  hsConstOutput HSConst(InputPatch<vsInput, 3> patch)
  {
      hsConstOutput output;
      
      float3 p1 = patch[0].p;
      float3 p2 = patch[1].p;
      float3 p3 = patch[2].p;
      
      // Length of the cross product = twice the triangle area, used as the size metric
      float v = length(cross(p2-p1,p3-p1));
      
      // Remap the size from [MinSize, MaxSize] to [MinimumTessel, MaximumTessel]
      float r = MaxSize - MinSize;
      float n = (v - MinSize) / r;
      float f = MinimumTessel + n * (MaximumTessel - MinimumTessel);
      f = clamp(f,MinimumTessel,MaximumTessel);
      
      output.edges[0] = f;
      output.edges[1] = f;
      output.edges[2] = f;
      output.inside[0] = f;
      
      return output;
  }


And the domain shader, which performs the emission:

Code Snippet
  // Note: RWPositionBuffer, RWInitialPositionBuffer, RWVelocityBuffer, EmitOffset
  // and MaxParticles are declared elsewhere in the shader.
  [domain("tri")]
  void DS(hsConstOutput input, OutputPatch<vsInput, 3> op, float3 uv : SV_DomainLocation)
  {
      // Reserve a unique slot for this domain point
      uint vid = RWPositionBuffer.IncrementCounter();
   
      // Barycentric interpolation of position and normal
      float3 p = uv.x * op[0].p
          + uv.y * op[1].p
          + uv.z * op[2].p;
   
      float3 n = uv.x * op[0].n
          + uv.y * op[1].n
          + uv.z * op[2].n;
   
      n = normalize(n) * VelocityScale;
   
      // Wrap into the particle ring buffer
      uint particleid = (vid + EmitOffset) % MaxParticles;
   
      RWPositionBuffer[particleid] = p;
      RWInitialPositionBuffer[particleid] = p;
      RWVelocityBuffer[particleid] = n;
  }

As you can see, we call IncrementCounter on the particle position buffer (its counter is reset to 0 before running the shader).

Then we just return the position buffer UAV as a result, which contains the total amount of particles that got emitted.

Make sure to disable the pixel shader and geometry shader, set a dummy viewport (otherwise it will not run), disable the depth state, and profit :)

In the next part I will demo various emitters and explain some other new features that this has opened up.



