Catflier: October 2015

In the previous post, I spoke about the ability to perform hit detection using analytical functions.

This works extremely well when we can restrict our use case to it, but now we have some other cases where this is not as ideal:

Perform detection on arbitrary shape/3d model.
User input is not a pointer anymore, but can also be arbitrary (thresholded camera texture, Kinect Body Index).
Both previous cases combined together

While we can often perform detection on a 3d model by using triangle raycast (I'll keep that one for next post), it can be pretty expensive (especially if we perform a 10 touch hit detection, since we need to raycast 10 times).

So instead, one easy technique is to use an ID map.

The concept is extremely simple: instead of performing the hit test with a function, we render our scene into a UInt texture, where each pixel stores an object ID.

Of course it means you have to render your scene another time, but in that case you can also easily use the following:

Render to a downsized texture (512*512 is often sufficient)
Render either bounding volumes, or simplified versions of our 3d models.

The great thing with this technique is that our depth buffer already makes sure that we have the closest object ID stored (so we get that for "free").
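To make the idea concrete, here is a minimal CPU sketch of what the GPU does for us, assuming each object is reduced to a screen-space rectangle with a single depth value. The names `render_id_map` and `NO_OBJECT` are made up for this illustration, as is the choice of 0xFFFFFFFF as the "nothing here" clear value.

```python
NO_OBJECT = 0xFFFFFFFF  # assumed clear value meaning "no object on this pixel"


def render_id_map(width, height, boxes):
    """Rasterize rectangles into an ID map, keeping the closest object per pixel.

    boxes: list of (object_id, x0, y0, x1, y1, depth), in pixels, x1/y1 exclusive.
    """
    id_map = [[NO_OBJECT] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for object_id, x0, y0, x1, y1, z in boxes:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                if z < depth[y][x]:  # depth test: a closer object replaces the old ID
                    depth[y][x] = z
                    id_map[y][x] = object_id
    return id_map


# Object 1 is closer (depth 2.0) than object 0 (depth 5.0), so it wins where they overlap.
example_map = render_id_map(8, 8, [(0, 0, 0, 6, 6, 5.0), (1, 4, 4, 8, 8, 2.0)])
```

The draw order does not matter here: swapping the two boxes gives the same map, which is exactly what the depth buffer buys us.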

So now that we have our ID map, picking the object ID from a pointer is trivial:

Code Snippet

Texture2D<uint> ObjectIDTexture;
 
RWStructuredBuffer<uint> RWObjectBuffer : BACKBUFFER;
 
float2 MousePosition;
int width = 512;
int height = 424;
[numthreads(1,1,1)]
void CS(uint3 tid : SV_DispatchThreadID)
{
    uint w,h;
    ObjectIDTexture.GetDimensions(w,h);
    
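    //MousePosition is assumed to be in the -1..1 range, convert it to 0..1 (y flipped)
    float2 p = MousePosition;
    p = p * 0.5f  + 0.5f;
    p.y = 1.0f-p.y;
    //Scale to pixel coordinates
    p.x *= (float)w;
    p.y *= (float)h;
    
    //Read the object id under the pointer
    uint obj = ObjectIDTexture.Load(int3(p,0));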
    RWObjectBuffer[0] = obj;
 
}

Not much more is involved: we grab the pixel ID and store it in a buffer that we can read back through a staging resource.

In case we need multiple pointers, we only need to grab N pixels instead, so the process stays pretty simple (and we don't need to render the scene for each pointer).
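As a rough CPU sketch of the multi pointer case, assuming pointers arrive in the same -1..1 range as `MousePosition` and that the ID map is a list of rows. The helper names are hypothetical, and the clamp to the last pixel is an addition of mine so that a pointer sitting exactly on the border still reads a valid pixel.

```python
def pointer_to_pixel(px, py, width, height):
    """Convert a pointer in the -1..1 range into integer pixel coordinates."""
    u = px * 0.5 + 0.5          # -1..1 -> 0..1
    v = 1.0 - (py * 0.5 + 0.5)  # same, with y flipped (texture origin is top left)
    x = min(max(int(u * width), 0), width - 1)
    y = min(max(int(v * height), 0), height - 1)
    return x, y


def pick_ids(id_map, pointers):
    """Return one object ID per pointer, reading the same ID map for all of them."""
    height = len(id_map)
    width = len(id_map[0])
    picked = []
    for px, py in pointers:
        x, y = pointer_to_pixel(px, py, width, height)
        picked.append(id_map[y][x])
    return picked
```

The scene is rendered once and every pointer is a single texture read, which is where this beats doing one raycast per touch.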

Now as mentioned before, we might need to perform detection against arbitrary texture.

As a starter, for simplicity, I will restrict the use case to single user texture.

So first we render the user into a R8_Uint texture, where 0 means no active user and anything else means active.

We render our object map next in the same resolution.

We create a buffer (same size as object count, uint) that will store how many user pixels hit an object pixel.

Dispatch to perform this count.

Use another Append buffer that selects elements over a minimum amount of pixels (this is generally important to avoid noise with camera/kinect textures).

Accumulating pixel hit count is done this way:

Code Snippet

Texture2D<uint> ObjectIDTexture;
Texture2D<float> InputTexture;
 
RWStructuredBuffer<uint> RWObjectBuffer : BACKBUFFER;
 
float Minvalue;
int maxObjectID;
 
[numthreads(8,8,1)]
void CS(uint3 tid : SV_DispatchThreadID)
{
    uint obj = ObjectIDTexture[tid.xy];
    float value = InputTexture[tid.xy];
    
    if (value > Minvalue && obj < maxObjectID)
    {
        uint oldValue;
        InterlockedAdd(RWObjectBuffer[obj],1,oldValue);
    }
}

Make sure you use InterlockedAdd, since many threads can increment the same object counter at the same time and the operation needs to be atomic.

Next we can filter elements:

Code Snippet

StructuredBuffer<uint> HitCountBuffer;
AppendStructuredBuffer<uint> AppendObjectIDBuffer : BACKBUFFER;
 
int minHitCount;
 
[numthreads(64,1,1)]
void CS(uint3 tid : SV_DispatchThreadID)
{
    uint c,stride;
    HitCountBuffer.GetDimensions(c,stride);
    if (tid.x >= c)
        return;
    
    int hitcount = HitCountBuffer[tid.x];
    if (hitcount >= minHitCount)
    {
        AppendObjectIDBuffer.Append(tid.x);
    }
}

It is that easy. Of course, instead of only rendering the ObjectID in the map, we can easily add some extra metadata (triangle ID, closest vertexID) for easier lookup.
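For reference, the two passes above can be mirrored on the CPU in a few lines. This is only a sketch under my own assumptions: both textures are lists of rows at the same resolution, any ID greater than or equal to `object_count` is treated as background, and the function names are invented for the example.

```python
def count_hits(id_map, user_mask, object_count):
    """Count, per object, how many active user pixels land on that object."""
    hit_counts = [0] * object_count
    for id_row, mask_row in zip(id_map, user_mask):
        for obj, active in zip(id_row, mask_row):
            if active != 0 and obj < object_count:
                hit_counts[obj] += 1  # InterlockedAdd in the compute shader version
    return hit_counts


def filter_hits(hit_counts, min_hit_count):
    """Keep only the objects covered by enough user pixels (noise rejection)."""
    return [obj for obj, count in enumerate(hit_counts) if count >= min_hit_count]


# 3x3 example, ID 9 is background.
example_ids = [[0, 0, 1], [0, 2, 1], [9, 2, 1]]
example_mask = [[1, 1, 0], [1, 0, 1], [1, 1, 1]]
example_counts = count_hits(example_ids, example_mask, 3)
```

With a minimum of 2 pixels, object 2 (covered by a single user pixel) is rejected as noise while objects 0 and 1 are kept.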

Now in order to perform multi user detection (for example, using Kinect2 body Index texture), process is not much different.

Instead of having a buffer of ObjectCount elements, we create one of ObjectCount*UserCount elements.

Accumulator becomes:

Code Snippet

Texture2D<uint> ObjectIDTexture;
Texture2D<uint> UserIDTexture;
 
RWStructuredBuffer<uint> RWObjectBuffer : BACKBUFFER;
 
float Minvalue;
int maxObjectID;
int objectCount;
 
[numthreads(8,8,1)]
void CS(uint3 tid : SV_DispatchThreadID)
{
    uint obj = ObjectIDTexture[tid.xy];
    uint pid = UserIDTexture[tid.xy];
 
    if (pid != 255 && obj < maxObjectID) //255 means no user
    {
        uint oldValue;
        InterlockedAdd(RWObjectBuffer[pid*objectCount+obj],1,oldValue);
    }
}

And filtering becomes:

Code Snippet

StructuredBuffer<uint> HitCountBuffer;
AppendStructuredBuffer<uint2> AppendObjectIDBuffer : BACKBUFFER;
 
int minHitCount;
int objectCount;
[numthreads(64,1,1)]
void CS(uint3 tid : SV_DispatchThreadID)
{
    uint c,stride;
    HitCountBuffer.GetDimensions(c,stride);
    if (tid.x >= c)
        return;
    
    int hitcount = HitCountBuffer[tid.x];
    if (hitcount >= minHitCount)
    {
        uint2 result;
        result.x = tid.x % objectCount; //objectid;
        result.y = tid.x / objectCount;
        AppendObjectIDBuffer.Append(result);
    }
}
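The only subtle part is the flat index, so here is a small CPU sketch of the encode/decode round trip. It assumes 255 is the "no user" value (as in the Kinect2 body index texture) and that out of range IDs are simply skipped; the function names are hypothetical.

```python
NO_USER = 255  # "no user" value in the body index texture


def count_user_hits(id_map, user_map, object_count, user_count):
    """Accumulate hits in a flat buffer indexed by user * object_count + object."""
    counts = [0] * (object_count * user_count)
    for id_row, user_row in zip(id_map, user_map):
        for obj, pid in zip(id_row, user_row):
            if pid != NO_USER and pid < user_count and obj < object_count:
                counts[pid * object_count + obj] += 1
    return counts


def filter_user_hits(counts, object_count, min_hit_count):
    """Decode the flat index back into (object_id, user_id) pairs."""
    return [
        (index % object_count, index // object_count)
        for index, count in enumerate(counts)
        if count >= min_hit_count
    ]
```

For example, with 2 objects and 2 users, the flat index 3 decodes to object 1 (3 % 2) and user 1 (3 / 2).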

We now have a tuple userid/object id instead, as shown in the following screenshot:

Please also note this technique can easily be optimized with stencil, setting a bit per user. You are then limited to 8 users though (7 users in case you also want to reserve one bit for the object itself).

You will also need one pass per user (so 6 passes, each with the proper depth stencil state/reference value).
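The bit per user layout can be simulated on the CPU as follows. This is only an illustration of the bookkeeping, not of the actual depth stencil state setup: `build_user_stencil` and `user_covers` are invented names, and each loop iteration stands in for one render pass with its own reference value and write mask.

```python
def build_user_stencil(body_index_map, user_count):
    """Simulate one stencil pass per user, each pass setting that user's bit."""
    if user_count > 8:
        raise ValueError("an 8 bit stencil buffer only has room for 8 users")
    stencil = [[0] * len(row) for row in body_index_map]
    for user in range(user_count):  # one pass per user
        bit = 1 << user             # reference value / write mask for this pass
        for y, row in enumerate(body_index_map):
            for x, body_index in enumerate(row):
                if body_index == user:
                    stencil[y][x] |= bit
    return stencil


def user_covers(stencil, user, x, y):
    """Stencil test equivalent: is this user's bit set on the pixel?"""
    return (stencil[y][x] & (1 << user)) != 0
```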

If you are lucky enough to run on Windows10/DirectX11.3, and have a card that allows it, you can also simply do:

Code Snippet

Texture2D<uint> BodyIndexTexture : register(t0);
 
uint PS(float4 p : SV_Position) : SV_StencilRef
{
    uint id = BodyIndexTexture.Load(int3(p.xy, 0));
    if (id == 255) //No user magic value provided by Kinect2
        discard;
    return id;
}

Here is a simple stencil test rig, to show all of the intermediates:

That's it for part 2 (that was simple no?)

For the next (and last) part, I'll explain a few more advanced cases (triangle raycast, scene precull....)

So last month I've been working on my latest commercial project (nothing involving any extreme creative skills), so now I'm back into research mode.

I got plenty of new ideas for rendering, and quite some parts of my engine are undergoing some reasonable cleanup (mostly a new binding model to ease the dx12 transition later on).

There are different areas in my tool that I'm keen on improving. Many new parts will be for other blog posts, but one has lately drawn my attention and I really wanted to get this one sorted.

As many of you know (or don't), I've been working on many interactive installations around, from small to (very) large.

One common requirement for those is some form of Hit Detection: you have some input device (Kinect, Camera, Mouse, Touch, Leap....), and you need to know if you hit some object in your scene so that those elements can react.

After many years in the industry, I've been developing a lot of routines in that aspect, so I thought it would be nice to have all of that as a decent library (to just pick when needed).

After a bit of conversation with my top coder Eric, we wanted to write down a feature list of what we expect from an intersection engine, and the following came up:

We have various scenarios, some routines are better fit to some use cases, so we don't want a "one mode to rule them all". For example, if our objects are near spherical, we don't want to ray cast mesh triangles, ray cast on bounding sphere is appropriate (and of course much faster).
We want our routines sandboxed, so 4v/flaretic subpatch, it should be one node with inputs/outputs, with cooking done properly inside and optimized. That saves us load time, reduces compilation times for shaders (or allows precompiled ones), and makes the workflow easier to control (if our routine is not needed it costs 0).
We want our library minimal, so actually hit routines should not even create data themselves, they are a better fit as pure behaviours (It also helps to have those routines working in different environments).
We don't want to be gpu only, if a case fits better as CPU, then we should use CPU (if preparing buffers costs more time than performing the test directly, then let's just do it directly in cpu).

Next we wanted to decide which type of outputs we needed, this came out:

bool/int flag, which indicates if object is hit or not
filtered version for hit objects
filtered version for non hit objects

Then here are the most important hit detection features we require (they cover a large part of our use cases in general):

Mouse/Pointer(s) to 2d shape (in most cases we want rectangle, circle).
Pointer(s) to 3d object (with selectable precision, either raycast bounding volume eg sphere/box, or go at triangle level).
Area(s) to shapes (rectangle selection)
Arbitrary texture to shape (most common scenario for this is infrared camera, or kinect body index texture). In that case we also want the ability to differentiate between user id as well as object id.
In any 3d scenario, we also eventually want either the closest object or all objects that pass the test.
We also have 3 general cases in 3d : Intersect (ray), Containment (is our object inside another test primitive), Proximity (is our object "close enough" to some place).
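To illustrate the difference between the last two cases, here is a tiny sketch for a point against a sphere. The function names and the choice of measuring proximity from the sphere surface are my own assumptions for the example.

```python
import math


def is_contained(point, center, radius):
    """Containment: is the point inside the test sphere?"""
    return math.dist(point, center) <= radius


def is_close(point, center, radius, max_distance):
    """Proximity: is the point within max_distance of the sphere surface?"""
    return math.dist(point, center) - radius <= max_distance
```

A point at distance 3 from the center of a unit sphere is not contained, but it counts as close once the allowed distance reaches 2.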

So once those requirements are set, to perform hit detection we generally have 2 main scenarios: you use an analytical function or you use a map.

So let's show some examples. In this first post about it, I'll only speak about analytical functions.

In this case, we generally follow the usual pattern: convert our input into the desired structure (point to ray for example), and every mode follows the pseudo code below (c# version)

Code Snippet

bool[] hitResults = new bool[objectCount];
List<int> hitObjectList = new List<int>();
 
for (int i = 0; i < objectCount; i++)
{
    var obj = testObjects[i];
 
    bool hr = PerformTest(userObject, obj);
    hitResults[i] = hr;
 
    if (hr)
    {
        hitObjectList.Add(i);
    }
}

Pretty much all test modes will follow this pattern, only difference after is the test function.

Obviously when we start to reach a certain number of elements, this can become slow. And many times, our objects might be on our GPU, so we are not gonna load them back into CPU.

Translating this into hlsl is extremely straightforward, here is some pseudo code for it.

Code Snippet

bool PerformTest(SomeUserObject userInput, SomeStruct object)
{
    return //Perform your intersection/containment/proximity routine here
}
 
StructuredBuffer<SomeStruct> ObjectBuffers  : register(t0);
 
RWStructuredBuffer<uint> RWObjectHitResultBuffer : register(u0);
 
AppendStructuredBuffer<uint> AppendObjectHitBuffer : register(u1);
RWStructuredBuffer<uint> RWObjectHitBuffer : register(u1); //In this case, UAV should have a counter flag
 
cbuffer cbUserInput : register(b0)
{
    SomeUserObject userInput;
};
 
cbuffer cbObjectData : register(b1)
{
    uint objectCount;
};
 
[numthreads(128, 1, 1)]
void CS(uint3 i : SV_DispatchThreadID)
{
    if (i.x >= objectCount)
        return;
 
    uint oid = i.x;
 
    SomeStruct object = ObjectBuffers[oid];
 
    bool hitResult = PerformTest(userInput, object);
    RWObjectHitResultBuffer[oid] = hitResult;
    if (hitResult)
    {
        //If we use append buffer
        AppendObjectHitBuffer.Append(oid);
 
        //If we use counter buffer
        uint idx = RWObjectHitBuffer.IncrementCounter();
        RWObjectHitBuffer[idx] = oid;
    }
}

As you can see, there's no huge difference between the two versions.

It's pretty straightforward to perform ray to sphere/triangle/box as a starter.

Rectangle selection is also extremely simple:

Construct a 2d transformation for the screen area to check
Multiply inverse by camera projection
Build a frustum from this
Perform an object/frustum test instead of ray test.
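Here is one way those steps could look on the CPU. This is a sketch under several assumptions of mine: matrices are row lists applied to column vectors (clip = M * p), the selection rectangle is given in -1..1 screen coordinates, depth follows the DirectX 0..1 convention, and objects are tested as bounding spheres. All function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def crop_matrix(x0, y0, x1, y1):
    """Map the screen rectangle [x0,x1] x [y0,y1] (in -1..1) onto the full -1..1 range."""
    sx, sy = 2.0 / (x1 - x0), 2.0 / (y1 - y0)
    tx, ty = -(x0 + x1) / (x1 - x0), -(y0 + y1) / (y1 - y0)
    return [[sx, 0.0, 0.0, tx], [0.0, sy, 0.0, ty], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def frustum_planes(m):
    """Extract the 6 normalized planes (a, b, c, d) from a clip matrix, inside is >= 0."""
    r0, r1, r2, r3 = m
    raw = [
        [r3[i] + r0[i] for i in range(4)],  # left
        [r3[i] - r0[i] for i in range(4)],  # right
        [r3[i] + r1[i] for i in range(4)],  # bottom
        [r3[i] - r1[i] for i in range(4)],  # top
        list(r2),                           # near (z >= 0)
        [r3[i] - r2[i] for i in range(4)],  # far (z <= w)
    ]
    planes = []
    for p in raw:
        length = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
        planes.append([v / length for v in p])
    return planes


def sphere_in_frustum(planes, center, radius):
    """Conservative test: reject only if the sphere is fully behind one plane."""
    x, y, z = center
    return all(a * x + b * y + c * z + d >= -radius for a, b, c, d in planes)


def select_in_rectangle(view_projection, rect, spheres):
    """Return the indices of the spheres touching the selection rectangle."""
    planes = frustum_planes(mat_mul(crop_matrix(*rect), view_projection))
    return [i for i, (c, r) in enumerate(spheres) if sphere_in_frustum(planes, c, r)]
```

With an identity view projection, selecting the rectangle (0, 0) to (0.5, 0.5) keeps a sphere centered at (0.25, 0.25, 0.5) and rejects one centered at (-0.25, 0.25, 0.5) with radius 0.1.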

Here is a small range test example

Simple no? ;)

Now I can foresee 2 important questions that our astute reader is probably already thinking of:

How do we get the closest object?
What if we have several user inputs?

Of course, there are solutions for that.

Closest object.

First we will consider that our test function is also capable of returning distance.

So we modify our code as follows:

Code Snippet

struct HitResult
{
    uint objectID;
    float distanceToObject;
};
 
bool PerformTest(SomeUserObject userInput, SomeStruct object, out float distanceToObject)
{
    return //Perform your intersection/containment/proximity routine here
}
 
StructuredBuffer<SomeStruct> ObjectBuffers  : register(t0);
 
RWStructuredBuffer<uint> RWObjectHitResultBuffer : register(u0);
 
AppendStructuredBuffer<HitResult> AppendObjectHitBuffer : register(u1);
 
cbuffer cbUserInput : register(b0)
{
    SomeUserObject userInput;
};
 
cbuffer cbObjectData : register(b1)
{
    uint objectCount;
};
 
[numthreads(128, 1, 1)]
void CS(uint3 i : SV_DispatchThreadID)
{
    if (i.x >= objectCount)
        return;
 
    uint oid = i.x;
 
    SomeStruct object = ObjectBuffers[oid];
 
    float d;
    bool hitResult = PerformTest(userInput, object,  d);
    RWObjectHitResultBuffer[oid] = hitResult;
    if (hitResult)
    {
        HitResult hr;
        hr.objectID = oid;
        hr.distanceToObject = d;
        //If we use append buffer
        AppendObjectHitBuffer.Append(hr);
    }
}

Now our buffer also contains our distance to object, the only leftover is to grab the closest element.

We have 2 ways to work that out:

Use Compute shader: use InterlockedMin to keep the closest distance (since distance is generally positive, there are no float to uint tricks to apply), then perform another pass to check if the element distance is equal to the minimum.
Use Pipeline: the depth buffer is pretty good at keeping the closest element, so we might as well let it do the job for us ;)
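The compute shader route relies on the fact that, for non negative floats, the raw bit pattern sorts in the same order as the value itself, so an integer minimum on the bits gives the float minimum. Here is a small CPU sketch of the two passes; `float_as_uint` plays the role of hlsl's `asuint`, and the other names are invented.

```python
import struct


def float_as_uint(value):
    """Reinterpret a 32 bit float as an unsigned integer (like asuint in hlsl)."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def closest_hits(hits):
    """hits: list of (object_id, distance) with distance >= 0.

    Pass 1 finds the minimum distance on the integer bit pattern,
    pass 2 keeps every element whose distance matches that minimum.
    """
    min_bits = 0xFFFFFFFF  # buffer cleared to the largest uint
    for _, distance in hits:
        min_bits = min(min_bits, float_as_uint(distance))  # InterlockedMin on the gpu
    return [oid for oid, distance in hits if float_as_uint(distance) == min_bits]
```

Note that the second pass can return more than one object if two of them sit at exactly the same distance.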

Using the pipeline is extremely easy as well, the process is as follows:

Create a 1x1 render target (uint), associated with a 1x1 depth buffer.

Prepare an indirect draw buffer (from the UAV counter) and draw as a point list. In the vertex shader, write to pixel 0 and pass the distance along so it gets written to the depth buffer. Since code speaks more, here it is:

Code Snippet

struct HitResult
{
    uint objectID;
    float distanceToObject;
};
 
StructuredBuffer<HitResult> ObjectHitBuffer : register(t0); //Read only, so bound as shader resource
 
cbuffer cbObjectData : register(b1)
{
    float invFarPlane;
};
 
void VS(uint iv: SV_vertexID, out float4 p : SV_Position,
    out float objDist : OBJECTDISTANCE,
    out uint objID : OBJECTID)
{    
    p = float4(0, 0, 0, 1); //We render to a 1x1 texture, position is always 0
    HitResult hr = ObjectHitBuffer[iv];
    
    objID = hr.objectID;
    //Make sure we go in 0-1 range
    objDist = hr.distanceToObject * invFarPlane; 
}
 
void PS(float4 p : SV_Position, float objDist : OBJECTDISTANCE, uint objID : OBJECTID,
    out uint closestObjID : SV_Target0, out float d : SV_Depth)
{
    //Just push object id
    closestObjID = objID;
    d = objDist; //Depth will preserve closest distance
}

Now our pixel contains our closest object (clear to 0xFFFFFFFF so this value will mean "no hit").
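What the depth test does for us on that single pixel can be written as a short CPU loop. This sketch assumes a depth buffer cleared to 1.0 with a "less" comparison, which is my assumption rather than something stated above; `resolve_closest` is a made up name.

```python
NO_HIT = 0xFFFFFFFF  # render target clear value, means "no hit"


def resolve_closest(hits, far_plane):
    """Simulate drawing every hit as a point onto a 1x1 target with depth testing.

    hits: list of (object_id, distance).
    """
    target = NO_HIT
    depth = 1.0                       # depth buffer cleared to the far value
    inv_far_plane = 1.0 / far_plane
    for object_id, distance in hits:
        d = distance * inv_far_plane  # bring the distance into the 0..1 range
        if d < depth:                 # depth test passes: closer than what is stored
            depth = d
            target = object_id
    return target
```

As on the GPU, the order in which the hits are drawn does not change the result.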

To finish for this first part, let's now add the fact that we have multiple "user Inputs".

We want to know the closest object per user.

This is not much more complicated (but of course will cost a test for each user/object).

Code Snippet

struct HitResult
{
    uint objectID;
    float distanceToObject;
};
 
bool PerformTest(UserInput userInput, SomeStruct object, out float distanceToObject)
{
    return //Perform your intersection/containment/proximity routine here
}
 
StructuredBuffer<SomeStruct> ObjectBuffers  : register(t0);
StructuredBuffer<UserInput> UserInputBuffer : register(t1);
 
RWStructuredBuffer<uint> RWObjectHitResultBuffer : register(u0);
 
RWStructuredBuffer<HitResult> RWObjectHitBuffer : register(u1); //Counter flag
RWStructuredBuffer<uint> RWObjectHitUserIDBuffer : register(u2);
 
cbuffer cbObjectData : register(b0)
{
    uint objectCount;
    uint userCount;
};
 
[numthreads(128, 1, 1)]
void CS(uint3 tid : SV_DispatchThreadID)
{
    if (tid.x >= objectCount)
        return;
 
    uint oid = tid.x;
 
    SomeStruct object = ObjectBuffers[oid];
    uint hitCount = 0;
    for (uint i = 0; i < userCount; i++)
    {
        float d;
        bool hitResult = PerformTest(UserInputBuffer[i], object, d);
 
        if (hitResult)
        {
            hitCount++;
            HitResult hr;
            hr.objectID = oid;
            hr.distanceToObject = d;
 
            uint idx = RWObjectHitBuffer.IncrementCounter();
            RWObjectHitBuffer[idx] = hr;
            RWObjectHitUserIDBuffer[idx] = i;
        }
    }
    RWObjectHitResultBuffer[oid] = hitCount;
}

Now we have a buffer with every hit from every user (here is a small example screenshot):

So instead of using a 1x1 texture, we use a Nx1 texture (where N is user Input count).

Process to get closest element is (almost) the same as per the single input.

Only difference, in Vertex Shader, route the objectID/Distance to the relevant user pixel, and you're set!
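The routing boils down to placing each point at the center of its user's pixel. A sketch of that math and of the resulting per user depth test, assuming a Nx1 target, a depth buffer cleared to 1.0 with a "less" comparison, and hits given as (user, object, distance) triples. The names are hypothetical.

```python
NO_HIT = 0xFFFFFFFF  # render target clear value, means "no hit"


def user_pixel_clip_x(user_index, user_count):
    """Clip space x of the center of pixel user_index in a user_count x 1 target."""
    return (user_index + 0.5) / user_count * 2.0 - 1.0


def resolve_closest_per_user(hits, user_count, far_plane):
    """Simulate the Nx1 render: every hit lands on its user's pixel, depth keeps the closest."""
    targets = [NO_HIT] * user_count
    depths = [1.0] * user_count
    for user, object_id, distance in hits:
        x = user_pixel_clip_x(user, user_count)      # what the vertex shader outputs
        pixel = int((x * 0.5 + 0.5) * user_count)    # viewport transform done by the rasterizer
        d = distance / far_plane
        if d < depths[pixel]:
            depths[pixel] = d
            targets[pixel] = object_id
    return targets
```

With 3 users and the hits (0, 5, 3.0), (1, 2, 8.0), (0, 7, 1.0), (1, 9, 9.0), user 0 ends up with object 7, user 1 with object 2, and user 2 keeps the "no hit" value.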

That's it for the first part. Next round, I'll explain how the "map technique" works, stay tuned.

Sunday 25 October 2015

Intersections Part2 : Id Maps

Intersections Part1