Wednesday 16 July 2014

OpenCV, Compute and immutability

I know some of my friends fairly regularly use the very nice OpenCV contribution from Elliot Woods.

Most of the time people use the Camera/Projector calibration tool. It works pretty well (it could do with some UI improvements), but every time I wanted to look at it I ended up with the same problem: you need to download the whole Image pack.

This is a great pack of course, but in that scenario, downloading a 500 MB bulk of DLLs (which can also depend on the version) is, let's call it, "not ideal". OK, in our times of super fast internet you would think it's fine, but well, my hard drive doesn't like it (so I don't either). All that to call a single OpenCV function!

So I just wrote a small P/Invoke DLL (using static library linking instead of dynamic), plus a 20-line dynamic plugin to call the function, and here we go: a 1 MB DLL which doesn't need any other external dependency. I like minimalism ;)

I remember wanting to do this for a while, and considering the amount of (no) time it took, I feel very embarrassed )



Now, after this, there are a few things I wanted to add/change in that tool.

First, the point selection is nice in some cases, but not so nice in others. Basically, the current technique renders object-space coordinates into a texture, then you just sample that texture.

If you have a model crammed with small polygons that's fine enough, but what I would like to do in general is simply get the closest vertex from a triangle raycast. Since I don't want to run 100k ray tests on the CPU, and the 3D model is already on the GPU (as we obviously want to render it), let's do a little bit of compute shader ;)

So first let's load the model into one big fat buffer (to avoid the annoyance of subsets, with a simple prefix sum on the indices), then we have the following data structures:

Code Snippet
    StructuredBuffer<float3> PositionBuffer : POSITIONBUFFER;
    StructuredBuffer<uint3> IndexBuffer : INDEXBUFFER;

    AppendStructuredBuffer<float3> AppendVertexHitBuffer : APPENDVERTEXHITBUFFER;

    float3 raypos : RAYPOSITION;
    float3 raydir : RAYDIRECTION;

    int FaceCount : FACECOUNT;

    float eps : EPSILON = 0.000001f;

Pretty simple: the mouse position is converted back to a ray, and we use an append buffer to collect potential candidates (since we might hit several triangles).
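Going back to the "big fat buffer" step for a second: here is a minimal sketch of how the subsets could be merged on the CPU before upload. The names (MeshSubset, MeshMerger) and the flat float layout are assumptions made for this example, not what the tool actually uses. The only real trick is the running vertex offset, which is the prefix sum mentioned above.

```csharp
using System;
using System.Collections.Generic;

// One subset of a model: its own vertex positions (x, y, z triples) and
// triangle indices that are local to this subset. Hypothetical type.
public sealed class MeshSubset
{
    public readonly float[] Positions; // 3 floats per vertex
    public readonly uint[] Indices;    // 3 indices per triangle

    public MeshSubset(float[] positions, uint[] indices)
    {
        Positions = positions;
        Indices = indices;
    }
}

public static class MeshMerger
{
    // Concatenates all subsets into one position array and one index array.
    // Each subset's indices are shifted by the number of vertices that came
    // before it (an exclusive prefix sum over the per-subset vertex counts).
    public static void Merge(IList<MeshSubset> subsets, out float[] positions, out uint[] indices)
    {
        var allPositions = new List<float>();
        var allIndices = new List<uint>();
        uint vertexOffset = 0; // running prefix sum of vertex counts

        foreach (var subset in subsets)
        {
            allPositions.AddRange(subset.Positions);
            foreach (uint index in subset.Indices)
            {
                allIndices.Add(index + vertexOffset);
            }
            vertexOffset += (uint)(subset.Positions.Length / 3);
        }

        positions = allPositions.ToArray();
        indices = allIndices.ToArray();
    }
}
```

With the indices rebased like this, a single dispatch over FaceCount triangles covers the whole model, whatever the number of subsets.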

Now here is our ray shader (I omit the ray formula, which is the same as in http://www.geometrictools.com/ )

Code Snippet
    [numthreads(64,1,1)]
    void CS_RayTriangle(uint3 dtid : SV_DispatchThreadID)
    {
        // One thread per triangle; skip the padding threads of the last group
        if (dtid.x >= (uint)FaceCount) { return; }

        uint3 face = IndexBuffer[dtid.x];

        float3 p1 = PositionBuffer[face.x];
        float3 p2 = PositionBuffer[face.y];
        float3 p3 = PositionBuffer[face.z];

        float3 diff = raypos - p1;
        float3 e1 = p2 - p1;
        float3 e2 = p3 - p1;
        float3 n = normalize(cross(e1,e2));

        float DdN = dot(raydir,n);
        float fsign;

        bool hit = true;

        // Do your ray hit test here (formula omitted); set hit to false on a miss

        if (hit)
        {
            // Every vertex of a hit triangle is a candidate
            AppendVertexHitBuffer.Append(p1);
            AppendVertexHitBuffer.Append(p2);
            AppendVertexHitBuffer.Append(p3);
        }
    }
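For reference, the hit test that the shader leaves out can be sketched on the CPU. Note that this is not the Geometric Tools formulation mentioned above: it is the Möller-Trumbore test, which answers the same question, written in C# with System.Numerics (my choice for the example; the class and method names are made up).

```csharp
using System;
using System.Numerics;

public static class RayTriangle
{
    // Möller-Trumbore ray/triangle test. Returns true on a hit and writes the
    // distance along the ray to t. Both faces of the triangle are accepted.
    public static bool Intersect(Vector3 rayPos, Vector3 rayDir,
        Vector3 p1, Vector3 p2, Vector3 p3, out float t, float eps = 1e-6f)
    {
        t = 0.0f;
        Vector3 e1 = p2 - p1;
        Vector3 e2 = p3 - p1;

        Vector3 pvec = Vector3.Cross(rayDir, e2);
        float det = Vector3.Dot(e1, pvec);
        if (Math.Abs(det) < eps) { return false; } // ray parallel to the triangle

        float invDet = 1.0f / det;
        Vector3 diff = rayPos - p1;

        // First barycentric coordinate
        float u = Vector3.Dot(diff, pvec) * invDet;
        if (u < 0.0f || u > 1.0f) { return false; }

        // Second barycentric coordinate
        Vector3 qvec = Vector3.Cross(diff, e1);
        float v = Vector3.Dot(rayDir, qvec) * invDet;
        if (v < 0.0f || u + v > 1.0f) { return false; }

        t = Vector3.Dot(e2, qvec) * invDet;
        return t >= 0.0f; // reject hits behind the ray origin
    }
}
```

The same arithmetic ports to HLSL almost line by line (cross, dot and abs are intrinsics there), with `hit` set from the return value.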

Now when we hit a triangle, we append its 3 vertices as "candidates"; we then need to find the one closest to us. We could read back the filtered data (using CopySubresourceRegion) and finish the computation on the CPU, but that's not fun, so let's continue ;)

The first option that comes to mind is to sort the data, but that's expensive, and we only want the closest element.

So first let's process all elements and write the closest distance into a single-element buffer:

Code Snippet
    [numthreads(64,1,1)]
    void CS_MinDistance(uint3 dtid : SV_DispatchThreadID)
    {
        // One thread per candidate vertex
        if (dtid.x >= VertexHitCountBuffer.Load(0)) { return; }

        float3 p = VertexHitBuffer[dtid.x];

        float d = distance(raypos,p);
        uint dummy;
        // Atomic min on the bit pattern of the (non-negative) distance
        InterlockedMin(RWMinDistanceBuffer[0],asuint(d),dummy);
    }
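Why is it fine to run an integer min on a float? For non-negative IEEE 754 floats, the raw bit patterns sort in the same order as the values, and a distance is never negative. Here is a small CPU sketch of the idea (the helper names are invented for the example). It also shows that the min slot has to start at the largest uint value, which the GPU buffer presumably needs before each dispatch too.

```csharp
using System;

public static class FloatBits
{
    // Same reinterpretation as HLSL asuint(): the raw IEEE 754 bits of a float.
    public static uint AsUInt(float value)
    {
        return BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);
    }

    // Inverse of AsUInt, like HLSL asfloat().
    public static float AsFloat(uint bits)
    {
        return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
    }

    // CPU stand-in for a series of InterlockedMin calls on the bit patterns.
    public static float MinViaBits(float[] distances)
    {
        uint min = uint.MaxValue; // starting value of the min slot
        foreach (float d in distances)
        {
            min = Math.Min(min, AsUInt(d));
        }
        return AsFloat(min);
    }
}
```

This breaks down for negative values (the sign bit makes them compare as huge unsigned numbers), so it only works because we compare distances.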

Now we need to filter out the closest element:

Code Snippet
    [numthreads(64,1,1)]
    void CS_StoreIndex(uint3 dtid : SV_DispatchThreadID)
    {
        if (dtid.x >= VertexHitCountBuffer.Load(0)) { return; }

        float3 p = VertexHitBuffer[dtid.x];
        float d = distance(raypos,p);
        uint ud = asuint(d);

        uint mind = MinDistanceBuffer[0];

        // Store the index of a candidate sitting at the minimum distance;
        // CS_ExtractPosition uses it to look the position up again
        if (ud == mind)
        {
            uint dummy;
            InterlockedExchange(RWMinElementBuffer[0], dtid.x, dummy);
        }
    }

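Please note that we don't store the position directly, since interlocked operations are only allowed on int/uint types. Comparing the asuint bits instead is safe because a distance is never negative, and non-negative floats sort in the same order as their bit patterns. Also, we don't handle the case where several candidates sit at the minimum distance; it is easy to replace the store with an append (but at some point we need to decide which point we select anyway).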

And finally just get the position:

Code Snippet
    [numthreads(1,1,1)]
    void CS_ExtractPosition(uint3 dtid : SV_DispatchThreadID)
    {
        uint idx = RWMinElementBuffer[0];
        RWPositionBuffer[0] = VertexHitBuffer[idx];
    }

Copy those 12 bytes back to the CPU and you have your closest vertex.

One part of the morning well spent )

Now, one feature which is always useful for a good editor (since in the end we edit points) is of course some form of undo/redo.

You have three main ways to implement undo:

  • For each action, use one function to update your model and one function to revert it. This can be really cumbersome and error prone.
  • Serialize the state, and on undo create a new (or partly modified) state from the serialized data.
  • Use immutable state.

I have a growing interest in using immutable data more in general: it is safer and I like the concept behind it (OK, it doesn't map well everywhere and it can consume memory). In this case (something like 10 points and the data for a couple of projectors), it sounds like a good fit.

So here is a calibration point:

Code Snippet
    public class CalibrationPoint
    {
        private readonly Vector2 screenPosition;
        private readonly Vector3 objectPosition;

        public CalibrationPoint(Vector2 screenPosition, Vector3 objectPosition)
        {
            this.screenPosition = screenPosition;
            this.objectPosition = objectPosition;
        }

        public Vector2 ScreenPosition
        {
            get { return this.screenPosition; }
        }

        public Vector3 ObjectPosition
        {
            get { return this.objectPosition; }
        }
    }

You can see that once we create our point, we can't change its properties anymore.

Now, to update properties, instead of setting data directly, we return a new point. There's a little trick to save memory: if the property is unchanged, we return the same instance:

Code Snippet
    public CalibrationPoint SetScreenPosition(Vector2 screenPosition)
    {
        return this.screenPosition == screenPosition ? this : new CalibrationPoint(screenPosition, this.objectPosition);
    }

    public CalibrationPoint SetObjectPosition(Vector3 objectPosition)
    {
        return this.objectPosition == objectPosition ? this : new CalibrationPoint(this.screenPosition, objectPosition);
    }

    public CalibrationPoint Set(Vector2 screenPosition, Vector3 objectPosition)
    {
        return this.objectPosition == objectPosition &&
            this.screenPosition == screenPosition ? this : new CalibrationPoint(screenPosition, objectPosition);
    }
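To make the "same instance" behaviour concrete, here is a small self-contained sketch. It uses a compact copy of the point class and System.Numerics vectors (an assumption for this example, since the vector types used by the tool are not shown), plus a made-up demo class.

```csharp
using System;
using System.Numerics;

// Compact copy of the point class above, reduced to one setter.
public sealed class CalibrationPoint
{
    public Vector2 ScreenPosition { get; }
    public Vector3 ObjectPosition { get; }

    public CalibrationPoint(Vector2 screenPosition, Vector3 objectPosition)
    {
        ScreenPosition = screenPosition;
        ObjectPosition = objectPosition;
    }

    public CalibrationPoint SetScreenPosition(Vector2 screenPosition)
    {
        return ScreenPosition == screenPosition
            ? this
            : new CalibrationPoint(screenPosition, ObjectPosition);
    }
}

public static class CalibrationPointDemo
{
    // Returns true when the expectations about instance reuse hold.
    public static bool Run()
    {
        var original = new CalibrationPoint(new Vector2(0.5f, 0.5f), new Vector3(1, 2, 3));

        // Same value: no allocation, we get the very same object back
        var same = original.SetScreenPosition(new Vector2(0.5f, 0.5f));

        // Different value: a new object, and the original is left untouched
        var moved = original.SetScreenPosition(new Vector2(0.25f, 0.75f));

        return ReferenceEquals(original, same)
            && !ReferenceEquals(original, moved)
            && original.ScreenPosition == new Vector2(0.5f, 0.5f);
    }
}
```

Returning `this` for a no-op change is also what lets calling code detect "nothing happened" with a plain reference comparison.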

Now do the same for the projector and calibration data:

Code Snippet
    public class Projector
    {
        private readonly string name;
        private readonly IEnumerable<CalibrationPoint> points;

        public Projector(string name, IEnumerable<CalibrationPoint> points)
        {
            if (name == null)
            {
                throw new ArgumentNullException("name");
            }
            if (points == null)
            {
                throw new ArgumentNullException("points");
            }
            this.name = name;
            this.points = points;
        }

        public string Name
        {
            get { return this.name; }
        }

        public IEnumerable<CalibrationPoint> Points
        {
            get { return this.points; }
        }
    }

Some of the functions to modify (create new) state:

Code Snippet
    public Projector AddPoint(Vector2 screenPosition, Vector3 objectPosition)
    {
        var point = new CalibrationPoint(screenPosition, objectPosition);

        return new Projector(this.name, this.points.Concat(new CalibrationPoint[] { point }));
    }

    public Projector RemovePoint(CalibrationPoint point)
    {
        return new Projector(this.name, this.points.Where(p => p != point));
    }

Calibration class:

Code Snippet
    public class Calibration
    {
        private readonly IEnumerable<Projector> projectors;
        private readonly CalibrationSettings settings;

        public Calibration(CalibrationSettings settings, IEnumerable<Projector> projectors)
        {
            if (settings == null)
            {
                throw new ArgumentNullException("settings");
            }
            if (projectors == null)
            {
                throw new ArgumentNullException("projectors");
            }
            this.settings = settings;
            this.projectors = projectors;
        }
    }

And to update projector data:

Code Snippet
    public Calibration UpdateProjector(Projector oldProjector, Projector newProjector)
    {
        if (oldProjector == newProjector)
        {
            return this;
        }
        else
        {
            var projs = this.projectors.ToList();
            int idx = projs.IndexOf(oldProjector);

            if (idx >= 0)
            {
                projs[idx] = newProjector;
                return new Calibration(this.settings, projs);
            }
            else
            {
                // ArgumentException takes the message first, then the parameter name
                throw new ArgumentException("This projector is not part of this calibration data", "oldProjector");
            }
        }
    }

I could do the argument check first of course, and you can use some "builder classes" to manage those updates, but you get the point.

Now someone could say this is a lot of work for simple classes...

But once you are done with this (not so bad) boilerplate, here is our undo stack:

Code Snippet
    public class CalibrationUndoStack
    {
        private readonly Stack<Calibration> undoStack;

        public CalibrationUndoStack(Calibration initial)
        {
            this.undoStack = new Stack<Calibration>();
            this.undoStack.Push(initial);
        }

        public void Apply(Func<Calibration, Calibration> commandFunc)
        {
            var newState = commandFunc(this.Current);
            if (newState != this.Current)
            {
                this.undoStack.Push(newState);
            }
        }

        public Calibration Current
        {
            get { return this.undoStack.Peek(); }
        }

        public Calibration Undo()
        {
            // Drop the latest state (never the initial one), then return what is now current
            if (this.CanUndo)
            {
                this.undoStack.Pop();
            }
            return this.undoStack.Peek();
        }

        public bool CanUndo
        {
            get { return this.undoStack.Count > 1; }
        }
    }

As you can see, since we always return a new state, we pass a lambda to the stack, and if the object has been modified (i.e. the function returns a new state), then we push that new state. That's how easy it is.
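Nothing in that stack is really specific to calibration data, so the pattern can be tried in isolation. Below is a minimal generic sketch with a toy immutable state; UndoStack<T> and Label are names invented for this example, and Undo here returns the state that is current after the undo.

```csharp
using System;
using System.Collections.Generic;

// Generic variant of the undo stack; works with any immutable reference type.
public sealed class UndoStack<T> where T : class
{
    private readonly Stack<T> states = new Stack<T>();

    public UndoStack(T initial)
    {
        if (initial == null) { throw new ArgumentNullException("initial"); }
        states.Push(initial);
    }

    public T Current { get { return states.Peek(); } }

    public bool CanUndo { get { return states.Count > 1; } }

    // Runs the command on the current state; a returned state is only
    // recorded when it is a different instance (i.e. something changed).
    public void Apply(Func<T, T> command)
    {
        T next = command(Current);
        if (!ReferenceEquals(next, Current)) { states.Push(next); }
    }

    // Steps back one state (never past the initial one) and returns
    // the state that is current afterwards.
    public T Undo()
    {
        if (CanUndo) { states.Pop(); }
        return Current;
    }
}

// Toy immutable state, following the same "return this if unchanged" rule.
public sealed class Label
{
    public string Name { get; }

    public Label(string name) { Name = name; }

    public Label SetName(string name)
    {
        return name == Name ? this : new Label(name);
    }
}
```

For instance, after `stack.Apply(l => l.SetName("b"))` on a stack that starts at "a", applying the same rename again pushes nothing, and a single `Undo()` brings back "a". A redo could be added the same way with a second stack, which is left out here.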

Implementing update commands becomes as trivial as:

Code Snippet
    public static Calibration AddProjector(Calibration c, string name)
    {
        return c.AddProjector(name);
    }

    public static Calibration RenameProjector(Calibration state, Projector projector, string newname)
    {
        var p = projector.SetName(newname);
        return state.UpdateProjector(projector, p);
    }

    public static Calibration AddPoint(Calibration state, Projector projector, Vector2 screen, Vector3 obj)
    {
        var p = projector.AddPoint(screen, obj);
        return state.UpdateProjector(projector, p);
    }

    public static Calibration SetScreenPoint(Calibration state, Projector projector, CalibrationPoint point, Vector2 screen)
    {
        var newPoint = point.SetScreenPosition(screen);
        var newProjector = projector.UpdatePoint(point, newPoint);
        return state.UpdateProjector(projector, newProjector);
    }

And as you may notice, this looks pretty verbose, so here is how to do the same in F# (I love type inference, amongst many other things):

Code Snippet
    module CalibrationCommandsFS =

        let addprojector (c:Calibration,n) = c.AddProjector(n)

        let renameprojector(c:Calibration,p, n) = c.UpdateProjector(p,p.SetName(n))

        let addpoint(c:Calibration,p,s,o) = c.UpdateProjector(p,p.AddPoint(s,o))

        let setscreenpoint(c:Calibration,proj,pt,s) = c.UpdateProjector(proj,proj.UpdatePoint(pt,pt.SetScreenPosition(s)))

And to operate on calibration:

Code Snippet
    let s = new CalibrationSettings(Matrix.Identity)
    let c = new Calibration(s, [])

    // The undo stack needs an initial state
    let x = new CalibrationUndoStack(c)

    x.Apply(fun c -> addprojector(c,"hello"))

That's it for now, but likely more F# soon ;)
