Tuesday 28 October 2014

Hap Attack (Part 2)

In the previous post I explained a bit how to decode Hap files and how the QuickTime format works, so let's show a bit of code.

First we need to access a leaf atom to extract information; for this, let's build a small interface:

Code Snippet
public interface ILeafAtomReader
{
    void Read(FileStream ds);
}

Now let's show an example implementation:

Code Snippet
public class ChunkOffsetReader : ILeafAtomReader
{
    private List<uint> chunkOffsetTable = new List<uint>();

    public List<uint> Table
    {
        get { return this.chunkOffsetTable; }
    }

    public void Read(FileStream ds)
    {
        //Bypass header (version + flags)
        ds.Seek(4, SeekOrigin.Current);

        uint entrycount = ds.ReadSize();

        for (uint i = 0; i < entrycount; i++)
        {
            uint offset = ds.ReadSize();
            chunkOffsetTable.Add(offset);
        }
    }
}

This is reasonably simple: we just parse the data we require.
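`ReadSize` is a small helper extension that never appears in the post; a minimal sketch, assuming it simply reads a 32-bit big-endian unsigned integer (which is how QuickTime stores sizes and counts), could look like this:

```csharp
using System;
using System.IO;

public static class StreamExtensions
{
    //QuickTime stores sizes and counts as 32-bit big-endian integers,
    //so read 4 bytes and assemble them most-significant byte first.
    public static uint ReadSize(this Stream s)
    {
        byte[] b = new byte[4];
        if (s.Read(b, 0, 4) != 4)
            throw new EndOfStreamException();
        return ((uint)b[0] << 24) | ((uint)b[1] << 16) | ((uint)b[2] << 8) | (uint)b[3];
    }
}
```

Being an extension on Stream, it works for the FileStream used above.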

Now there is a little problem: some parsers will read the whole atom, while others might only read the data they want, so our file position pointer might not end up at the end of the atom.

To circumvent that, let's add a little adapter:

Code Snippet
public class LeafAtomReaderAdapter : ILeafAtomReader
{
    private readonly ILeafAtomReader reader;

    public LeafAtomReaderAdapter(ILeafAtomReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");

        this.reader = reader;
    }

    public void Read(FileStream ds)
    {
        var currentpos = ds.Position;
        reader.Read(ds);
        ds.Seek(currentpos, SeekOrigin.Begin);
    }
}


This wraps another atom reader, but before letting it read, it stores the file position pointer and restores it once the other reader is done.

Since atom order is not guaranteed, we also need to tell the parser which containers we are interested in:

Code Snippet
private string[] containers = new string[]
{
    "moov","trak","mdia","minf","stbl"
};


Then, once we find the right media sample table (the one which contains Hap), we need to look up a bit of extra information, so we store the moov and trak atom offsets (that way we can later read tkhd to get the video size, and mvhd to get the time units).

Code Snippet
string fourcc = fcc.ToString();
if (containers.Contains(fourcc))
{
    if (fourcc == "trak")
    {
        this.currenttrakoffset = ds.Position;
    }
    if (fourcc == "moov")
    {
        this.currentmoovoffset = ds.Position;
    }

    //Keep parent position, since we'll want to get this to read the sample table
    Parse(ds, ds.Position);
}


Once we have found a track with Hap, we can jump back to the stored file position and go read the headers.
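The Parse routine itself never appears in the post. As a rough, self-contained sketch (names and the exact signature are mine — the real parser dispatches to ILeafAtomReader implementations, and I ignore 64-bit extended atom sizes), walking the atoms looks roughly like this:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class AtomWalker
{
    //Containers we recurse into; everything else is treated as a leaf.
    static readonly HashSet<string> Containers =
        new HashSet<string> { "moov", "trak", "mdia", "minf", "stbl" };

    static uint ReadBigEndianUInt32(Stream s)
    {
        byte[] b = new byte[4];
        if (s.Read(b, 0, 4) != 4) throw new EndOfStreamException();
        return ((uint)b[0] << 24) | ((uint)b[1] << 16) | ((uint)b[2] << 8) | (uint)b[3];
    }

    static string ReadFourCC(Stream s)
    {
        byte[] b = new byte[4];
        if (s.Read(b, 0, 4) != 4) throw new EndOfStreamException();
        return new string(new[] { (char)b[0], (char)b[1], (char)b[2], (char)b[3] });
    }

    //Walks all atoms in [s.Position, end) and reports each one to the visitor.
    public static void Parse(Stream s, long end, Action<string, long, uint> visit)
    {
        while (s.Position + 8 <= end)
        {
            long atomStart = s.Position;
            uint size = ReadBigEndianUInt32(s); //atom size, including the 8 header bytes
            string fcc = ReadFourCC(s);

            if (size < 8) break; //malformed atom, bail out

            visit(fcc, atomStart, size);

            if (Containers.Contains(fcc))
                Parse(s, atomStart + size, visit); //recurse into the container

            //Jump to the next sibling atom, wherever the visitor left the position
            s.Seek(atomStart + size, SeekOrigin.Begin);
        }
    }
}
```

Because every iteration seeks to atomStart + size, a leaf reader that stops mid-atom cannot derail the walk — which is the same guarantee the adapter above provides.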

So now we can finally play Hap files.
The only issue: without an SSD this is drive intensive, and since we generally have a lot of memory, let's allow loading the whole video data into RAM.

This is done differently in QuickTime and AVI.

In QuickTime I already built the lookup table, so I can just load a copy of the file in memory and look up frames from there:

Code Snippet
public unsafe static DataStream ReadFile(string path, CancellationToken token, IProgress<double> progress, int chunkSize = 1024)
{
    var fs = File.OpenRead(path);

    IntPtr dataPointer = Marshal.AllocHGlobal((int)fs.Length);
    IntPtr pointerOffset = dataPointer;

    byte[] chunk = new byte[chunkSize];
    int remaining = Convert.ToInt32(fs.Length - fs.Position);
    int read = 0;

    while (remaining > 0)
    {
        int toread = Math.Min(remaining, chunkSize);

        //Read can return fewer bytes than requested, so use the actual count
        int actual = fs.Read(chunk, 0, toread);
        Marshal.Copy(chunk, 0, pointerOffset, actual);

        pointerOffset += actual;
        read += actual;

        double p = (double)read / (double)fs.Length;
        progress.Report(p);

        remaining = Convert.ToInt32(fs.Length - fs.Position);

        if (token.IsCancellationRequested)
        {
            fs.Close();
            Marshal.FreeHGlobal(dataPointer);
            throw new OperationCanceledException();
        }
    }

    var ds = new DataStream(dataPointer, fs.Length, true, false);
    fs.Close();

    return ds;
}


This is just a simple file reader that grabs blocks and reports progress, so it can be run as a background task.

For AVI I have no lookup table, only an API that maps a frame index to data (read from disk). So I create a memory block large enough to contain the whole video (the file size works perfectly for that purpose ;)

Then, in the background, I request frames and build a prefix sum of offsets:

Code Snippet
public unsafe static AviOffsetTable BuildTable(hapFileVFW fileinfo, CancellationToken token, IProgress<double> progress)
{
    long fileLength = new FileInfo(fileinfo.Path).Length;
    int frameCount = fileinfo.FrameCount;

    IntPtr dataPointer = Marshal.AllocHGlobal((int)fileLength);
    IntPtr offsetPointer = dataPointer;

    List<OffsetTable> offsetTable = new List<OffsetTable>();

    int readBytes = 0;
    int currentOffset = 0;
    for (int i = 0; i < frameCount; i++)
    {
        fileinfo.WriteFrame(i, offsetPointer, out readBytes);

        OffsetTable t = new OffsetTable()
        {
            Length = readBytes,
            Offset = currentOffset
        };

        offsetTable.Add(t);

        offsetPointer += readBytes;
        currentOffset += readBytes;

        double prog = (double)i / (double)frameCount;
        progress.Report(prog);

        if (token.IsCancellationRequested)
        {
            Marshal.FreeHGlobal(dataPointer);
            throw new OperationCanceledException();
        }
    }
    progress.Report(1.0);
    return new AviOffsetTable(offsetTable, dataPointer);
}


This is simple too: we just ask the AVI wrapper to write into our pointer, get the number of bytes written, and advance the pointer by that amount for the next frame. At the same time we build our offset table.

Once we have our data loaded in memory, everything is much simpler:

Code Snippet
public IntPtr ReadFrame(int frameIndex, IntPtr buffer)
{
    if (this.memoryLoader != null && this.memoryLoader.Complete)
    {
        var tbl = this.memoryLoader.DataStream;
        IntPtr dataPointer = tbl.DataPointer;
        var poslength = tbl.Table[frameIndex];
        dataPointer += (int)poslength.Offset;
        return dataPointer;
    }
    else
    {
        int readBytes = 0;
        int readSamples = 0;
        Avi.AVIStreamRead(this.VideoStream, frameIndex, 1, buffer, this.frameSize.Width * this.frameSize.Height * 6, ref readBytes, ref readSamples);
        return buffer;
    }
}

In the first case we just return a pointer from our lookup table (no memory copy required); in the second case we read from disk.

Preloading content into memory gives a huge performance gain (and memory is rather cheap; it's easy to have 64 GB in a single machine, so preloading can definitely be a good option).

After that comes all the usual cleanup: managing the video element count and making sure we don't have memory leaks or crashes.

Now that I have a really nicely working player, why limit our imagination?

First, I wanted to test some 8k encoding, so I exported a few frames from 4v and tried to use VirtualDub to encode to Hap. Press Save -> Out of memory.

So instead, let's just encode directly from vvvv ;)

Writing the encoder was easy: you set the AVI headers with your video size/framerate/compression, then you only need to grab the texture from the GPU, convert it to whichever DXT/BC format you want, compress with Snappy if required, and write the frame.
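As a side note, the per-frame data size you declare in those headers follows directly from the block-compression math: DXT/BC formats work on 4x4 texel blocks, with 8 bytes per block for DXT1/BC1 and 16 bytes per block for DXT5/BC6/BC7. A small helper for that (the names are mine, not from the actual encoder) could be:

```csharp
public static class BlockCompression
{
    //DXT/BC formats compress 4x4 texel blocks: 8 bytes per block for
    //DXT1/BC1, 16 bytes per block for DXT5/BC6/BC7. Dimensions that are
    //not multiples of 4 still consume whole blocks, hence the rounding up.
    public static int FrameSizeInBytes(int width, int height, int bytesPerBlock)
    {
        int blocksX = (width + 3) / 4;
        int blocksY = (height + 3) / 4;
        return blocksX * blocksY * bytesPerBlock;
    }
}
```

For a 1920x1080 DXT5 frame this gives 480 * 270 * 16 = 2073600 bytes, before any Snappy compression.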



One thing well done!

Next, since we run on DX11 hardware, we have access to new block compression formats:

  • BC6: three channels of half floating point (HDR playback, mmmmmhhhh)
  • BC7: 4 channels, better quality than BC3/DXT5, but encoding is really slow

So let's add a few more FourCCs, and add options in the encoder/decoder:


Now we have new hap Formats:

Code Snippet
public enum hapFormat
{
    RGB_DXT1_None = 0xAB,
    RGB_DXT1_Snappy = 0xBB,
    RGBA_DXT5_None = 0xAE,
    RGBA_DXT5_Snappy = 0xBE,
    YCoCg_DXT5_None = 0xAF,
    YCoCg_DXT5_Snappy = 0xBF,
    RGB_BC6S_None = 0xA3,
    RGB_BC6S_Snappy = 0xB3,
    RGB_BC6U_None = 0xA4,
    RGB_BC6U_Snappy = 0xB4,
    RGBA_BC7_None = 0xA7,
    RGBA_BC7_Snappy = 0xB7,
}

Please note that those formats are also available out of the box in OpenGL (BPTC compression, via the ARB_texture_compression_bptc extension, core since GL 4.2).
So any software that uses GL 3.1+ can take advantage of them (and really, software should already have moved to a GL4+ core profile, so there are NO excuses ;)


Finally, people always tend to think of videos as just a sequence of flat images.

There are cases, though, where other layouts are more suitable: for panoramic/dome projection, cubemaps are a much better fit. And DXT/BC formats support cubemap data just fine.

So let's just write cubemap data as a frame, which was 0 lines of code in my case, since my writer already supports cubemap export.

Then there's only a little twist: in the AVI stream info, don't forget to multiply the required data size by 6 (yes, we now have 6 textures in one frame).

In AVISTREAMINFO:
public Int32    dwSuggestedBufferSize;

is the field where we set the buffer size.

Then decoding frames works exactly like standard textures (cubemaps are Texture2D too, so loading is done exactly the same way).

There's of course a little twist: in the case of a cube texture we need to set different parameters when creating the ShaderResourceView:

Code Snippet
ShaderResourceView videoView;
if (videoTexture.Description.OptionFlags.HasFlag(ResourceOptionFlags.TextureCube))
{
    //Cube texture: explicitly ask for a TextureCube view over all 6 faces
    ShaderResourceViewDescription srvd = new ShaderResourceViewDescription()
    {
        ArraySize = 6,
        FirstArraySlice = 0,
        Dimension = ShaderResourceViewDimension.TextureCube,
        Format = videoTexture.Description.Format,
        MipLevels = videoTexture.Description.MipLevels,
        MostDetailedMip = 0,
    };

    videoView = new ShaderResourceView(device.Device, videoTexture, srvd);
}
else
{
    videoView = new ShaderResourceView(device.Device, videoTexture);
}

That's more or less it: cube texture encoding/playback with Hap:




Some days well spent!!


