Well, GSoC came to an end and I thought I’d post an update.🙂
As a quick refresher, my project involved adding video decode support to the open source engine CrystalSpace. Everything turned out ok, so I’m gonna talk a bit about it.
I added video decode support through Theora, and open source video compression format. The basic idea behind my implementation was to have the main thread to render the scene, and another thread to decode video frames and convert data from the YCbCr colorspace to RGB. Then, in the main thread, data would be written to one of the buffers, and the buffers would be swapped. I used double buffering.
If I had to give a few tips on video decoding, here’s what I’d say:
If you can, avoid them. Threads introduce concurrency and code that ran perfectly before, will probably run slower and you won’t know why. If you can get just the right amount of processing transferred over to another thread, you’ll get a speed boost, but you’ll have to experiment quite a bit. In my project, I used threads, because you can’t just use up the main thread (which renders and updates the scene) to do your dirty work. It worked well in the end, but it was a bit of a pain in the ass. Here’s a nice article on threads.
Depending on the colorspace supported by the surface you render the video frames to, you will have to do conversion. I did it on the CPU using Look-up Tables, and it worked well, but just for the 4:2:0 pixel format. 4:2:2 and 4:4:4 were too slow to be useable. My advice, if you’re doing conversion on the CPU, is to use MMX. I ran some tests on MMX during my project, and it turned out about 2 times faster. Implementing conversion in MMX using LUTs isn’t really a hard thing, but you’ll have to watch out for different compilers. asm tags are usually different for each one. Unfortunately, I din’t have time to move conversion to MMX. In any case, here’s a paper about optimizing YUV->RGB conversion. Alternatively, you can do conversion on the GPU, using a shader. Again, I didn’t have time to implement this, but here’s a reference.
The main problem I ran into during this project is that openGL doesn’t support multi threading (as far as I know, oGL 3 and up do support it). This is a big problem. You can’t access an openGL context from outside the thread it was created in. This is annoying because, if you’re processing an openGL resource on another thread, you need to place a callback in the main thread to apply the changes, which slows you down. Admittedly, the slow down isn’t that big, but you’ll see a huge difference between single threaded implementations and multi threaded ones. For example, w/o multi threading, converting th 4:4:4 pixel format (the one used for 720p vids) was pretty fast. As soon as I moved conversion to another thread, due to the synchronization induced by this, it was now too slow to be practical.
In any case, the project turned out OK, and I’m happy with it. Learned a lot of things this summer. It was a really nice experience.🙂
This also means I can get back to game dev😀 Gonna post a prototype as soon as I have one😀