Currently, audio appears to be merged into the output as a post-processing step. Refactor this so that video and audio are recorded into the same file directly. This is related to the MKV muxer task.