A core part of building a Zoom bot is getting access to the audio and video streams from the call. The recommended way to do so is to use the Raw Data features from the Windows SDK.
A common question developers have when working with this feature is how to convert the Zoom raw video data into a more usable format, such as an MP4 video or a PNG image.
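The raw video data comes in from the Zoom SDK in YUV420p format. This is a raw, uncompressed video format, which means that each frame is a large block of bytes describing every pixel in the final image: a full-resolution luma (Y) plane, followed by two chroma planes (U and V) stored at half the width and half the height.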
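To make that layout concrete, here is a minimal Python sketch that slices one YUV420p frame buffer into its three planes. It assumes the buffer holds exactly one frame, with the Y, U and V planes stored back to back and no per-row padding, and that the width and height are even. `split_yuv420p` is a hypothetical helper, not part of the Zoom SDK.

```python
def split_yuv420p(frame: bytes, width: int, height: int) -> tuple[bytes, bytes, bytes]:
    """Split one packed YUV420p (I420) frame into its Y, U and V planes.

    Assumes even width/height and no per-row padding (stride == width).
    """
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even width and height")
    y_size = width * height                # one luma byte per pixel
    c_size = (width // 2) * (height // 2)  # one chroma byte per 2x2 pixel block
    expected = y_size + 2 * c_size
    if len(frame) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame)}")
    y = frame[:y_size]
    u = frame[y_size:y_size + c_size]
    v = frame[y_size + c_size:]
    return y, u, v


if __name__ == "__main__":
    # A tiny 4x2 frame: 8 Y bytes, then 2 U bytes, then 2 V bytes.
    y, u, v = split_yuv420p(bytes(range(12)), 4, 2)
    print(len(y), len(u), len(v))  # 8 2 2
```

Each U and V byte is shared by a 2x2 block of pixels, which is why the chroma planes are a quarter of the size of the Y plane.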
How to Convert Zoom Raw Data into PNG
A quick and easy way to convert a YUV420p frame into a PNG image would be to use ffmpeg as follows.
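The "-s" flag specifies the dimensions of the input frame, the "-pix_fmt yuv420p" flag specifies that the input is encoded in YUV420p format, and the "-pix_fmt rgb24" flag sets the output pixel format to RGB24, since PNG stores RGB pixels rather than YUV. The position of each flag matters: options placed before the input file ("-i") apply to the input, and options placed after it apply to the output, which is why "-pix_fmt" appears twice.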
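If you'd rather run this conversion from your bot's own code, the following Python sketch builds an equivalent FFMPEG argument list and runs it with `subprocess`. It assumes `ffmpeg` is installed and on your PATH. The function and file names are hypothetical. It also adds three flags that aren't discussed above: `-f rawvideo` names the input format explicitly instead of relying on FFMPEG recognizing the `.yuv` extension, `-frames:v 1` writes a single image, and `-y` overwrites an existing output file without prompting.

```python
import subprocess


def build_png_command(yuv_path: str, png_path: str, width: int, height: int) -> list[str]:
    """Return an ffmpeg argv that converts one raw YUV420p frame to a PNG."""
    return [
        "ffmpeg",
        "-y",                       # overwrite png_path if it already exists
        # Options before -i describe the headerless input file.
        "-f", "rawvideo",
        "-s", f"{width}x{height}",
        "-pix_fmt", "yuv420p",
        "-i", yuv_path,
        # Options after -i apply to the output.
        "-pix_fmt", "rgb24",
        "-frames:v", "1",           # write exactly one image
        png_path,
    ]


def convert_frame_to_png(yuv_path: str, png_path: str, width: int, height: int) -> None:
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_png_command(yuv_path, png_path, width, height), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert_frame_to_png to actually run it.
    print(" ".join(build_png_command("frame-1.yuv", "frame-1.png", 1280, 720)))
```

For example, `convert_frame_to_png("frame-1.yuv", "frame-1.png", 1280, 720)` would write frame-1.png, provided the frame really is 1280x720.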
Because a YUV420p file has no header recording its width, height, or even its pixel format, you need to supply this metadata yourself to whatever software processes the frames.
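A short worked example shows why the dimensions can't simply be inferred. A YUV420p frame takes width * height bytes of luma plus two quarter-size chroma planes, so its size is width * height * 3 / 2 bytes for even dimensions. The Python sketch below (hypothetical helper names) computes that size. It shows that two different resolutions can produce the same byte count, and that one 1280x720 frame could equally be read as four 640x360 frames.

```python
def yuv420p_frame_size(width: int, height: int) -> int:
    """Bytes in one YUV420p frame: a Y plane plus two half-width, half-height chroma planes."""
    chroma_w = (width + 1) // 2   # chroma dimensions round up for odd sizes
    chroma_h = (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h


def frame_count(file_size: int, width: int, height: int) -> int:
    """Whole frames in a raw .yuv file of file_size bytes, for an assumed resolution."""
    size = yuv420p_frame_size(width, height)
    if file_size % size:
        raise ValueError("file size is not a whole number of frames at this resolution")
    return file_size // size


if __name__ == "__main__":
    print(yuv420p_frame_size(1280, 720))    # 1382400
    print(yuv420p_frame_size(960, 960))     # 1382400 as well
    print(frame_count(1382400, 1280, 720))  # 1
    print(frame_count(1382400, 640, 360))   # 4
```

The same 1,382,400 bytes are a valid file at all three resolutions, so only the metadata you pass in (such as the "-s" flag) tells the software which interpretation is correct.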
How to Convert Zoom Raw Data into MP4
To convert a set of YUV420p frames into an MP4 video, we can use FFMPEG again. Assume you have a set of YUV420p files saved from your Zoom bot in the current folder, with sequential file names like frame-1.yuv, frame-2.yuv, and so on.
There are a couple of new parameters in this command. The "-c:v rawvideo" at the beginning tells FFMPEG to decode each input file as a raw, uncompressed video frame.
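The "-framerate" flag sets the rate at which the still frames are read, which becomes the frame rate of the output video. You need to set it because you're building a video from a set of still frames, so FFMPEG has to be told how long to show each one: at 30 frames per second, each frame is on screen for 1/30 of a second. If you leave it out, FFMPEG assumes 25 frames per second, which makes the video play too fast or too slow if your bot captured frames at a different rate.
The "-pattern_type glob" option, together with "-i '*.yuv'", tells FFMPEG to use every file ending in ".yuv" in the current folder as input. The matching files are sorted alphabetically rather than numerically, so frame-10.yuv is read before frame-2.yuv. Zero-padding the frame numbers (frame-00001.yuv, frame-00002.yuv, and so on) keeps the frames in capture order.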
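Because glob matches are ordered as text, a bot that saves frames as frame-1.yuv, frame-2.yuv, ... can end up with frames out of sequence once it passes frame 9. Below is a minimal Python sketch of saving frames with zero-padded names. It assumes each frame arrives as a `bytes` object. The helper names and the five-digit padding are illustrative choices, not part of the Zoom SDK, and Python's `sorted()` stands in for the alphabetical ordering that the glob expansion applies.

```python
import os


def frame_filename(index: int, digits: int = 5) -> str:
    """Zero-padded name, so alphabetical (glob) order matches numeric order."""
    return f"frame-{index:0{digits}d}.yuv"


def save_frame(frame: bytes, index: int, folder: str = ".") -> str:
    """Write one raw YUV420p frame to disk and return its path."""
    path = os.path.join(folder, frame_filename(index))
    with open(path, "wb") as f:
        f.write(frame)
    return path


if __name__ == "__main__":
    unpadded = sorted(f"frame-{i}.yuv" for i in (1, 2, 10))
    padded = sorted(frame_filename(i) for i in (1, 2, 10))
    print(unpadded)  # ['frame-1.yuv', 'frame-10.yuv', 'frame-2.yuv']
    print(padded)    # ['frame-00001.yuv', 'frame-00002.yuv', 'frame-00010.yuv']
```

Five digits covers 99,999 frames, or roughly 55 minutes at 30 frames per second. Widen `digits` for longer recordings.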
Finally, "-c:v libx264" encodes the output video as H.264 using the libx264 encoder. MP4 files can hold other codecs too, but H.264 is the most widely supported choice for MP4 playback.
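The MP4 conversion can be driven from Python in the same way. This sketch assumes `ffmpeg` is on your PATH and was built with globbing support, since "-pattern_type glob" is only available when libavformat was compiled with it. It also assumes that every frame shares one resolution. It adds `-f image2` to select FFMPEG's image-sequence demuxer explicitly instead of relying on format detection, and `-y` to overwrite an existing output file. The function names and defaults are hypothetical.

```python
import subprocess


def build_mp4_command(width: int, height: int, fps: int,
                      output: str = "output.mp4", pattern: str = "*.yuv") -> list[str]:
    """Return an ffmpeg argv that encodes a folder of raw YUV420p frames as H.264 in MP4."""
    return [
        "ffmpeg",
        "-y",                          # overwrite output if it already exists
        # Input options: read each matched file as one raw YUV420p frame.
        "-f", "image2",
        "-c:v", "rawvideo",
        "-pix_fmt", "yuv420p",
        "-s", f"{width}x{height}",
        "-framerate", str(fps),
        "-pattern_type", "glob",
        "-i", pattern,                 # passed unexpanded; ffmpeg does the globbing
        # Output options: encode with libx264.
        "-c:v", "libx264",
        output,
    ]


def frames_to_mp4(width: int, height: int, fps: int, output: str = "output.mp4") -> None:
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_mp4_command(width, height, fps, output), check=True)


if __name__ == "__main__":
    # Only prints the command; call frames_to_mp4 to actually run it.
    print(" ".join(build_mp4_command(1280, 720, 30)))
```

Because `subprocess.run` receives a list and no shell is involved, "*.yuv" reaches FFMPEG unexpanded. The single quotes around it do the same job when you type the command into a shell.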
How to Convert Zoom Raw Data in Real Time
Both of these techniques are suited to converting raw video from your Zoom bot offline, after you've captured the data to files. If you want to convert or process the data in real time as frames arrive, you'd want to use a streaming media framework like GStreamer instead of running FFMPEG over saved files.
An alternative to dealing with all of the raw video processing yourself is to use Recall.ai, which provides an easy API for meeting bots on Zoom, and every other platform as well.