Zoom provides a platform for video conferencing, but there are often scenarios where we need to access the transcripts of a recorded meeting. This could be useful for various purposes, such as sharing meeting notes over email, translating the meeting content into multiple languages, or analyzing and summarizing the discussion using advanced language models.

Fortunately, Zoom offers APIs that allow developers to retrieve transcripts for recorded meetings, albeit with some limitations. However, Zoom's official documentation does not provide a detailed guide for using these APIs or explain the caveats they come with.

In this article, we will explore the process of accessing these transcripts via Zoom's Cloud Recording API. We will cover the necessary setup, including creating a Zoom App and obtaining the required credentials, as well as providing a Python code example that demonstrates how to interact with the Zoom API to retrieve the meeting transcripts. We will also look into its limitations, some workarounds that can be employed, and quickly touch on a few alternatives.

Note that while we are using Python here, this can be easily translated into other languages. At the heart of it, all we are doing is calling Zoom APIs. If you want to build a Zoom bot yourself, you can either start from scratch with our blog on how to build a Zoom bot or check out our Zoom transcript API, which works in just a few lines of code (tutorial on how to get transcripts from Zoom).

Overview of steps

We will simply do 3 things:

Create a Zoom App to allow access to some of their APIs
Create and join a meeting with cloud recording enabled
Programmatically fetch the audio transcripts using Python

Brief on Zoom Apps and OAuth2

Zoom employs the OAuth 2.0 protocol for authorizing access to its APIs. OAuth is an industry-standard authorization mechanism that allows applications to securely access resources on behalf of a user or service account without exposing sensitive credentials like passwords.

To access the Zoom APIs, you need to create a Server-to-Server OAuth app within the Zoom App Marketplace. This type of application is suitable for machine-to-machine communication, which is precisely what we need in this case.
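To make the token exchange concrete, here is a minimal sketch of how the request for a Server-to-Server OAuth token can be assembled. It only builds the request and does not send it. The helper name build_token_request is our own, and the HTTP Basic form (client ID and secret in the Authorization header) is, to our understanding, the variant shown in Zoom's documentation; the client later in this article sends the credentials as form fields instead.

```python
import base64


def build_token_request(account_id, client_id, client_secret):
    """Return (url, headers, data) for a Server-to-Server OAuth token request.

    Hypothetical helper for illustration; it builds the request but sends nothing.
    """
    # HTTP Basic credentials are base64("client_id:client_secret")
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(raw).decode("ascii")

    url = "https://zoom.us/oauth/token"
    headers = {"Authorization": f"Basic {basic}"}
    # account_credentials is the grant type used by Server-to-Server apps
    data = {"grant_type": "account_credentials", "account_id": account_id}
    return url, headers, data


# Placeholder values, not real credentials
url, headers, data = build_token_request("my-account", "my-client", "my-secret")
print(url, data["grant_type"])
```

With the requests library, the result could be sent as requests.post(url, headers=headers, data=data).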

Pre-requisites

Zoom Paid Plan
Python 3.6 or later installed on your system (the code below uses f-strings, which were introduced in 3.6).
If you are not the account admin, then your account should have:
Permissions to view and edit Server-to-Server OAuth apps
Permissions for the scopes that you will add to the app. We will detail the scopes later in the article.

Create a Server-to-Server OAuth App

Follow these steps:

1. Let's go to the Zoom App Marketplace. Click on Develop and then click on Server-to-Server App.

2. Give a name to your app. In this example, we've called it MyTranscriptionApp. You can give it your own name. Click on Create.

3. You should now be in the App Credentials section. Note your Account ID, Client ID, and Client Secret. These values are specific to your app, and we will use them when we make Zoom API calls later.

4. Provide details in the Information section.

5. Next, jump to Scopes. Here, we define the scopes that this app's tokens have access to. These in turn determine which APIs can be called using the token.

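In this tutorial, we'll be using the following endpoints:

GET /users/{userId}/recordings (List Recordings)
GET /meetings/{meetingId}/recordings (Get Recording)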

That means we'll need the following scopes, which Zoom mentions in the documentation for those endpoints:

cloud_recording:read:list_user_recordings:admin
cloud_recording:read:list_recording_files:admin

6. Next, go to the Activation section and click Activate your app. Zoom won't issue access tokens for your app without this step.

Start and complete a meeting

Start a Zoom meeting. Ensure that you have enabled cloud recording by expanding Options, checking the option for Automatically record meeting, and selecting the In the cloud radio button.

Join the meeting alone, or add more people. Talk for a couple of seconds during the meeting, and then end the meeting.

It will take a few minutes for the recording to show up as Zoom needs to process it. For this article, we'll monitor the Recordings Tab for the recording.

You can also use the recording.transcript_completed webhook event to be notified when the recording transcript is available.
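If you go the webhook route, the handler mostly needs to pick the transcript files out of the event body. Here is a rough sketch, assuming the event mirrors the recording_files structure we use later in this article (payload, then object, then recording_files). That shape, the function name, and the sample event are assumptions on our part, so check them against a real event. A production endpoint would also need to verify that the request really came from Zoom, which is not covered here.

```python
def transcript_urls_from_event(event):
    """Return transcript download URLs from a webhook event, or [] if none apply."""
    if event.get("event") != "recording.transcript_completed":
        return []
    # Assumed payload shape: event["payload"]["object"]["recording_files"]
    files = event.get("payload", {}).get("object", {}).get("recording_files", [])
    return [
        f["download_url"]
        for f in files
        if f.get("recording_type") == "audio_transcript" and "download_url" in f
    ]


# Hypothetical event body for illustration only
sample_event = {
    "event": "recording.transcript_completed",
    "payload": {
        "object": {
            "id": 123456789,
            "recording_files": [
                {"recording_type": "audio_only", "download_url": "https://example.invalid/a.m4a"},
                {"recording_type": "audio_transcript", "download_url": "https://example.invalid/t.vtt"},
            ],
        }
    },
}
print(transcript_urls_from_event(sample_event))
```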

Write code to fetch transcripts

Now that we've set up our Server-to-Server OAuth App and completed a meeting, let's write code to fetch details from the Zoom API.

In this section, we'll create a Zoom SDK Python Client that can interact with the Zoom API to fetch transcripts.

Create a Zoom SDK Client

We'll create a simple Zoom SDK Client.

$ mkdir my-zoom-sdk-client
$ cd my-zoom-sdk-client
$ python3 -m venv venv
$ source venv/bin/activate
$ echo "requests" >> requirements.txt
$ pip install -r requirements.txt

Create a file called main.py and copy the following code:

"""
MyZoomSDKClient
"""
import requests


class MyZoomSDKClient:

    # Initialize the client with account_id, client_id, and client_secret
    def __init__(self, client_id, client_secret, account_id):
        self.client_id = client_id
        self.client_secret = client_secret
        self.account_id = account_id
        self.access_token = None

    # Generate a new access token only if one does not exist, or if the user explicitly asks to generate 
    # (using 'force=True' means generating a new token)
    def get_access_token(self, force=False):
        if self.access_token and not force:
            return self.access_token

        access_token_uri = "https://zoom.us/oauth/token"
        data = {
            "grant_type": "account_credentials",
            "account_id": self.account_id,
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }
        response = self._make_request(access_token_uri, data=data, method='POST')
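        # Store the token so later calls can reuse it until force=True is passed
        self.access_token = response.json()["access_token"]
        return self.access_token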

    def _make_request(self, uri, headers=None, method='GET', data=None):
        try:
            if method == 'POST':
                response = requests.post(uri, headers=headers, data=data)
            else:
                response = requests.get(uri, headers=headers)

            if response.status_code // 100 != 2:
                raise Exception('Invalid response from Zoom API. Please try again later.')
            return response
        except Exception as e:
            raise Exception('Invalid response from Zoom API. Please try again later. Error Message: ' + str(e))

__init__ simply initializes our client with the account_id, client_id and client_secret. (Remember we saw these earlier in Step 3 in the Create a Server-to-Server OAuth App section?)
_make_request is just a utility function that allows us to call any REST API.
get_access_token generates a new access token by calling the Zoom OAuth API. The force flag ensures a new access_token is generated instead of reusing the earlier one. This matters because access tokens expire (after one hour for Server-to-Server OAuth apps).
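Since tokens expire, a long-running script could refresh them automatically rather than relying on force=True. Below is a small sketch of that idea, separate from the client above. It assumes the token response includes an expires_in field in seconds; the TokenCache class, its parameters, and the one-hour fallback are our own choices for illustration.

```python
import time


class TokenCache:
    """Cache an access token and refetch it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, margin_seconds=60):
        self._fetch_token = fetch_token  # callable returning the token response dict
        self._clock = clock              # injectable so the logic can be tested
        self._margin = margin_seconds    # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            response = self._fetch_token()
            self._token = response["access_token"]
            # Assumption: fall back to one hour if expires_in is absent
            lifetime = response.get("expires_in", 3600)
            self._expires_at = now + lifetime - self._margin
        return self._token


# Demo with a stub instead of a real HTTP call
cache = TokenCache(lambda: {"access_token": "demo-token", "expires_in": 3600})
print(cache.get())
```

In the client, fetch_token could be a function that performs the POST to the token endpoint and returns response.json().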

Let's see if all works well.

$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> access_token = client.get_access_token()
>>> print(access_token)
>>> exit()
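Typing credentials into the interpreter works for a quick test, but for anything longer-lived you may prefer to read them from environment variables. Here is a minimal sketch; the variable names ZOOM_CLIENT_ID, ZOOM_CLIENT_SECRET, and ZOOM_ACCOUNT_ID are our own convention, not something Zoom defines.

```python
import os


def load_zoom_credentials(env=os.environ):
    """Return (client_id, client_secret, account_id) read from the environment."""
    # Hypothetical variable names chosen for this example
    names = ("ZOOM_CLIENT_ID", "ZOOM_CLIENT_SECRET", "ZOOM_ACCOUNT_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)


# Demo with a plain dict standing in for the real environment
demo_env = {
    "ZOOM_CLIENT_ID": "my-client",
    "ZOOM_CLIENT_SECRET": "my-secret",
    "ZOOM_ACCOUNT_ID": "my-account",
}
print(load_zoom_credentials(demo_env))
```

The returned tuple is in the same order as the MyZoomSDKClient constructor arguments, so it can be unpacked directly: MyZoomSDKClient(*load_zoom_credentials()).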

List all meetings

We'll modify the code to access the List Recordings API.

GET /users/{userId}/recordings

Update main.py to call the List Recordings API and return parsed data. Add the following methods to the class:

#...
    # Generate authorization header using access_token
    def get_authorization_header(self):
        return { "Authorization": f"Bearer {self.get_access_token()}" }

    # List id and topic of all my meetings
    def list_my_meetings(self):
        list_meetings_uri = "https://api.zoom.us/v2/users/me/recordings"
        headers = self.get_authorization_header()

        response = self._make_request(list_meetings_uri, headers=headers)
        return [(x['id'], x['topic']) for x in response.json()['meetings']]
#...

Let's see it in action:

$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> print(meetings)
>>> exit()
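The method above reads only the first page of results. If an account has many recordings, the remaining pages need to be followed as well. Here is a sketch of that loop, assuming the response carries a next_page_token field that is empty on the last page. The fetch_page callable is a hypothetical stand-in for the HTTP call, so the loop can be shown without a network request.

```python
def collect_all_meetings(fetch_page):
    """Follow next_page_token until it is empty and return (id, topic) pairs."""
    meetings = []
    token = ""
    while True:
        # fetch_page takes the current page token and returns the parsed JSON page
        page = fetch_page(token)
        meetings.extend((m["id"], m["topic"]) for m in page.get("meetings", []))
        token = page.get("next_page_token", "")
        if not token:
            return meetings


# Stub pages for illustration; real pages would come from the List Recordings API
pages = {
    "": {"meetings": [{"id": 111, "topic": "Standup"}], "next_page_token": "p2"},
    "p2": {"meetings": [{"id": 222, "topic": "Retro"}], "next_page_token": ""},
}
print(collect_all_meetings(lambda token: pages[token]))
```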

Get Meeting Details

Let's modify the code to access the Get Recording API.

GET /meetings/{meetingId}/recordings

Update main.py to add the following function:

#...
    # Get meeting details by ID
    def get_meeting_details(self, meeting_id):
        if meeting_id in self.meetings:
            return self.meetings[meeting_id]

        meeting_details_url = f"https://api.zoom.us/v2/meetings/{meeting_id}/recordings"
        headers = self.get_authorization_header()

        self.meetings[meeting_id] = self._make_request(meeting_details_url, headers=headers).json()
        return self.meetings[meeting_id]
#...

Also, add this new property to the client:

class MyZoomSDKClient:

    # Initialize the client with account_id, client_id, and client_secret
    def __init__(self, client_id, client_secret, account_id):
        # ...
        self.meetings = {}

Let's run this code to get our meeting details:

$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> my_meeting = client.get_meeting_details(meetings[0][0])
>>> print(my_meeting)
>>> exit()

Get Transcriptions

Finally, let's add the following code to fetch the transcripts. Transcripts are available as part of the Get Recording API. The API response contains a recording_files property, which holds the download_url for each video recording, audio recording, transcript, and so on. Each entry has a recording_type that tells us what kind of file it is. We are interested in the entries whose recording_type is audio_transcript.

Modify main.py to add the following function:

#...
    # Get audio transcript for a given meeting
    def get_audio_transcript(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)

        aud_ts_uri = [i['download_url'] for i in meeting_details['recording_files'] if i['recording_type']=='audio_transcript'][0]
        headers = self.get_authorization_header()

        response = self._make_request(aud_ts_uri, headers=headers)
        # response.content contains audio transcript in bytes, so we need to decode it to understand the content.
        decoded_content = response.content.decode("utf-8")
        final_formatted_content = decoded_content.strip().split('\r\n\r') # format content

        # Print the audio transcript
        for line in final_formatted_content: print(line)
#...
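One thing to watch out for: indexing the filtered list with [0] raises an IndexError when the meeting has no transcript yet, for example while Zoom is still processing the recording. A more forgiving lookup could return None instead. The helper below is a sketch with a name of our own choosing and hypothetical sample data.

```python
def find_download_url(meeting_details, recording_type):
    """Return the first download_url for the given recording_type, or None."""
    for item in meeting_details.get("recording_files", []):
        if item.get("recording_type") == recording_type:
            return item.get("download_url")
    return None


# Hypothetical meeting details containing only an audio file so far
details = {
    "recording_files": [
        {"recording_type": "audio_only", "download_url": "https://example.invalid/a.m4a"},
    ]
}
print(find_download_url(details, "audio_transcript"))
```

The caller can then decide whether to retry later or report that no transcript exists.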

Let's run the following to fetch the transcripts:

$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> client.get_audio_transcript(meetings[0][0])

This is an example output. Your output should be in a similar format:

...
WEBVTT

1
00:00:16.239 --> 00:00:27.079
John: Hi, this is a test for audio transcripts.

2
00:00:28.079 --> 00:00:30.239
Wewake: I am going to say.

3
00:00:30.869 --> 00:00:32.329
Wewake: a few words.

4
00:00:32.629 --> 00:00:41.649
Wewake: I expect this to be recorded and available to me later.

5
00:00:44.779 --> 00:00:46.509
Wewake: That is it.

6
00:00:52.219 --> 00:00:54.909
Wewake: So simple.

7
00:00:56.319 --> 00:01:00.269
Wewake: We are done.

8
00:01:00.749 --> 00:01:01.629
Wewake: Bye.
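
For tasks like summarizing or translating, it is usually easier to work with structured cues than with raw text. Here is a minimal sketch of a parser for output in the format shown above. It assumes each cue has a timing line followed by text with an optional "Speaker: " prefix, and it is not a complete WebVTT parser. A cue without a speaker whose text contains ": " would be split incorrectly.

```python
import re

CUE_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")


def parse_webvtt(text):
    """Parse transcript text into dicts with start, end, speaker, and text."""
    cues = []
    # Normalize line endings, then split into blank-line-separated blocks
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line)
            if not match:
                continue  # skips the WEBVTT header and the cue number
            body = " ".join(lines[i + 1:])
            # Split off a leading "Speaker: " prefix if there is one
            speaker, sep, rest = body.partition(": ")
            if not sep:
                speaker, rest = None, body
            cues.append({"start": match.group(1), "end": match.group(2),
                         "speaker": speaker, "text": rest})
            break
    return cues


sample = (
    "WEBVTT\r\n\r\n"
    "1\r\n00:00:16.239 --> 00:00:27.079\r\n"
    "John: Hi, this is a test for audio transcripts.\r\n\r\n"
    "2\r\n00:00:28.079 --> 00:00:30.239\r\n"
    "Wewake: I am going to say.\r\n"
)
for cue in parse_webvtt(sample):
    print(cue["start"], cue["speaker"], cue["text"])
```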

Bonus: Closed Caption

We can also retrieve the closed captions from the recording_files. For this, closed captions need to be enabled during the meeting. Here is the code to fetch them:

#...
    # Get closed captions
    def get_closed_captions(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)

        cc_uri = [i['download_url'] for i in meeting_details['recording_files'] if i['recording_type']=='closed_caption'][0]
        headers = self.get_authorization_header()

        response = self._make_request(cc_uri, headers=headers)

        # response.content contains closed caption in bytes, so we need to decode it to understand the content.
        decoded_content = response.content.decode("utf-8")
        final_formatted_content = decoded_content.strip().split('\r\n\r') # format content

        # Print the closed captions
        for line in final_formatted_content: print(line)
#...

Run it:

$ python3
>>> CLIENT_ID='' # Put your Client ID
>>> CLIENT_SECRET='' # Put your CLIENT_SECRET
>>> ACCOUNT_ID='' # Put your ACCOUNT_ID
>>> from main import MyZoomSDKClient
>>> client = MyZoomSDKClient(CLIENT_ID, CLIENT_SECRET, ACCOUNT_ID)
>>> meetings = client.list_my_meetings()
>>> client.get_closed_captions(meetings[0][0])

Complete Code for the Zoom Client

"""
MyZoomSDKClient

Simple Zoom SDK to get access_token, recordings, transcripts, and closed captions
for a Zoom meeting (with cloud recording). Requires client_id, client_secret and account_id.
"""
import requests


class MyZoomSDKClient:

    # Initialize the client with account_id, client_id, and client_secret
    def __init__(self, client_id, client_secret, account_id):
        self.client_id = client_id
        self.client_secret = client_secret
        self.account_id = account_id
        self.access_token = None
        self.meetings = {}

    # Generate a new access token only if one does not exist, or if the user explicitly asks to generate 
    # (using 'force=True' means generating a new token)
    def get_access_token(self, force=False):
        if self.access_token and not force:
            return self.access_token

        access_token_uri = "https://zoom.us/oauth/token"
        data = {
            "grant_type": "account_credentials",
            "account_id": self.account_id,
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }
        response = self._make_request(access_token_uri, data=data, method='POST')
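        # Store the token so later calls can reuse it until force=True is passed
        self.access_token = response.json()["access_token"]
        return self.access_token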

    # Generate authorization header using access_token
    def get_authorization_header(self):
        return { "Authorization": f"Bearer {self.get_access_token()}" }

    # List id and topic of all my meetings
    def list_my_meetings(self):
        list_meetings_uri = "https://api.zoom.us/v2/users/me/recordings"
        headers = self.get_authorization_header()

        response = self._make_request(list_meetings_uri, headers=headers)
        return [(x['id'], x['topic']) for x in response.json()['meetings']]

    # Get meeting details by ID
    def get_meeting_details(self, meeting_id):
        if meeting_id in self.meetings:
            return self.meetings[meeting_id]

        meeting_details_url = f"https://api.zoom.us/v2/meetings/{meeting_id}/recordings"
        headers = self.get_authorization_header()

        self.meetings[meeting_id] = self._make_request(meeting_details_url, headers=headers).json()
        return self.meetings[meeting_id]

    # Get audio transcript for a given meeting
    def get_audio_transcript(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)

        aud_ts_uri = [i['download_url'] for i in meeting_details['recording_files'] if i['recording_type']=='audio_transcript'][0]
        headers = self.get_authorization_header()

        response = self._make_request(aud_ts_uri, headers=headers)
        # response.content contains audio transcript in bytes, so we need to decode it to understand the content.
        decoded_content = response.content.decode("utf-8")
        final_formatted_content = decoded_content.strip().split('\r\n\r') # format content

        # Print the audio transcript
        for line in final_formatted_content: print(line)


    # Get closed captions
    def get_closed_captions(self, meeting_id):
        meeting_details = self.get_meeting_details(meeting_id)

        cc_uri = [i['download_url'] for i in meeting_details['recording_files'] if i['recording_type']=='closed_caption'][0]
        headers = self.get_authorization_header()

        response = self._make_request(cc_uri, headers=headers)

        # response.content contains closed caption in bytes, so we need to decode it to understand the content.
        decoded_content = response.content.decode("utf-8")
        final_formatted_content = decoded_content.strip().split('\r\n\r') # format content

        # Print the closed captions
        for line in final_formatted_content: print(line)

    def _make_request(self, uri, headers=None, method='GET', data=None):
        try:
            if method == 'POST':
                response = requests.post(uri, headers=headers, data=data)
            else:
                response = requests.get(uri, headers=headers)

            if response.status_code // 100 != 2:
                raise Exception('Invalid response from Zoom API. Please try again later.')
            return response
        except Exception as e:
            raise Exception('Invalid response from Zoom API. Please try again later. Error Message: ' + str(e))

Limitations of the Zoom Cloud Recording API

You might have noticed how we made some specific choices when we created the Zoom App to get transcription. This is because the native Zoom APIs have several limitations:

1. You need to be on a Paid Plan

The Zoom account hosting the meeting must be on the Pro, Business, or Enterprise tier. The free Basic plan will not work.

2. You need to enable Zoom Cloud Recording

Transcripts are only produced if the user records their meeting using Zoom Cloud Recording (not Zoom local recording). Zoom also has live captions that appear without recording the meeting, but it does not produce a transcript file after the call unless the meeting is being recorded.

3. Only the Meeting Host can access the transcript

The recorded meeting is only stored in the host's account, so other users cannot access the meeting and transcript details.

4. Meeting Host must enable recording settings for the transcript to be generated

The audio transcript feature in Zoom is disabled by default. You must enable this setting in Zoom for transcription to be produced.

5. Not accessible without elevated plan

Users must be on a paid plan to access recording/transcribing with Zoom natively.

6. Transcripts can take a long time to become available

After the meeting has ended, it typically takes about twice the duration of the recorded meeting for all the recordings to become available. For example, the transcript of a 30-minute meeting will typically be available about 1 hour after the meeting ends. Depending on the load on Zoom's servers, processing can take up to 24 hours.
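
If webhooks are not an option, one way to cope with this delay is to poll with an increasing wait between attempts. Here is a sketch of that pattern. The check callable is a hypothetical function that returns the transcript URL once it exists and None until then, and the default delays are arbitrary values for illustration.

```python
import time


def wait_for_transcript(check, max_attempts=8, initial_delay=60,
                        max_delay=1800, sleep=time.sleep):
    """Call check() until it returns a non-None value, doubling the wait each time."""
    delay = initial_delay
    for attempt in range(max_attempts):
        result = check()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the wait between attempts
    return None  # gave up after max_attempts checks


# Demo with a stub that succeeds on the third check and a no-op sleep
answers = iter([None, None, "https://example.invalid/transcript.vtt"])
print(wait_for_transcript(lambda: next(answers), sleep=lambda seconds: None))
```

Note that get_meeting_details in our client caches responses in self.meetings, so a check function built on it would need to bypass or clear that cache to see newly processed files.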

7. Transcripts are not available in real-time

Zoom Cloud transcripts are only available after the meeting is done, so you won’t be able to get the transcription in real-time.

Workarounds

To work around the above limitations, there are a few options:

Workaround 1: Use the Zoom RTMP Live-Streaming API

Zoom allows live-streaming a meeting to custom apps using the Real-Time Messaging Protocol (RTMP). This lets a custom app receive the audio of the meeting, which can then be piped to any third-party transcription service to get real-time transcripts.

Advantages:

You get real-time transcripts.
You can get transcripts in any language.

Disadvantages:

The setup is more complex. You need to set up a live-streaming server.
You also need to integrate with a 3rd-party service for transcription, which can add to the cost and complexity.
This also requires a Zoom paid plan.
This also requires you to be the host.

Workaround 2: Build a Zoom Meeting Bot

Zoom allows meeting bots to access the video and audio contents of the meetings.