Transcription API

Convert audio streams into accurate, structured text with our flexible API, in real time or asynchronously.

Why use Recall.ai’s Transcription API?

Recall.ai's speech-to-text API captures audio and video from conversations and delivers accurate, speaker-labeled transcripts. Whether chats happen in person, on the phone, or on video conferencing platforms, you’ll get high-quality, low-latency transcripts in real time or after the conversation.

Why developers choose our transcription API

Teams can implement our advanced speech-to-text API in minutes. Set your preferred transcription engine, start recording a conversation, and watch our API deliver fast, multilingual transcripts directly into your product. 

One API, many transcription engines

Choose between Recall.ai’s transcription engine and third-party transcription providers like AssemblyAI and Deepgram, all through a single API. Recall.ai’s Transcription API abstracts away the complexity of integrating with each provider so you can easily switch engines based on accuracy, latency, cost, or language needs.

Get transcripts with timestamps, metadata, and more

Recall.ai returns transcripts with timestamps, conversation metadata, and participant information so you can build features like speaker timelines, contextual transcript views, automated participant follow-ups, and AI workflows that populate systems like CRMs with stakeholder names.
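
As a rough sketch of how a speaker timeline might be built from this kind of data, the Python below totals talk time per speaker and merges consecutive segments into turns. The segment shape (`speaker`, `start`, `end`, `text`) is an assumption made for illustration, not Recall.ai's actual response schema, so check the docs for the real field names.

```python
from collections import defaultdict

# Hypothetical transcript shape, assumed for illustration only:
# one dict per segment, with times in seconds.
segments = [
    {"speaker": "Alice", "start": 0.0, "end": 4.5, "text": "Thanks for joining."},
    {"speaker": "Bob", "start": 4.5, "end": 9.0, "text": "Happy to be here."},
    {"speaker": "Alice", "start": 9.0, "end": 12.0, "text": "Let's get started."},
]


def talk_time_by_speaker(segments):
    """Sum the seconds each speaker talked, keyed by speaker name."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


def speaker_timeline(segments):
    """Merge consecutive segments from the same speaker into single turns."""
    turns = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker kept talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            turns.append(dict(seg))  # copy so the input is not mutated
    return turns
```

Merging by turn keeps a timeline view compact, while the per-speaker totals could feed something like a talk-ratio metric.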

Perfect speaker diarization

Get 100% accurate speaker identification out of the box. Recall.ai is the only transcription provider that can reliably diarize conversations with accurate speaker names across all major video conferencing platforms.

Transcription across multiple languages

Recall.ai supports multilingual transcription by letting you select a transcription provider based on language support. To accurately transcribe a conversation where people switch between multiple languages, choose a provider that supports code-switching.

Capture accurate, speaker-labeled transcripts at scale

Supporting everything from early prototypes to enterprise workloads, Recall.ai's speech-to-text API performs reliably across situations with overlapping speakers and noisy environments. With 99.9% uptime and flexible, usage-based pricing, you can confidently scale transcription as your product grows.

Incident management tool
AI meeting copilot
Stenographer
Medical scribe
Live sales coaching
Interview notetaker
Meeting notetaker
Task tracker

Build products with transcription data

Get real-time or asynchronous transcripts

By receiving real-time transcripts via webhook, you can use our API to power live sales coaching, AI agents, and more. You can also easily get transcripts after a conversation ends by calling our API endpoint.
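
To give a feel for the real-time path, here is a minimal sketch of a handler that parses one webhook body and appends the utterance to an in-memory live transcript. The event name `transcript.data` and the `event`, `data`, `speaker`, and `text` keys are placeholders assumed for this example, so the actual payload may look different.

```python
import json


def handle_transcript_event(raw_body, live_transcript):
    """Parse one webhook body and append its utterance to live_transcript.

    Returns True if the event carried transcript data, False otherwise.
    The payload keys used here are assumptions for illustration.
    """
    event = json.loads(raw_body)
    if event.get("event") != "transcript.data":  # hypothetical event name
        return False  # ignore unrelated events
    data = event["data"]
    live_transcript.append((data["speaker"], data["text"]))
    return True


# Usage: feed each incoming request body to the handler.
live_transcript = []
body = json.dumps(
    {"event": "transcript.data", "data": {"speaker": "Alice", "text": "Hello"}}
)
handle_transcript_event(body, live_transcript)
```

In a real service, this function would sit behind whatever web framework receives the webhook request, and the list would likely be replaced by a queue or database.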

"Integrating with Recall.ai was seamless. Recall.ai Transcription gave us accurate transcripts immediately and reliably. Because they support more transcription providers than any other platform we also had the flexibility to figure out which provider worked for our needs."

Raunak Surana

Frequently asked questions

A transcription API converts speech into text so applications can work with conversations programmatically, such as meetings, calls, interviews, or recordings.

A transcription API lets teams focus on building features that rely on transcripts, rather than building and maintaining speech-to-text systems themselves.

Recall.ai supports multiple transcription providers, including AWS Transcribe, Recall.ai Transcription, Rev, Deepgram, AssemblyAI, and Google Speech-to-Text. See the full list of third-party providers in our docs.

You can use Recall.ai’s built-in transcription engine for $0.15 per recording hour. You can also bring your own API key and pick from one of the many transcription providers we partner with, including AWS Transcribe, Deepgram, and ElevenLabs.

Speech-to-text APIs are used to turn conversations into structured data. Common use cases include live captions, AI copilots and note taking, sales coaching tools, recruiting products that update applicant tracking systems, and in-person interviews or user research.

Transcription APIs often return structured JSON with timestamps and speaker labels. Recall.ai’s Transcription API outputs JSON.

Use high-quality audio sampled at 16 kHz or higher, enable speaker diarization for multi-speaker conversations, provide custom vocabulary or context for domain-specific terms, and select models matched to your use case. With Recall.ai, you can test different transcription models in your meetings to see which works best for your use case and language(s).
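
If you are preparing your own audio files, a quick check like the one below can confirm that a WAV file meets the 16 kHz guideline before you send it for transcription. This is a minimal sketch using Python's standard `wave` module. The function name and threshold constant are our own, and it only handles uncompressed WAV files.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # the 16 kHz guideline mentioned above


def meets_min_sample_rate(path, min_rate=MIN_SAMPLE_RATE_HZ):
    """Return True if the WAV file at path is sampled at min_rate Hz or higher."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() >= min_rate
```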

Accurate transcripts can benefit industries including sales, productivity, healthcare, recruiting, legal technology, education, and customer service.

If you use Recall.ai, there’s no infrastructure to manage. With a single API call, Recall.ai handles capturing audio across supported video conferencing platforms and sends the audio to the transcription provider you’ve chosen. If you bring your own transcription provider, you may need to pass your provider’s API keys. If you were to run transcription yourself, many transcription APIs run on standard servers, support Linux and Docker or Kubernetes for scaling, and offer SDKs in languages like Python, Go, C++, or Java.

Many speech-to-text APIs allow integration of custom models, adaptation of language and acoustic models for specific terminology, and addition of features like PII redaction for privacy compliance. Check with your transcription provider for specific customizability questions.

A transcription API processes audio, applies speech recognition models to convert speech to text, and returns structured output such as transcripts with timestamps and speaker labels. Transcription systems use either deterministic methods or machine learning models to handle features like speaker diarization and timestamps, then return formatted text as JSON, TXT, or SRT.
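
To make the output formats concrete, here is a small sketch that turns timestamped, speaker-labeled segments into SRT subtitle text. The input segment shape is an assumption for illustration. The SRT layout itself (a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, the text, then a blank line) follows the common SubRip convention.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments (hypothetical shape: speaker, start, end, text) as SRT."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    # Blank line between cues, as SRT expects.
    return "\n".join(blocks)


example = [
    {"speaker": "Alice", "start": 0.0, "end": 4.5, "text": "Thanks for joining."},
    {"speaker": "Bob", "start": 4.5, "end": 9.0, "text": "Happy to be here."},
]
print(to_srt(example))
```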

Many transcription APIs support speaker identification through machine diarization models like Pyannote to automatically label different speakers. Recall.ai’s Transcription API is the only offering that supports perfect diarization and speaker labeling.

With Recall.ai’s Transcription API, you get speaker-diarized transcripts along with video, audio, metadata, and more just by calling a single API endpoint. You can subscribe to webhook events to receive live transcription or to fetch the transcript as soon as the meeting ends.

Real-time transcription provides instant text for live use cases, while batch transcription processes pre-recorded audio asynchronously.