Transcription API

Convert audio streams into accurate, structured text with our flexible API, in real time or asynchronously

Conversation between Amy and Cassie about real-time speaker diarization in an incident management app, with a blue audio waveform above.

Why use Recall.ai’s Transcription API?

Recall.ai's speech-to-text API captures audio and video from conversations and delivers accurate, speaker-labeled transcripts. Whether chats happen in person, on the phone, or on video conferencing platforms, you’ll get high-quality, low-latency transcripts in real time or after the conversation.

"Recall.ai powers our Notetaker recordings. Perfect diarization across video conferencing platforms allows us to deliver accurate, speaker-labeled transcripts to our customers. That same attention to product quality carries through to their team. They’ve been a true partner, proactive and supportive, always bringing thoughtful ideas and helping our plans come to life."

Galya Dimitrova

Why developers choose our real-time transcription API

Teams can implement our advanced speech-to-text API in minutes. Set your preferred transcription engine, start recording a conversation, and watch our API deliver fast, multilingual transcripts directly into your product.

One API, many transcription engines

Choose Recall.ai’s transcription engine or one of our transcription providers, like AssemblyAI and Deepgram, through a single API. Recall.ai’s Transcription API abstracts away the complexity of integrating with each provider so you can easily switch engines based on accuracy, latency, cost, or language needs.

Get transcripts with timestamps, metadata, and more

Recall.ai returns transcripts with timestamps, conversation metadata, and participant information so you can build features like speaker timelines, contextual transcript views, automated participant follow-ups, and AI workflows that populate systems like CRMs with stakeholder names.
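As a rough illustration of the speaker timelines mentioned above, the sketch below merges consecutive transcript segments from the same speaker into timeline blocks. The segment shape (speaker, start, end, text) is an assumption made for this example, not Recall.ai's documented response schema.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape: one utterance with a speaker name and times in seconds.
    speaker: str
    start: float
    end: float
    text: str


def build_speaker_timeline(segments):
    """Merge consecutive segments by the same speaker into timeline blocks."""
    timeline = []
    for seg in sorted(segments, key=lambda s: s.start):
        if timeline and timeline[-1].speaker == seg.speaker:
            last = timeline[-1]
            # Same speaker kept talking: extend the current block.
            timeline[-1] = Segment(
                last.speaker,
                last.start,
                max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            timeline.append(seg)
    return timeline


segments = [
    Segment("Amy", 0.0, 2.5, "The alert fired at nine."),
    Segment("Amy", 2.5, 4.0, "I paged the on-call."),
    Segment("Cassie", 4.2, 6.0, "Thanks, I'm looking now."),
]
timeline = build_speaker_timeline(segments)
for block in timeline:
    print(f"{block.start:.1f}-{block.end:.1f} {block.speaker}: {block.text}")
```

Each block can then back a row in a timeline view, or be summed per speaker to estimate talk time.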

Perfect speaker diarization

Get 100% accurate speaker identification out of the box. Recall.ai is the only transcription provider that can reliably diarize conversations with accurate speaker names across all major video conferencing platforms.

Transcription across multiple languages

Recall.ai supports multilingual transcription by letting you select a transcription provider based on language support. To accurately transcribe a conversation where people switch between multiple languages, choose a provider that supports code-switching.

Capture accurate, speaker-labeled transcripts at scale

Supporting everything from early prototypes to enterprise workloads, Recall.ai's speech-to-text API performs reliably across situations with overlapping speakers and noisy environments. With 99.9% uptime and flexible, usage-based pricing, you can confidently scale transcription as your product grows.

Incident management tool

AI meeting copilot

Stenographer

Medical scribe

Live sales coaching

Interview notetaker

Meeting notetaker

Task tracker

Build products with transcription data

Get real-time or asynchronous transcripts

By receiving real-time transcripts via webhook, you can use our API to power live sales coaching, AI agents, and more. You can also easily get transcripts after a conversation ends by calling our API endpoint.
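To make the real-time flow more concrete, here is a minimal sketch of the logic a webhook receiver might run on each incoming event. The payload fields ("type", "speaker", "text") and the "partial"/"final" values are hypothetical placeholders chosen for this example; the actual event schema is defined in the API docs.

```python
import json


def handle_transcript_event(raw_body, transcript_lines):
    """Parse one webhook body and append finalized text to transcript_lines.

    The payload fields used here ("type", "speaker", "text") are hypothetical
    placeholders; check the provider's docs for the real event schema.
    """
    event = json.loads(raw_body)
    if event.get("type") != "final":
        # Ignore partial (interim) results in this sketch; a live-captions UI
        # might render them instead.
        return None
    line = f'{event["speaker"]}: {event["text"]}'
    transcript_lines.append(line)
    return line


lines = []
handle_transcript_event(b'{"type": "partial", "speaker": "Amy", "text": "The al"}', lines)
handle_transcript_event(b'{"type": "final", "speaker": "Amy", "text": "The alert fired."}', lines)
print(lines)
```

In a real service this function would sit behind an HTTP endpoint that also verifies the request's authenticity before parsing it.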

"Integrating with Recall.ai was seamless. Recall.ai Transcription gave us accurate transcripts immediately and reliably. Because they support more transcription providers than any other platform we also had the flexibility to figure out which provider worked for our needs."

Raunak Surana

Frequently asked questions

What is a real-time transcription API?

A real-time transcription API converts live speech into text so applications can work with conversations like meetings, calls, or interviews programmatically. This also supports interactive tools like AI agents, which need to process and respond to speech as it happens.

Why use a transcription API?

A transcription API lets teams focus on building features that rely on transcripts, rather than building and maintaining speech-to-text systems themselves.

What transcription providers does Recall.ai support?

Recall.ai supports multiple transcription providers, including AWS Transcribe, Recall.ai Transcription, Rev, Deepgram, AssemblyAI, and Google Speech-to-Text. See the full list of third-party providers in our docs.

How much does it cost to use Recall.ai’s transcription API?

You can use Recall.ai’s built-in transcription engine for $0.15 per recording hour. You can also bring your own API key and pick from one of the many transcription providers we partner with, including AWS Transcribe, Deepgram, and ElevenLabs.
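As a quick worked example of the built-in engine's rate, the sketch below estimates monthly spend from total recording minutes. It assumes cost scales linearly with recording time (per-minute proration is an assumption, not a stated billing rule), and it does not cover fees charged by a third-party provider when you bring your own key.

```python
from decimal import Decimal

# Rate quoted above for Recall.ai's built-in transcription engine.
RATE_PER_RECORDING_HOUR = Decimal("0.15")


def estimate_transcription_cost(recording_minutes):
    """Estimate cost in USD, assuming linear per-minute proration (an assumption)."""
    hours = Decimal(recording_minutes) / Decimal(60)
    return (hours * RATE_PER_RECORDING_HOUR).quantize(Decimal("0.01"))


# 200 meetings of 45 minutes each = 9,000 minutes = 150 recording hours.
monthly_cost = estimate_transcription_cost(200 * 45)
print(monthly_cost)  # 22.50
```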

What can I build with a speech-to-text API?

Speech-to-text APIs are used to turn conversations into structured data. Common use cases include live captions, AI copilots and note taking, sales coaching tools, recruiting products that update applicant tracking systems, and tools for in-person interviews or user research.

What output formats do transcription APIs typically support?

Transcription APIs often return structured JSON with timestamps and speaker labels. Recall.ai’s Transcription API outputs JSON.

How can I improve transcription accuracy?

Use high-quality audio at 16 kHz or higher, enable speaker diarization for multi-speaker conversations, provide custom vocabulary or context for domain-specific terms, and select models matched to your use case. With Recall.ai you can test out different transcription models in your meetings to see which will work best for your use case and language(s).
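If you supply your own audio files, the sample-rate guideline above can be checked before upload. The sketch below uses Python's standard wave module to read a WAV file's sample rate; the helper names and the in-memory silent clip are illustrative only, and other formats (MP3, Opus, and so on) would need a different library.

```python
import io
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # threshold suggested in the answer above


def wav_sample_rate(wav_file):
    """Return the sample rate (Hz) of a WAV file path or file-like object."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getframerate()


def meets_min_sample_rate(wav_file, minimum=MIN_SAMPLE_RATE_HZ):
    """True if the WAV audio is sampled at or above the minimum rate."""
    return wav_sample_rate(wav_file) >= minimum


def make_silent_wav(sample_rate, seconds=1):
    """Build a short mono 16-bit silent WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    buffer.seek(0)
    return buffer


low_ok = meets_min_sample_rate(make_silent_wav(8_000))    # telephone-quality rate
high_ok = meets_min_sample_rate(make_silent_wav(44_100))  # CD-quality rate
print(low_ok, high_ok)
```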

What industries can benefit from transcription?

Accurate transcripts can benefit industries including sales, productivity, healthcare, recruiting, legal technology, education, and customer service.

What are the technical requirements for running a real-time transcription API?

If you use Recall.ai, there’s no infrastructure to manage. With a single API call, Recall.ai handles capturing audio across supported video conferencing platforms and sends the audio to the transcription provider you’ve chosen. If you bring your own transcription provider, you may need to pass your provider’s API keys. If you were to run transcription yourself, many transcription APIs run on standard servers, support Linux and Docker or Kubernetes for scaling, and offer SDKs in languages like Python, Go, C++, or Java.

Can I customize transcription models for specific terminology?

Many speech-to-text APIs allow integration of custom models, adaptation of language and acoustic models for specific terminology, and addition of features like PII redaction for privacy compliance. Check with your transcription provider for specific customizability questions.

How does a transcription API work?

A transcription API processes audio, applies speech recognition models to convert speech to text, and returns structured output such as transcripts with timestamps and speaker labels. Depending on the provider, features like speaker diarization and timestamps are handled either by deterministic methods or by additional models, and the formatted text is returned as JSON, TXT, or SRT.
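To show how the same transcript data maps onto one of the formats mentioned above, here is a small sketch that renders segments as SRT subtitle cues. The input dictionaries (speaker, start, end, text, with times in seconds) are an assumed shape for illustration, not a specific provider's JSON schema.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    return "\n".join(cues)


# Hypothetical segments, as they might look after parsing a JSON transcript.
segments = [
    {"speaker": "Amy", "start": 0.0, "end": 2.5, "text": "The alert fired."},
    {"speaker": "Cassie", "start": 2.5, "end": 5.0, "text": "Looking now."},
]
srt_text = to_srt(segments)
print(srt_text)
```

Prefixing each cue with the speaker name is one common convention; plain TXT output could be produced the same way by dropping the index and timestamp lines.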

Do transcription APIs support speaker identification?

Many transcription APIs support speaker identification through machine diarization models like Pyannote to automatically label different speakers. Recall.ai’s Transcription API is the only offering that supports perfect diarization and speaker labeling.

How do I integrate a speech-to-text API into my application?

With Recall.ai’s Transcription API, you get speaker-diarized transcripts along with video, audio, metadata, and more just by calling a single API endpoint. You can subscribe to webhook events for live transcription, or fetch the transcript right after the meeting ends.

What's the difference between real-time and batch transcription?

Real-time transcription provides instant text for live use cases, while batch transcription processes pre-recorded audio asynchronously.