Why use Recall.ai’s Transcription API?
Recall.ai's speech-to-text API captures audio and video from conversations and delivers accurate, speaker-labeled transcripts. Whether chats happen in person, on the phone, or on video conferencing platforms, you’ll get high-quality, low-latency transcripts in real time or after the conversation.
Why developers choose our transcription API
Teams can implement our advanced speech-to-text API in minutes. Set your preferred transcription engine, start recording a conversation, and watch our API deliver fast, multilingual transcripts directly into your product.

One API, many transcription engines

Get transcripts with timestamps, metadata, and more
Recall.ai returns transcripts with timestamps, conversation metadata, and participant information so you can build features like speaker timelines, contextual transcript views, automated participant follow-ups, and AI workflows that populate systems like CRMs with stakeholder names.
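As a rough illustration of a speaker timeline, here is a minimal Python sketch. The segment shape (`speaker`, `start`, `end`, `text`) is an assumption made for this example, not Recall.ai's documented response schema, so map the field names to whatever your transcripts actually contain.

```python
from collections import defaultdict

# Hypothetical transcript segments; the field names are assumptions
# for illustration, not Recall.ai's documented schema.
segments = [
    {"speaker": "Alice", "start": 0.0, "end": 4.2, "text": "Hi, thanks for joining."},
    {"speaker": "Bob", "start": 4.5, "end": 9.0, "text": "Happy to be here."},
    {"speaker": "Alice", "start": 9.3, "end": 12.0, "text": "Let's get started."},
]


def speaker_timeline(segments):
    """Group (start, end) intervals by speaker, preserving segment order."""
    timeline = defaultdict(list)
    for seg in segments:
        timeline[seg["speaker"]].append((seg["start"], seg["end"]))
    return dict(timeline)


def talk_time(segments):
    """Sum the seconds each speaker spent talking."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


if __name__ == "__main__":
    print(speaker_timeline(segments))
    print(talk_time(segments))
```

The same grouping could feed a timeline UI or a per-participant summary written to a CRM.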

Perfect speaker diarization

Transcription across multiple languages
Recall.ai supports multilingual transcription by letting you select a transcription provider based on language support. To accurately transcribe a conversation where people switch between multiple languages, choose a provider that supports code-switching.
Capture accurate, speaker-labeled transcripts at scale
Supporting everything from early prototypes to enterprise workloads, Recall.ai's speech-to-text API performs reliably across situations with overlapping speakers and noisy environments. With 99.9% uptime and flexible, usage-based pricing, you can confidently scale transcription as your product grows.
Build products with transcription data
Get real-time or asynchronous transcripts
By receiving real-time transcripts via webhook, you can use our API to power live sales coaching, AI agents, and more. You can also easily get transcripts after a conversation ends by calling our API endpoint.
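To give a feel for the real-time path, the sketch below handles one webhook request body in Python. The event name `transcript.data` and the `speaker` and `text` fields are placeholders assumed for this example; check the webhook reference for the actual payload before relying on them.

```python
import json

# Placeholder event name; an assumption for this sketch.
TRANSCRIPT_EVENT = "transcript.data"


def handle_transcript_event(raw_body: bytes, live_buffer: list) -> bool:
    """Append one transcript line to live_buffer.

    Returns True if the event was a transcript event, False otherwise.
    """
    event = json.loads(raw_body)
    if event.get("event") != TRANSCRIPT_EVENT:
        return False  # ignore unrelated webhook events
    data = event.get("data", {})
    speaker = data.get("speaker", "Unknown")
    text = data.get("text", "")
    live_buffer.append(f"{speaker}: {text}")
    return True


if __name__ == "__main__":
    buffer = []
    body = b'{"event": "transcript.data", "data": {"speaker": "Alice", "text": "Hello"}}'
    handle_transcript_event(body, buffer)
    print(buffer)
```

In a real service this function would sit behind an HTTP route, and the buffer would be replaced by whatever feeds your live coaching or agent logic.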
Frequently asked questions
A transcription API converts speech into text so applications can work with conversations programmatically, such as meetings, calls, interviews, or recordings.
A transcription API lets teams focus on building features that rely on transcripts, rather than building and maintaining speech-to-text systems themselves.
Recall.ai supports multiple transcription providers, including AWS Transcribe, Recall.ai Transcription, Rev, Deepgram, AssemblyAI, and Google Speech-to-Text. See the full list of third-party providers in our docs.
You can use Recall.ai’s built-in transcription engine for $0.15 per recording hour. You can also bring your own API key and pick one of the many transcription providers we partner with, including AWS Transcribe, Deepgram, and ElevenLabs.
Speech-to-text APIs are used to turn conversations into structured data. Common use cases include live captions, AI copilots and note taking, sales coaching tools, recruiting products that update applicant tracking systems, and in-person interviews or user research.
Transcription APIs often return structured JSON with timestamps and speaker labels. Recall.ai’s Transcription API outputs JSON.
Use high-quality audio at 16 kHz or higher, enable speaker diarization for multi-speaker conversations, provide custom vocabulary or context for domain-specific terms, and select models matched to your use case. With Recall.ai you can test different transcription models in your meetings to see which works best for your use case and language(s).
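If you handle audio files yourself, a quick sample-rate check can catch low-quality input before it reaches a transcription provider. This is a minimal sketch using Python's standard `wave` module; the helper names and the in-memory test file are illustrative, and it only covers uncompressed WAV input.

```python
import io
import wave


def sample_rate_ok(wav_source, min_hz=16000):
    """Return True if the WAV file's sample rate is at least min_hz.

    wav_source may be a file path or a binary file-like object.
    """
    with wave.open(wav_source, "rb") as wav:
        return wav.getframerate() >= min_hz


def make_silent_wav(rate_hz, n_frames=160):
    """Build a tiny mono 16-bit WAV in memory, used here only for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * n_frames)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(sample_rate_ok(make_silent_wav(16000)))
    print(sample_rate_ok(make_silent_wav(8000)))
```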
Accurate transcripts can benefit industries including sales, productivity, healthcare, recruiting, legal technology, education, and customer service.
If you use Recall.ai, there’s no infrastructure to manage. With a single API call, Recall.ai handles capturing audio across supported video conferencing platforms and sends the audio to the transcription provider you’ve chosen. If you bring your own transcription provider, you may need to pass your provider’s API keys. If you were to run transcription yourself, many transcription APIs run on standard servers, support Linux and Docker or Kubernetes for scaling, and offer SDKs in languages like Python, Go, C++, or Java.
Many speech-to-text APIs allow integration of custom models, adaptation of language and acoustic models for specific terminology, and addition of features like PII redaction for privacy compliance. Check with your transcription provider for specific customizability questions.
A transcription API processes audio, applies speech recognition models to convert speech to text, and returns structured output such as transcripts with timestamps and speaker labels. Transcription systems use either deterministic methods or machine-learning models to handle features like speaker diarization and timestamps, then return formatted text as JSON, TXT, or SRT.
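To show how the output formats relate, here is a minimal Python sketch that turns timestamped segments into SRT subtitle text. The input shape (`speaker`, `start`, `end`, `text`, with times in seconds) is an assumption made for this example rather than any provider's documented schema.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical segments, for illustration only.
    demo = [
        {"speaker": "Alice", "start": 0.0, "end": 4.2, "text": "Hi, thanks for joining."},
        {"speaker": "Bob", "start": 4.5, "end": 9.0, "text": "Happy to be here."},
    ]
    print(to_srt(demo))
```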
Many transcription APIs support speaker identification through machine diarization models like Pyannote to automatically label different speakers. Recall.ai’s Transcription API is the only offering that supports perfect diarization and speaker labeling.
With Recall.ai’s Transcription API you get speaker-diarized transcripts along with video, audio, metadata, and more just by calling a single API endpoint. You can subscribe to webhook events for live transcription or to fetch the transcript immediately after the meeting.
Real-time transcription provides instant text for live use cases, while batch transcription processes pre-recorded audio asynchronously.
