Speaker Labels and Speaker Diarization Explained: How to Obtain and Use Them for Accurate Transcription

Amanda Zhu

June 11, 2024

One of the most valuable use cases for LLMs is analyzing the transcript of a conversation.

They can extract:

However, for LLMs to produce accurate results, the transcript must contain speaker labels.

What are Speaker Labels?

Speaker labels tell you who spoke each word of a transcript. The process of producing them is known as speaker diarization.

Here is an example of a transcript with speaker labels:

Speaker Label Example

Why do you need Speaker Labels?

Reason 1: They enable LLMs to analyze transcripts more accurately

LLMs benefit massively from knowing who spoke each word, as that gives a significant amount of context about the conversation.

For example, given the following transcript snippet without speaker labels and a simple prompt, ChatGPT produces these action items:

---- Input ----
We need to follow up with the potential client from last week. I can do that. We also need to prepare a customized proposal for them. I'll handle that.

---- Output ----
Follow up with the potential client - Responsibility: Ambiguous
Prepare a customized proposal - Responsibility: Ambiguous

The LLM struggles to determine who is responsible for each task because the speakers are not identified, and it is not even clear that this is a conversation between several people.

However, if we add speaker labels, the structure and participants of the conversation become much clearer, and the LLM does a much better job.

---- Input ----
John: We need to follow up with the potential client from last week.
Sarah: I can do that.
John: We also need to prepare a customized proposal for them.
Mike: I'll handle that.

---- Output ----
Follow up with the potential client - Responsibility: Sarah
Prepare a customized proposal - Responsibility: Mike
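
If your transcript is stored as structured data, it has to be rendered into this "Name: text" form before it is sent to the LLM. Here is a minimal sketch in Python. The `Segment` structure, its field names, and the prompt wording are assumptions made for illustration; the format returned by your transcription provider will likely differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One utterance: who spoke and what they said (hypothetical structure)."""
    speaker: str
    text: str


def build_action_item_prompt(segments: list[Segment]) -> str:
    """Render segments as 'Speaker: text' lines and wrap them in a simple prompt."""
    transcript = "\n".join(f"{s.speaker}: {s.text}" for s in segments)
    return (
        "List the action items in this meeting transcript "
        "and who is responsible for each.\n\n"
        f"{transcript}"
    )


segments = [
    Segment("John", "We need to follow up with the potential client from last week."),
    Segment("Sarah", "I can do that."),
    Segment("John", "We also need to prepare a customized proposal for them."),
    Segment("Mike", "I'll handle that."),
]

# The resulting string can be passed to whichever LLM client you use.
prompt = build_action_item_prompt(segments)
print(prompt)
```

Keeping one utterance per line with the speaker's name in front mirrors the labeled input shown above, which is the structure the LLM relied on to assign each task.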

Reason 2: They help readers follow a transcript

If you’re displaying transcripts in your app, they’re much easier to read when they include speaker labels, since the labels break up the conversation into natural segments.

Speaker Label Readability Example

How to get Speaker Labels for your Transcripts 

There are a few options to get speaker labels for transcripts:

Option 1: Use the Recall.ai API

Transcripts captured through the Recall.ai API include speaker labels out of the box. With Recall.ai, you get a transcript in a format like the following.

Recall.ai Speaker Label Example

Recall.ai works with conversations on Zoom, Microsoft Teams, Google Meet, or other video conferencing platforms, and integrates directly with the video conferencing platform to retrieve the speaker names.

Here is a short video of how to get a transcript with speaker labels using Recall.ai API:


Interested in trying out the Recall.ai API?

👉 Sign up to get an API key

👉 Book a demo

👉 Or read our API docs

Option 2: Use Machine Diarization

Machine diarization is a technology that figures out when different people are speaking by analyzing their unique voice patterns.

Most transcription APIs have machine diarization built in, and it can typically be enabled by setting an API parameter.

However, the speaker labels produced by machine diarization will be placeholders like “Speaker 1” or “Speaker 2”. This is because the transcription AI doesn’t have a way to figure out the actual names of the participants.

Machine Diarization Speaker Labels
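
Diarized output often arrives one word at a time, so it usually needs to be grouped into speaker turns before it is displayed or sent to an LLM. The sketch below assumes a hypothetical word-level format of `(speaker_label, word)` pairs; real transcription APIs each use their own schema. It also shows how placeholder labels could be swapped for real names if you happen to know them from another source.

```python
from itertools import groupby

# Hypothetical word-level diarization output: (speaker_label, word) pairs.
words = [
    ("Speaker 1", "We"), ("Speaker 1", "need"), ("Speaker 1", "a"),
    ("Speaker 1", "proposal."),
    ("Speaker 2", "I'll"), ("Speaker 2", "handle"), ("Speaker 2", "that."),
    ("Speaker 1", "Thanks."),
]


def group_into_turns(words):
    """Merge consecutive words that share a speaker label into one turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda pair: pair[0]):
        turns.append((speaker, " ".join(word for _, word in group)))
    return turns


def relabel(turns, names):
    """Replace placeholder labels with known names; unknown labels are kept."""
    return [(names.get(speaker, speaker), text) for speaker, text in turns]


turns = group_into_turns(words)
for speaker, text in relabel(turns, {"Speaker 1": "John"}):
    print(f"{speaker}: {text}")
```

Because `groupby` only merges adjacent items, a speaker who talks again later gets a new turn rather than being folded into their earlier one, which preserves the order of the conversation.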

Machine diarization has a couple of downsides:

Despite these downsides, machine diarization can be a good option when conversations are not held on a video conferencing platform. Here are some popular transcription APIs that have machine diarization built-in:

Popular Use Cases for Speaker Labels

Sales Coaching Software

If you’re building an app that records and transcribes sales calls, speaker labels allow you to tell whether the salesperson or the prospect is speaking. From there, you can derive additional information, such as:
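
As one illustration of this kind of derived information, the sketch below computes each speaker's share of total talk time. The segment format of `(speaker, start_seconds, end_seconds)` and the role names are assumptions made for the example, not the output of any particular API.

```python
from collections import defaultdict

# Hypothetical speaker-labeled segments: (speaker, start_seconds, end_seconds).
call_segments = [
    ("Salesperson", 0.0, 30.0),
    ("Prospect", 30.0, 40.0),
    ("Salesperson", 40.0, 60.0),
    ("Prospect", 60.0, 100.0),
]


def talk_time_share(segments):
    """Return each speaker's fraction of the total speaking time."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {speaker: seconds / overall for speaker, seconds in totals.items()}


shares = talk_time_share(call_segments)
for speaker, share in shares.items():
    print(f"{speaker}: {share:.0%} of talk time")
```

A calculation like this is only possible when every segment carries a speaker label, which is why the labels matter for coaching tools.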


A real example


Sybill provides sales coaching software that automatically fills out the CRM, drafts follow-up emails, and summarizes the sales meeting. Knowing whether the salesperson or the prospect was speaking was critical for them to be able to analyze conversations accurately.

👉 Read how Sybill used Recall.ai in our case study

Interviewing Software

If you want to analyze interview dynamics, speaker labels allow you to derive information such as:

Furthermore, if you know the speaker's name, you can typically fuzzy-match it against the invitees on the interview's calendar invite to figure out which email corresponds to which speaker. Unfortunately, it is not possible to get the speaker's email directly, so this workaround is necessary.
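
The fuzzy-matching workaround described above can be sketched with Python's standard-library `difflib`. The invitee names, emails, and the 0.6 similarity threshold are all assumptions chosen for illustration; you would want to tune the threshold against your own data and decide how to handle ties or shared names.

```python
from difflib import SequenceMatcher


def match_speakers_to_emails(speaker_names, invitees, threshold=0.6):
    """Map each speaker name to the email of the most similar invitee.

    invitees maps an invitee's display name to their email address.
    Speakers whose best similarity falls below the threshold map to None.
    """
    matches = {}
    for speaker in speaker_names:
        best_email, best_score = None, 0.0
        for name, email in invitees.items():
            score = SequenceMatcher(None, speaker.lower(), name.lower()).ratio()
            if score > best_score:
                best_email, best_score = email, score
        matches[speaker] = best_email if best_score >= threshold else None
    return matches


# Hypothetical calendar invitees and speaker names from a transcript.
invitees = {
    "Sarah Connor": "sarah.connor@example.com",
    "John Smith": "john.smith@example.com",
}
matches = match_speakers_to_emails(
    ["Sarah C.", "Jon Smith", "Recorder Bot"], invitees
)
for speaker, email in matches.items():
    print(f"{speaker} -> {email}")
```

Returning `None` for weak matches, rather than always picking the closest invitee, avoids attributing an email to a participant who was never on the invite, such as a recording bot or an unexpected guest.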

Recall.ai powers a number of companies building interviewing software, such as BrightHire and Metaview. Recall.ai enables them to get not only speaker-labeled transcripts but also video recordings and metadata from interviews.

Virtual Legal Deposition Software

In virtual legal deposition software, speaker labels help distinguish between the parties involved, such as the lawyer, witness, and opposing counsel. This allows you to:

Telehealth Software

In telehealth consultations, it’s essential to know if the doctor or patient is speaking. This helps to:

Conclusion

In this article, we’ve covered why speaker labels are important, as well as how to get speaker labels for your transcripts. 

If the conversations you’re transcribing are on video conferencing platforms, you can use an API like Recall.ai to get speaker labels with accurate participant names. 

If the conversations you’re transcribing are happening elsewhere, you can use a transcription service that has built-in machine diarization to label speakers as “Speaker 1”, “Speaker 2”, and so on.

Either way, adding speaker labels to your transcripts gives your LLMs the necessary context to extract more information from your conversation data. Happy building!