FAQs: Voice transcription
How do I choose appropriate strictness values when entering a topic phrase?
To determine the optimal strictness for a phrase, start with a default setting, then evaluate the captured events and adjust the strictness accordingly. Experimentation is key to finding the right balance between precision and recall.
Longer phrases with more meaningful words often require less strict matching. For example, a three-word phrase might match effectively with medium strictness (two out of three words), while a five-word phrase might only need three matches (medium-low strictness). The ideal strictness depends on the specific phrase and its intended use.
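As an illustration of how strictness maps to a minimum word-match count, the following Python sketch derives the threshold from the phrase length. This is not the actual Genesys Cloud matching algorithm; the strictness-to-fraction mapping here is an assumption chosen to reproduce the examples above.

```python
import math

# Hypothetical mapping from strictness level to the fraction of a
# phrase's words that must be heard for the phrase to match.
# These fractions are illustrative assumptions, not Genesys Cloud's values.
STRICTNESS_FRACTIONS = {
    "low": 0.40,
    "medium-low": 0.55,
    "medium": 0.65,
    "high": 0.85,
}

def required_matches(phrase: str, strictness: str) -> int:
    """Return how many words of `phrase` must match at this strictness."""
    word_count = len(phrase.split())
    # Round up so a 3-word phrase at "medium" (0.65) needs 2 words,
    # and a 5-word phrase at "medium-low" (0.55) needs 3 words.
    return math.ceil(word_count * STRICTNESS_FRACTIONS[strictness])

print(required_matches("cancel my subscription", "medium"))          # 2 of 3
print(required_matches("I want to cancel everything", "medium-low")) # 3 of 5
```

Lowering the strictness lowers the threshold, trading precision for recall, which is why longer phrases can tolerate less strict matching.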
Below are some examples where adjusting strictness can either help or hinder your results, highlighting the importance of tweaking strictness for the best precision/recall tradeoff.
For more information, see Work with a topic, and Work with a phrase.
Voice transcription – What is dictionary management?
Dictionary management provides a means of improving recognition for business or domain-specific terms. Specific brands, words, or acronyms are transcribed based on the organization’s specifics. This feature allows customers to add terms to the dictionary, enhancing the transcription service’s likelihood of recognition. For more information, see Understand dictionary management.
One way of identifying similar-sounding terms involves observing recognition errors in the transcript. For example, if you consistently notice that “IRS” is transcribed as “eye are es,” you can now add the term “IRS” and include a similar-sounding entry with “eye are es.” See the following table for more examples.
Term | Example phrases | Sounds like |
---|---|---|
Neurological | He has a neurological condition | Neuro logical; Euro logical |
Healthcare | Qualified healthcare provider | Health care |
Priming | Priming the pump; Priming the charge; Priming the brain | Prime ing |
IRS | An IRS audit; Following IRS direction; Reviewed by the IRS; Requested by the IRS | Eye are ess |
Acme | My favorite brand is Acme; Acme brand pancakes; Strong loyalty from acme; I like acme | Acne |
Louis Vuitton | My favorite brand is Louis Vuitton; I like Louis Vuitton hand bags; I’d like to order a Louis Vuitton suitcase | Lui vito |
- Dictionary management is not case sensitive. It will not modify the capitalization of terms in the transcript.
- In languages that typically lack spaces, such as Japanese, it is important to include spaces in terms to enhance performance.
Dictionary management does not interfere with topic spotting, so users who want to spot topics in interactions can continue to use that service. Topic spotting currently supports native voice transcription dialects. For more information, see Genesys Cloud supported languages.
Voice transcription – How much does Extended Voice Transcription Services cost?
Extended Voice Transcription Services (EVTS) is billed per minute of usage. For every minute of voice transcription through EVTS, your organization is billed at the rate for its billing currency.
- Under Voice transcription (legacy) (GC-170-NV-VTFAIRUSEO), EVTS has no fair use allocation. Billing occurs as soon as EVTS is used.
- Under Voice transcription (GC-170-NV-VOICETRANSCRIPTION), EVTS and native transcription have a fair use allocation.
- EVTS is available for Genesys Cloud CX1 and Genesys Cloud CX2 organizations as long as the Genesys Cloud CX1 WEM Add-on II or Genesys Cloud CX2 WEM Add-on I is enabled. When using EVTS, transcribed users are not billed for Genesys Cloud CX1 WEM Add-on II or Genesys Cloud CX2 WEM Add-on I, provided that topic spotting is not enabled for those interactions.
EVTS cost per minute, by billing currency:
USD | CAD | AUD | NZD | GBP | EUR | BRL | JPY | ZAR |
---|---|---|---|---|---|---|---|---|
0.0100 | 0.0110 | 0.0130 | 0.0140 | 0.0070 | 0.0080 | 0.0400 | 1.2000 | 0.1420 |
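As a quick sketch of how per-minute billing works, an estimated charge is simply transcribed minutes multiplied by the rate from the table above. This example ignores any fair use allocation, so a real invoice may differ:

```python
# Per-minute EVTS rates taken from the table above (subset).
# Fair use allocations, where applicable, are ignored in this sketch.
RATE_PER_MINUTE = {"USD": 0.0100, "GBP": 0.0070, "JPY": 1.2000}

def estimated_bill(minutes: float, currency: str) -> float:
    """Estimate the EVTS charge for a number of transcribed minutes."""
    return round(minutes * RATE_PER_MINUTE[currency], 2)

print(estimated_bill(10_000, "USD"))  # 100.0
print(estimated_bill(1_000, "JPY"))   # 1200.0
```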
For more information, see Genesys Cloud fair use policy and Genesys Cloud pricing and concurrency update.
Voice transcription – How does Extended Voice Transcription Services – Azure provide customer data security?
Extended Voice Transcription Services streams media outside of Genesys Cloud to a third party to generate voice transcripts. Currently, these Extended Voice Transcription Services are provided by Microsoft through their Azure Speech-to-Text offering. As part of this combined offering, Genesys ensures data security in the following ways:
Note: Genesys Cloud is transitioning the Extended Voice Transcription Services engine from Microsoft Azure to AWS Transcribe. Impacted organizations will receive advance notice prior to any changes.
- Azure Speech-to-Text does not store any audio or transcription data at rest. All data in transit is encrypted. For more information, see Microsoft Data and Privacy for Speech-to-Text.
- The media sent to Azure Speech-to-Text services is processed only in Azure’s server memory and no data is stored at rest by the third party.
- Once transcribed, all transcripts are encrypted and safely stored within Genesys Cloud.
- All media sent to a third party is encrypted using TLS.
- Transcripts created by Extended Voice Transcription and recorded interactions are stored by Genesys Cloud using the same type of encryption.
For more information, see Recording encryption key overview, Understand voice transcripts, and Azure regions for Extended Voice Transcription Services.
Voice transcription – What is the difference between Genesys Cloud Voice Transcription and Extended Voice Transcription Services?
Both Genesys Cloud Voice Transcription and Extended Voice Transcription Services (EVTS) can transcribe voice interactions.
The differences between them are summarized in the following list.
- EVTS extends Genesys Cloud’s own native transcription.
- EVTS uses third-party transcription services and may have different performance attributes.
- EVTS can provide access to additional dialects and languages.
- EVTS uses a non-customizable transcription model. Customization is only available with Genesys Cloud Voice Transcription.
- Non-Genesys Cloud CX3 customers are also billed for the WEM Add-on (in addition to EVTS charges) when topic spotting is used.
For more information about EVTS, see:
- Voice transcription – How much does Extended Voice Transcription Services cost?
- Voice transcription – How does Extended Voice Transcription Services provide customer data security?
- Voice transcription – Is voice transcription supported using third parties such as Amazon, Google, or Microsoft?
Voice transcription – Can I download a voice transcript?
You can export transcripts from one or more interactions using the speech and text analytics API.
Also, a transcript can be copied manually from the Interaction Details page by clicking the Copy Transcript option in the top right corner of the transcript. For more information, see Work with a digital transcript.
For more information, see Speech and text analytics API.
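For those scripting the export, the following Python sketch shows the general shape of such a call. The endpoint path, response field name, and base URL here are assumptions for illustration; confirm them against the Speech and text analytics API reference:

```python
import json
import urllib.request

# Region-specific API base URL (assumption; use your org's region).
API_BASE = "https://api.mypurecloud.com"

def transcript_url_endpoint(conversation_id: str, communication_id: str) -> str:
    """Build the endpoint that returns a transcript download URL.
    The path shown here is an assumption; verify it in the API reference."""
    return (f"{API_BASE}/api/v2/speechandtextanalytics/conversations/"
            f"{conversation_id}/communications/{communication_id}/transcripturl")

def fetch_transcript_url(conversation_id: str, communication_id: str,
                         access_token: str) -> str:
    """Call the endpoint with an OAuth bearer token and return the URL."""
    req = urllib.request.Request(
        transcript_url_endpoint(conversation_id, communication_id),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        # The "url" field name is an assumption about the response shape.
        return json.load(resp)["url"]
```

The returned URL can then be fetched to download the transcript document itself.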
Voice transcription – What is the accuracy of voice transcription and how do I increase it?
A variety of factors can affect transcription accuracy. For more information, see Improving transcription accuracy. Genesys Cloud native voice transcription performs at a similar level of accuracy to other transcription vendors.
After you address all factors that may negatively impact accuracy, you can use dictionary management to improve accuracy.
Dictionary management provides a means of improving recognition for business or domain-specific terms. Specific brands, words, or acronyms are transcribed based on the organization’s specifics. This feature allows customers to add terms to the dictionary, enhancing the transcription service’s likelihood of recognition. For more information, see Understand dictionary management.
Dictionary Management does not interfere with topic spotting. Topic spotting supports native voice transcription dialects. For more information, see Genesys Cloud supported languages.
Perform the following to improve accuracy with topic spotting.
- Add the term to the phrase list within a new or existing topic.
- Verify the specific topic is added to the topic list of the program used to transcribe the interactions.
Transcription accuracy rates can vary significantly within the contact center based on audio quality, clarity of speech, and additional training provided through topics.
Accuracy of voice transcription is typically measured by Word Error Rate (WER). WER identifies the number of words that are incorrectly transcribed during voice transcription, and divides this number by the number of words in a manual transcription.
There are three types of errors.
- Insertion (I): Words that are incorrectly added to the transcript.
- Deletion (D): Words that are missing from the transcript.
- Substitution (S): Words that are replaced with incorrect words.
These are added together and divided by the total number of words from the manual transcription (N).
WER is then calculated with the following equation: WER = (I + D + S) / N
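Applied in code, the WER formula is a one-liner once the error counts are known. Note that producing the counts I, D, and S requires aligning the transcript against the manual reference, which this sketch does not do:

```python
def word_error_rate(insertions: int, deletions: int,
                    substitutions: int, reference_word_count: int) -> float:
    """WER = (I + D + S) / N, where N is the word count of the
    manual (reference) transcription."""
    return (insertions + deletions + substitutions) / reference_word_count

# 2 insertions, 1 deletion, and 3 substitutions against a
# 100-word manual transcription give a WER of 6%:
print(word_error_rate(2, 1, 3, 100))  # 0.06
```

Because N is the reference word count rather than the transcript's, WER can exceed 1.0 when a transcript contains many insertions.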
For more information, see Improving transcription accuracy, and Work with a phrase.
Voice transcription – How do I make sure that custom words, product names, and brand names are transcribed correctly?
A variety of factors can affect transcription accuracy. For more information, see Improving transcription accuracy.
After you address all factors that may negatively impact accuracy, you can use dictionary management to improve accuracy.
Dictionary management provides a means of improving recognition for business or domain-specific terms. Specific brands, words, or acronyms are transcribed based on the organization’s specifics. This feature allows customers to add terms to the dictionary, enhancing the transcription service’s likelihood of recognition. For more information, see Understand dictionary management.
Boost values range from 1 to 10 and increase the likelihood of the term’s identification on a logarithmic scale. Boost values are only available with the API.
Dictionary Management does not interfere with topic spotting. Topic spotting supports native voice transcription dialects. For more information, see Genesys Cloud supported languages.
Perform the following to improve accuracy with topic spotting.
- Add the term to the phrase list within a new or existing topic.
- Verify the specific topic is added to the topic list of the program used to transcribe the interactions.
After applying new terms, example phrases, similar-sounding terms, and boost levels through the API, it may take up to 30 minutes for the improvements to become effective.
For more information, see Improving transcription accuracy, and Work with a phrase.
Voice transcription – Is voice transcription supported using third parties such as Amazon, Google, or Microsoft?
Genesys Cloud uses its own native transcription engine and includes Extended Voice Transcription Services (EVTS) as an alternative to voice transcription. The underlying provider for Extended Voice Transcription Services can be either Microsoft Azure Speech-to-Text, or AWS Transcribe.
EVTS provides customers with additional language support beyond the Genesys Cloud native transcription engine, and a choice between the engines when transcribing voice interactions.
For other voice transcription providers such as Google, you must integrate using existing AudioHook and Transcription connector capabilities.
For more information, see: About AudioHook Monitor, and Voice transcription – What is the difference between Genesys Cloud Voice Transcription and Extended Voice Transcription Services.
Voice transcription – What is the expected latency and level of accuracy for voice transcription?
Within Genesys Cloud, audio is transcribed in near real time, with a typical latency of 3-5 seconds, and is accessible through our Notifications APIs. The full interaction transcript becomes available in the Interaction Details UI immediately after the call, usually within 15 seconds.
For more information, see Genesys Cloud supported languages, How do I increase the accuracy of voice transcription?, Configure voice transcription, and How do I make sure that custom words, product names, and brand names are transcribed correctly?.