Game of Learners Clinics for Machine Learning (ML) and Artificial Intelligence (AI) is a 5-week skilling initiative that helps students level up in artificial intelligence on Azure. Find out more about previous sessions at http://aka.ms/golaiml-home
Natural Language Processing (NLP) involves analyzing text documents or phrases to gain insights into the content of the text. It is also the ability of a computer program to understand human language as it is spoken and/or written.
In this blog post we explore the speech side of natural language processing: speech recognition and speech synthesis. At the end you will see a demo of speech to text and text to speech.
Azure Resources for Speech Services:
Using Microsoft Azure, you can provision a Speech resource under Cognitive Services. With that resource you can perform several tasks, including converting speech to text and text to speech, speech translation, and speaker recognition.
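Once a resource is provisioned, client code generally needs two things from it: the resource key and the region it was created in. The sketch below shows one way to load both from environment variables. The class name SpeechResource and the variable names SPEECH_KEY and SPEECH_REGION are assumptions made for this example, not something the service requires.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechResource:
    """Key and region of a provisioned Speech resource (hypothetical helper)."""

    key: str
    region: str

    @classmethod
    def from_env(cls, env=None):
        # SPEECH_KEY and SPEECH_REGION are assumed names; use whatever your setup defines.
        env = os.environ if env is None else env
        key = env.get("SPEECH_KEY", "").strip()
        region = env.get("SPEECH_REGION", "").strip()
        if not key or not region:
            raise ValueError("Set SPEECH_KEY and SPEECH_REGION first")
        # Region identifiers are lowercase, for example "westeurope".
        return cls(key=key, region=region.lower())
```

Keeping the key in the environment rather than in the source file avoids committing a secret by accident.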
Speech to Text API
As the world becomes a global village, with organizations needing to collaborate with people in different geographical regions, removing language barriers has become key. One solution is translation. Text translation can be used to translate documents from one language to another, whereas speech translation works between spoken languages. Sometimes, speech translation may also involve speech to text translation.
Using the speech to text API, you can perform real-time or batch transcription of audio to text. It has the following characteristics:
Real-time and batch transcription
Supports multiple audio sources, such as a live microphone stream or an audio file
Based on the Universal Language Model, trained by Microsoft. The model is optimized for both conversational and dictation scenarios.
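To make the transcription step concrete, here is a minimal sketch of a single speech to text call over REST for one short WAV clip, using only the Python standard library. It is not the streaming or batch path. The function names are hypothetical, and the endpoint path, header names and the "DisplayText" response field reflect my understanding of the Speech service REST API for short audio, so check them against the current documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request


def build_stt_request(audio_bytes, key, region, language="en-US"):
    """Build (but do not send) a speech to text request for a short WAV clip."""
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    # Assumed endpoint shape for short-audio recognition in the given region.
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Assumes 16 kHz PCM audio in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio_bytes, headers=headers, method="POST")


def transcribe(wav_path, key, region, language="en-US"):
    """Send the clip and return the recognized text (needs network and a valid key)."""
    with open(wav_path, "rb") as f:
        request = build_stt_request(f.read(), key, region, language)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # "DisplayText" is the field the simple format is expected to return.
    return body.get("DisplayText", "")
```

Splitting the request builder from the network call keeps the first part easy to inspect offline. For example, build_stt_request(b"...", "my-key", "westus").full_url shows the exact URL that would be called.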
Text to Speech API
Text to speech, on the other hand, involves generating spoken audio from text. The Speech service's language support covers over 60 languages. The text to speech API converts text input into audible speech that can be played directly through your computer speaker or written to an audio file. It has the following characteristics:
Used to convert text input to audible speech
Supports multiple languages and regional pronunciation
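The language and regional pronunciation are usually selected by naming a voice in an SSML document that is sent to the service. The sketch below builds such a document and the matching REST request with the Python standard library, then writes the returned audio to a file. The function names are hypothetical, the voice names (en-US-JennyNeural, en-GB-SoniaNeural) are example values to verify against the service's voice list, and the endpoint and header names are my understanding of the text to speech REST API rather than a confirmed reference.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", language="en-US"):
    """Wrap plain text in a minimal SSML document for the chosen voice."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(language)}>"
        # escape() protects characters such as & and < in the input text.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def build_tts_request(text, key, region, voice="en-US-JennyNeural", language="en-US"):
    """Build (but do not send) a text to speech request."""
    # Assumed endpoint shape for synthesis in the given region.
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        # Assumed output format identifier: 24 kHz, 16-bit, mono WAV.
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        "User-Agent": "speech-blog-sketch",
    }
    body = build_ssml(text, voice, language).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def synthesize_to_file(text, out_path, key, region, **voice_options):
    """Send the request and save the audio (needs network and a valid key)."""
    request = build_tts_request(text, key, region, **voice_options)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Switching to a British English pronunciation would then be a matter of passing voice="en-GB-SoniaNeural" and language="en-GB", assuming that voice is available in your region.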