How to recognize and synthesize speech on Azure - GOL Clinics Recap
Published Aug 15 2022 01:33 AM

Game of Learners Clinics for ML (Machine Learning) and AI (Artificial Intelligence) is a 5-week skilling initiative that helps students level up in Artificial Intelligence on Azure. Find out more on previous sessions at

Natural Language Processing (NLP) involves analyzing text documents or phrases to gain insights into the content of the text. It is also the ability of a computer program to understand human language as it is spoken and/or written.


In this blog we explore the part of natural language processing that deals with speech recognition and synthesis. At the end you will see a demo of speech to text and text to speech.

Azure Resources for Speech Services:


Using Microsoft Azure, you can provision resources for speech under Cognitive Services. With these services you can perform several actions, including converting speech to text and text to speech, speech translation, and speaker recognition.
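Once a speech resource is provisioned, client code generally needs two values from it: a key and a region. The sketch below shows one way to load them from environment variables rather than hard-coding them. The variable names `SPEECH_KEY` and `SPEECH_REGION` and the `SpeechSettings` class are assumptions chosen for illustration, not something the session prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechSettings:
    """Key and region of a provisioned speech resource (hypothetical holder class)."""
    key: str
    region: str


def load_speech_settings(env=None):
    """Read the key and region from environment variables.

    SPEECH_KEY and SPEECH_REGION are assumed names; use whatever your setup defines.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION before calling the service.")
    return SpeechSettings(key=key, region=region)
```

Keeping the key out of source code means it never ends up in version control by accident.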


Speech to Text API

As the world becomes a global village and organizations need to collaborate with people in different geographical regions, removing language barriers has become key. One solution is translation. Text translation converts documents from one language to another, whereas speech translation works between spoken languages. Sometimes, speech translation also involves converting speech to text.


Using the speech to text API, you can transcribe audio to text in real time or in batches. The API has the following characteristics:

  • Real-time and batch transcription
  • Accepts audio from sources such as a microphone or an audio file
  • Based on the Universal Language Model, trained by Microsoft. The model is optimized for both conversational and dictation scenarios.
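As a rough illustration of transcribing a short audio clip, the sketch below builds a request for the Speech service's REST endpoint for short audio using only the Python standard library. It assumes a 16 kHz PCM WAV file and an `en-US` speaker. The function names and the key, region, and file path you pass in are placeholders for this example, not part of the original session.

```python
import json
import urllib.parse
import urllib.request


def build_stt_request(key, region, wav_bytes, language="en-US"):
    """Build (but do not send) a short-audio recognition request."""
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Assumes the audio is 16 kHz PCM in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")


def extract_text(response_body):
    """Return the recognized text, or an empty string if recognition did not succeed."""
    result = json.loads(response_body)
    if result.get("RecognitionStatus") != "Success":
        return ""
    return result.get("DisplayText", "")


def transcribe(key, region, wav_path):
    """Send the audio file to the service and return the transcript (needs network access)."""
    with open(wav_path, "rb") as audio_file:
        request = build_stt_request(key, region, audio_file.read())
    with urllib.request.urlopen(request) as response:
        return extract_text(response.read())
```

Calling `transcribe(key, region, "sample.wav")` with a real key and region would return the transcript as a string. This route suits short clips; for longer recordings, batch transcription is the better fit.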

Text to Speech API

Text to speech, on the other hand, involves generating spoken audio from text. The Speech service's language support enables translation in over 60 languages. The text to speech API converts text input into audible speech that can be played directly through your computer speaker or written to an audio file. It has the following characteristics:

  • Used to convert text input to audible speech
  • Supports multiple languages and regional pronunciation
  • Supports standard voices as well as neural voices.
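To make the "written to an audio file" path concrete, here is a minimal sketch that wraps text in SSML and builds a request for the text to speech REST endpoint, again with only the standard library. The voice name `en-US-JennyNeural`, the output format, the `User-Agent` value, and the function names are assumptions picked for illustration. Swap in the voice and language you need.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", language="en-US"):
    """Wrap plain text in a minimal SSML document, escaping XML special characters."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(language)}>'
        f'<voice name={quoteattr(voice)}>{escape(text)}</voice>'
        '</speak>'
    )


def build_tts_request(key, region, text):
    """Build (but do not send) a synthesis request that asks for WAV output."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        "User-Agent": "gol-clinics-demo",  # placeholder application name
    }
    body = build_ssml(text).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def synthesize_to_file(key, region, text, out_path):
    """Send the request and write the returned audio to a file (needs network access)."""
    with urllib.request.urlopen(build_tts_request(key, region, text)) as response:
        with open(out_path, "wb") as audio_file:
            audio_file.write(response.read())
```

Changing the `voice` and `language` arguments is how a different language or regional pronunciation would be selected in this sketch.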


Reference and Resources:

Follow along and build your Bot with Azure Bot Service at:

Last update: Aug 03 2022 06:46 AM