Having an audio conversation with your Teams Bot




We have recently started exploring how to let users call a Teams bot and have a meaningful conversation with it. The overall flow is as follows:

1. User places an audio call to the Teams bot

2. Teams bot answers the call using MS Graph call:answer

3. Once the call has been established, the Teams bot uses the Microsoft Azure Cognitive Services Speech SDK to generate a WAV file and then posts it to MS Graph call:recordResponse. This makes the Teams bot speak a greeting and then wait for the user's speech.

4. Once recording has started, the user's speech is saved locally, and the bot uses the Speech SDK to recognize it.

5. The recognized text is then sent to our LUIS App so that an intent can be identified.

6. The bot handles the recognized intent, creates a text response, and uses the Speech SDK to generate a WAV file from it. The WAV file is then posted to MS Graph call:recordResponse so that the response is spoken to the user, after which the bot goes back into recording mode to capture the user's next speech.
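
The steps above can be sketched as one conversational turn. This is only a minimal outline under assumptions: `TurnServices` and every callable in it are hypothetical stand-ins for the Speech SDK, MS Graph call:recordResponse, and LUIS calls, injected so the flow can be read and exercised without any real service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnServices:
    """Hypothetical wrappers around the real services (all names are assumptions)."""
    synthesize_wav: Callable[[str], bytes]     # Speech SDK: text -> WAV bytes
    play_and_record: Callable[[bytes], bytes]  # call:recordResponse: play prompt, return recording
    recognize_speech: Callable[[bytes], str]   # Speech SDK: recorded audio -> text
    detect_intent: Callable[[str], str]        # LUIS: text -> intent name
    handle_intent: Callable[[str], str]        # bot logic: intent -> reply text


def run_turn(services: TurnServices, prompt_text: str) -> str:
    """Speak prompt_text, capture the user's reply, and return the next text to speak."""
    prompt_wav = services.synthesize_wav(prompt_text)  # steps 3 and 6: build the audio
    recording = services.play_and_record(prompt_wav)   # steps 3 and 6: play, then record
    user_text = services.recognize_speech(recording)   # step 4: speech to text
    intent = services.detect_intent(user_text)         # step 5: text to intent
    return services.handle_intent(intent)              # step 6: intent to reply text


def fake_services() -> TurnServices:
    """In-memory fakes so the flow can run without Teams, Graph, or Azure."""
    return TurnServices(
        synthesize_wav=lambda text: text.encode("utf-8"),
        play_and_record=lambda wav: b"what is the weather",
        recognize_speech=lambda audio: audio.decode("utf-8"),
        detect_intent=lambda text: "GetWeather" if "weather" in text else "None",
        handle_intent=lambda intent: f"Handling intent {intent}",
    )


if __name__ == "__main__":
    print(run_turn(fake_services(), "Hello, how can I help?"))
```

Every stage runs strictly after the previous one finishes, so the user hears nothing until all five calls have completed.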


With the above approach, there is a considerable pause between the end of the user's speech and the bot's audio response. This is probably due to the number of API calls the bot has to make to recognize the user's speech and play an appropriate audio response. Is there a better way to handle audio calls to a bot in Teams? Is there a better way to use MS Graph with the Speech SDK? We looked at the Speech SDK's AudioConfig.fromDefaultMicrophoneInput(), but it does not seem to work in a Teams channel.
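
One way to check where the pause actually comes from is to time each stage of a turn separately. The sketch below is a generic timing helper, not a measurement of the real bot: the stage names and placeholder functions in the usage example are hypothetical, and no real latency figures are implied.

```python
import time
from typing import Callable, Dict, List, Tuple

# A stage is a (name, function) pair; each function takes the previous stage's output.
Stage = Tuple[str, Callable[[object], object]]


def time_stages(stages: List[Stage], initial: object) -> Tuple[object, Dict[str, float]]:
    """Run stages in order, chaining outputs, and record elapsed seconds per stage."""
    timings: Dict[str, float] = {}
    value = initial
    for name, func in stages:
        start = time.perf_counter()
        value = func(value)
        timings[name] = time.perf_counter() - start
    return value, timings


def slowest_stage(timings: Dict[str, float]) -> str:
    """Return the name of the stage with the largest elapsed time."""
    return max(timings, key=lambda name: timings[name])


if __name__ == "__main__":
    # Placeholder stages; in a real bot these would wrap the Speech SDK,
    # LUIS, and MS Graph calls (hypothetical names, no real services used).
    stages: List[Stage] = [
        ("recognize", lambda audio: "recognized text"),
        ("intent", lambda text: "SomeIntent"),
        ("synthesize", lambda intent: b"wav bytes"),
    ]
    result, timings = time_stages(stages, b"recorded audio")
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.6f}s")
    print("slowest:", slowest_stage(timings))
```

If one stage dominates, that is the one to replace or stream first. If the time is spread evenly, the sequential round trips themselves are the likely cause.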


With the WindowsVoiceAssistantClient connected to our speech resource, we find the conversation exchange between the user and the bot much more responsive. Is it possible to implement this conversation experience in Teams?


Thank You

3 Replies
best response confirmed by voonsionglum (Contributor)

Could you please refer to the doc below and see if it helps?
Calls and online meetings bots - Teams | Microsoft Docs

@voonsionglum - Could you please confirm whether the above document helped, or are you still looking for help?
Thank you. It seems the only way to get direct access to the real-time audio stream is through an application-hosted media bot. We'll give that a try.