Realtime Voice - The Road to AI Assistants

For me, the power of AI starts getting exciting when we bring in multimodal interactions. Voice (for now) remains the most fluid way of communicating. Bringing it to your LLM is a simple pipeline: Speech to Text (STT) -> LLM -> Text to Speech (TTS). To allow for a natural conversation (one that doesn’t depend on wake words like “Hey Siri”), we also need Voice Activity Detection (VAD), which detects when the user starts and stops speaking.

There are many excellent tools out there for each of these components. Selecting them is not the focus of this tutorial.
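To make the shape of that pipeline concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: `is_speech` is a toy energy threshold standing in for a real VAD model, and the `stt`, `llm` and `tts` arguments are placeholder callables, not any particular library's API.

```python
from typing import Callable, Iterable, List

Frame = List[float]  # one short chunk of audio samples in [-1.0, 1.0]


def is_speech(frame: Frame, threshold: float = 0.02) -> bool:
    """Toy VAD (assumption): mean squared amplitude above a threshold."""
    if not frame:
        return False
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold


def segment_utterances(
    frames: Iterable[Frame], max_silence: int = 2
) -> List[List[Frame]]:
    """Group speech frames into utterances.

    An utterance is closed once `max_silence` consecutive quiet frames follow it.
    """
    utterances: List[List[Frame]] = []
    current: List[Frame] = []
    silence = 0
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= max_silence:
                utterances.append(current)
                current, silence = [], 0
    if current:  # flush an utterance still open at the end of the stream
        utterances.append(current)
    return utterances


def run_turn(
    utterance: List[Frame],
    stt: Callable[[List[Frame]], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """One conversational turn: STT -> LLM -> TTS."""
    return tts(llm(stt(utterance)))
```

In a real assistant each stage streams its output to the next to keep latency low, and the VAD also has to handle the user interrupting the reply. A framework takes care of both.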

The full project code can be found here:

https://github.com/lchavasse/langgraph-tutorial/tree/51bed31a94a0e36a511063d4fec096a8a87a4388/voice-assistant

LiveKit Voice Assistant

In order to build a production AI assistant, you need to transfer your user’s audio to your application in real time. For this we need WebRTC. We’re going to use the LiveKit framework so we can focus on our agent. LiveKit provides a ready-made Voice Assistant, which can be set up with 5 components:

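# Imports follow the LiveKit Agents 0.x quickstart; adjust to your installed version.
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero

# initial_ctx must be defined beforehand (a chat context holding the system prompt).
assistant = VoiceAssistant(
    vad=silero.VAD.load(),  # voice activity detection
    stt=deepgram.STT(),  # speech to text
    llm=openai.LLM(model="gpt-4o-mini"),  # language model
    tts=openai.TTS(),  # text to speech
    chat_ctx=initial_ctx,  # initial conversation context
)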

Feel free to follow their quickstart to get up to speed.

(Create a LiveKit Cloud account and get Deepgram and OpenAI API keys if you don’t already have them.)
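Before starting the agent, it can help to confirm that all the credentials are visible to the process. The sketch below assumes the conventional environment variable names read by LiveKit and the Deepgram and OpenAI plugins; check them against the quickstart for your version. The helper `missing_credentials` is a hypothetical name of my own.

```python
import os
from typing import List, Mapping

# Assumed variable names; confirm against the LiveKit quickstart you follow.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
)


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All credentials found.")
```

Running this once before launching the agent turns a confusing authentication error mid-call into a clear message at startup.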

Use the sandbox (or host the frontend) to chat to the agent. LiveKit supports several LLMs out of the box, and you can add RAG and function calls to the pipeline, but we’re going to delve into the code in order to integrate our own agent.

You could get started even quicker using the OpenAI Realtime API. This is certainly an exciting space and allows for some customisation; however, there are 2 issues:

Cost. As of January 2025, a couple of minutes of conversation costs several dollars.
Flexibility. By integrating our own agentic system we can manage many more tools and databases.

There are some other compelling packages for rapid low-code development of Conversational AI Assistants. I discuss these here.

LiveKit Voice Assistant

Adding our LLM / Agent