January 19, 2025
Building a LiveKit voice agent with Anam avatars
LiveKit is an open-source platform for building real-time audio and video applications. It handles all the complexity of WebRTC—signaling, NAT traversal, media routing—so you can focus on your application logic.
One of LiveKit's features is the Agents framework, which lets you build AI-powered voice assistants that join rooms as participants. These agents can listen to users, process speech, and respond with synthesized audio. In this cookbook, we'll build a voice agent using OpenAI's Realtime API and add a visual avatar face using Anam.
How LiveKit agents work
LiveKit uses a room-based architecture. Human users and AI agents both connect to rooms as participants:
- LiveKit Cloud hosts your rooms and handles all the real-time infrastructure
- Agents run as separate processes that automatically join rooms when users connect
- Clients (web, mobile, etc.) connect to rooms and interact with agents through audio/video streams
When you deploy an agent to LiveKit Cloud, it waits for rooms to be created. When a user connects to a room, the agent dispatcher automatically spins up your agent and joins it to the same room. The agent and user can then communicate through audio streams, just like a video call.
For this cookbook, we'll use OpenAI's Realtime API to handle speech input, the language model, and speech output in a single model. Anam then takes the agent's audio output and generates a synchronized avatar face.
What you'll build
A voice AI agent deployed to LiveKit Cloud with:
- Voice input/output powered by OpenAI Realtime
- A visual avatar face generated by Anam
- A React frontend for users to connect and talk to the agent
Prerequisites
- Node.js
- LiveKit CLI installed
- A LiveKit Cloud account
- An OpenAI API key
- An Anam API key from lab.anam.ai
Setting up the agent
We'll start with LiveKit's Node.js agent starter and add the Anam avatar integration.
Clone the starter repository and install dependencies:
git clone https://github.com/livekit-examples/agent-starter-node.git
cd agent-starter-node
pnpm install
Download the required model files (VAD and turn detection):
pnpm run download-files
Create a .env.local file with your credentials:
# LiveKit Cloud credentials (from cloud.livekit.io)
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
# OpenAI (for voice + LLM)
OPENAI_API_KEY=your_openai_key
The base agent
Let's look at what the starter gives us. Open src/agent.ts:
import { voice } from '@livekit/agents';

export class Assistant extends voice.Agent {
  constructor() {
    super({
      instructions: `You are a helpful voice AI assistant. The user is interacting with you via voice, even if you perceive the conversation as text.
You eagerly assist users with their questions by providing information from your extensive knowledge.
Your responses are concise, to the point, and without any complex formatting or punctuation including emojis, asterisks, or other symbols.
You are curious, friendly, and have a sense of humor.`,
    });
  }
}
The Assistant class defines the personality and behavior of your voice AI. The instructions act like a system prompt, telling the LLM how to behave. Since this is a voice assistant, the instructions emphasize concise responses without formatting that wouldn't translate well to speech.
The starter wires this up with OpenAI's Realtime API for voice input, LLM processing, and text-to-speech. You can run it as-is and have a working voice agent—but it won't have a face. Let's add that.
Adding the Anam plugin
Install the Anam plugin for LiveKit agents:
pnpm add @livekit/agents-plugin-anam
Add your Anam credentials to .env.local:
# Anam (for avatar face)
ANAM_API_KEY=your_anam_key
ANAM_AVATAR_ID=edf6fdcb-acab-44b8-b974-ded72665ee26
The avatar ID above is "Mia", one of Anam's stock avatars. You can browse other avatars or create your own at lab.anam.ai/avatars.
Modifying the agent
Now we need to modify src/agent.ts to start an Anam avatar session alongside the voice session. Replace the contents with:
import { type JobContext, ServerOptions, cli, defineAgent, voice } from '@livekit/agents';
import * as anam from '@livekit/agents-plugin-anam';
import * as openai from '@livekit/agents-plugin-openai';
import { BackgroundVoiceCancellation } from '@livekit/noise-cancellation-node';
import dotenv from 'dotenv';
import { fileURLToPath } from 'node:url';
dotenv.config({ path: '.env.local' });
We've added imports for the Anam plugin, the OpenAI plugin, and noise cancellation. The dotenv import loads our environment variables.
Next, define the Assistant class (same as before):
class Assistant extends voice.Agent {
  constructor() {
    super({
      instructions: `You are a helpful voice AI assistant.
You eagerly assist users with their questions by providing information from your extensive knowledge.
Your responses are concise, to the point, and without any complex formatting or punctuation including emojis, asterisks, or other symbols.
You are curious, friendly, and have a sense of humor.`,
    });
  }
}
Now add the agent entry point that sets up both the voice session and the avatar:
export default defineAgent({
  entry: async (ctx: JobContext) => {
    await ctx.connect();
    console.log(`Starting agent session in room: ${ctx.room.name}`);

    const session = new voice.AgentSession({
      llm: new openai.realtime.RealtimeModel({ voice: 'alloy' }),
    });

    await session.start({
      agent: new Assistant(),
      room: ctx.room,
      inputOptions: {
        noiseCancellation: BackgroundVoiceCancellation(),
      },
    });
When the agent receives a job (triggered by a user connecting to a room), it:
- Connects to the room with ctx.connect()
- Creates a voice session using OpenAI's Realtime model with the "alloy" voice
- Starts the session with our Assistant, enabling noise cancellation for cleaner audio input
Now add the Anam avatar session. This goes right after the voice session starts:
    const avatarId = process.env.ANAM_AVATAR_ID;
    if (!avatarId) {
      console.warn(
        'ANAM_AVATAR_ID is not set. Avatar session will not start. Set ANAM_AVATAR_ID in your .env.local file.',
      );
      return;
    }

    const avatarSession = new anam.AvatarSession({
      personaConfig: {
        name: 'Cara',
        avatarId,
      },
    });

    await avatarSession.start(session, ctx.room);
    console.log('Agent and avatar session started');
  },
});
The AvatarSession takes the voice session and room as inputs. It listens to the audio being sent to users and generates a synchronized video stream of the avatar speaking. The video is published to the room as a separate track that clients can display.
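The React starter in the next section handles display for you, but if you're wiring up your own client, here's a minimal sketch of subscribing to those tracks with the livekit-client SDK. The watchAgent function name, the avatar-container element, and the serverUrl/token parameters are placeholder assumptions, not part of the starter.
import { Room, RoomEvent, Track } from 'livekit-client';

// Minimal sketch: connect to a room and render whatever video/audio
// tracks the agent publishes (the avatar video and the voice audio).
// serverUrl and token would come from your own backend.
export async function watchAgent(serverUrl: string, token: string) {
  const room = new Room();

  room.on(RoomEvent.TrackSubscribed, (track) => {
    if (track.kind === Track.Kind.Video || track.kind === Track.Kind.Audio) {
      const element = track.attach(); // creates a <video> or <audio> element
      document.getElementById('avatar-container')?.appendChild(element);
    }
  });

  await room.connect(serverUrl, token);
}
TrackSubscribed fires once per track, so a single handler covers both the avatar video and the agent's audio.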
Finally, add the line that starts the agent server:
cli.runApp(new ServerOptions({ agent: fileURLToPath(import.meta.url) }));
This registers the agent with LiveKit and listens for incoming jobs.
For a complete working example, see anam-org/livekit-agent-node-example which has all of this already set up.
Testing locally
Before deploying, you can test the agent locally:
pnpm run dev
The agent will connect to LiveKit Cloud and wait for rooms. You'll need a frontend to create a room and connect; we'll set that up next.
Deploying to LiveKit Cloud
Once you're happy with your agent, deploy it to LiveKit Cloud so it runs in their infrastructure:
lk agent deploy --secrets-file=.env.local
This uploads your agent code and environment variables to LiveKit Cloud. The agent will now automatically join any rooms created in your project.
Setting up the frontend
For the frontend, we'll use LiveKit's React agent starter. This provides a complete UI for connecting to rooms and displaying agent video/audio.
In a new terminal, create the frontend:
lk app create --template agent-starter-react
cd agent-starter-react
pnpm install
Create a .env.local file with your LiveKit credentials:
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
Start the development server:
pnpm dev
Open http://localhost:3000 in your browser. Click the connect button to join a room. LiveKit Cloud will automatically dispatch your agent to join the same room, and you'll see the Anam avatar appear as the agent speaks.
How it all fits together
Here's what happens when a user connects:
- The React frontend requests a room token from its API route (see the token sketch after this list)
- The frontend connects to the LiveKit room using that token
- LiveKit Cloud sees a new participant and dispatches your agent
- The agent joins the room and starts the OpenAI Realtime voice session
- The Anam avatar session starts, publishing video of the avatar to the room
- The frontend receives and displays both the agent's audio and the avatar video
- When the user speaks, their audio goes to the agent, which processes it through OpenAI and responds
- The Anam plugin synchronizes the avatar's lip movements and expressions with the audio output
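To make the first step concrete, here's a rough sketch of a token-minting endpoint using the livekit-server-sdk package. The React starter ships its own API route that does the equivalent; createToken, roomName, and identity below are illustrative names, not the starter's actual code.
import { AccessToken } from 'livekit-server-sdk';

// Sketch of server-side token minting: the client sends a room name and
// identity, and gets back a signed JWT it can use to join the room.
export async function createToken(roomName: string, identity: string): Promise<string> {
  const token = new AccessToken(process.env.LIVEKIT_API_KEY, process.env.LIVEKIT_API_SECRET, {
    identity,
  });
  token.addGrant({ room: roomName, roomJoin: true, canPublish: true, canSubscribe: true });
  return token.toJwt();
}
Keeping token generation on the server is what keeps your API secret out of the browser; the frontend only ever sees the short-lived JWT.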
Try customizing the assistant's instructions in src/agent.ts to change its personality, or swap in a different avatar ID from lab.anam.ai to change its appearance.
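For example, here's a hypothetical variation for a more formal support persona. The wording is made up, and the avatar should still come from ANAM_AVATAR_ID rather than being hardcoded; drop a class like this into src/agent.ts in place of Assistant.
// Hypothetical variation on the Assistant class from src/agent.ts.
// The instructions are illustrative; tune them to your own use case.
class SupportAgent extends voice.Agent {
  constructor() {
    super({
      instructions: `You are a calm, professional support assistant.
Keep answers short, avoid jargon, and ask one clarifying question when a request is ambiguous.`,
    });
  }
}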