January 19, 2025
ElevenLabs Conversational AI with Anam avatars
ElevenLabs Conversational AI gives you voice agents with natural speech recognition and synthesis. But voice-only agents can feel disembodied. By adding an Anam avatar, you give your agent a face that moves in sync with its speech.
This integration uses Anam's audio passthrough mode. Instead of using Anam's built-in STT/LLM/TTS pipeline, you send audio directly from ElevenLabs to the avatar for lip-syncing:
User voice → ElevenLabs agent → Audio response → Anam avatar → User sees talking avatar

ElevenLabs handles the conversation intelligence and voice synthesis. Anam renders the visual avatar synchronized to the audio.
The complete code is at anam-org/elevenlabs_agent_demo.
What you'll build
A web application that:
- Connects to an ElevenLabs Conversational AI agent via WebSocket
- Displays an Anam avatar that lip-syncs to the agent's responses
- Captures microphone input and sends it to ElevenLabs
- Handles interruptions when the user speaks over the agent
Prerequisites
- Bun runtime (or Node.js 18+)
- An ElevenLabs account with a Conversational AI agent configured
- An Anam account with API key from lab.anam.ai
Project setup
Clone the demo repository:
git clone https://github.com/anam-org/elevenlabs_agent_demo.git
cd elevenlabs_agent_demo
bun install

Create a .dev.vars file with your credentials:
ANAM_API_KEY=your_anam_api_key
ANAM_AVATAR_ID=your_avatar_id
ELEVENLABS_AGENT_ID=your_agent_id

You can find avatar IDs at lab.anam.ai/avatars. For the ElevenLabs agent ID, go to your agent in the ElevenLabs dashboard and copy the ID from the URL or settings.
Start the development server:
bun run dev

Open http://localhost:5173 and click "Start Conversation" to try it out.
Project structure
The demo has a simple structure:
src/
├── client.ts # Main orchestration logic
├── elevenlabs.ts # WebSocket and microphone handling
├── index.ts # Hono server entry point
├── renderer.tsx # HTML template
└── routes/
    ├── index.tsx     # UI page
    └── api/config.ts # Session token endpoint

The server creates Anam session tokens so API keys stay server-side. The client orchestrates the connection between ElevenLabs and Anam.
Server-side: creating session tokens
The /api/config endpoint creates an Anam session token with audio passthrough enabled:
// src/routes/api/config.ts
const response = await fetch("https://api.anam.ai/v1/auth/session-token", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${env.ANAM_API_KEY}`,
},
body: JSON.stringify({
avatarId: env.ANAM_AVATAR_ID,
enableAudioPassthrough: true,
}),
});
const { sessionToken } = await response.json();
return c.json({
anamSessionToken: sessionToken,
elevenLabsAgentId: env.ELEVENLABS_AGENT_ID,
});

The enableAudioPassthrough: true flag is required. Without it, the avatar won't accept external audio input.
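The snippet above doesn't check the response status. A defensive variant (a sketch, not the demo's actual code) could surface Anam API errors before parsing the body:

// Sketch: surface Anam API errors instead of failing later on JSON parsing
if (!response.ok) {
  const detail = await response.text();
  return c.json({ error: `Anam session token request failed: ${detail}` }, 500);
}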
Client-side: the main orchestration
The client connects both services and wires them together. Let's walk through src/client.ts.
Initializing the Anam client
When the user clicks "Start Conversation", we fetch the config and create the Anam client:
import { createClient } from "@anam-ai/js-sdk";
import { connectElevenLabs, stopElevenLabs } from "./elevenlabs";
async function start() {
// Fetch config from server
const res = await fetch("/api/config");
const config = await res.json();
// Initialize Anam avatar
anamClient = createClient(config.anamSessionToken, {
disableInputAudio: true, // ElevenLabs handles the microphone
});
await anamClient.streamToVideoElement("anam-video");

The disableInputAudio: true option tells Anam not to capture microphone input. ElevenLabs handles speech recognition, so we don't want Anam listening too.
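The "anam-video" argument is the id of a video element on the page. The actual markup lives in routes/index.tsx and isn't shown here; something along these lines (an assumption, not the demo's exact JSX) is all the SDK needs:

{/* routes/index.tsx (sketch): the id must match the streamToVideoElement() argument */}
<video id="anam-video" autoplay playsinline />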
Setting up the audio stream
Next, we create the audio input stream that will receive ElevenLabs audio:
const agentAudioInputStream = anamClient.createAgentAudioInputStream({
encoding: "pcm_s16le",
sampleRate: 16000,
channels: 1,
});

The audio format must match what ElevenLabs sends: PCM 16-bit signed little-endian at 16kHz mono. If these don't match, the lip-sync will be wrong or won't work at all.
Connecting to ElevenLabs
Now we connect to ElevenLabs and wire up the callbacks:
await connectElevenLabs(config.elevenLabsAgentId, {
onReady: () => {
setConnected(true);
addMessage("system", "Connected. Start speaking...");
},
onAudio: (audio) => {
agentAudioInputStream.sendAudioChunk(audio);
},
onUserTranscript: (text) => addMessage("user", text),
onAgentResponse: (text) => {
agentAudioInputStream.endSequence();
addMessage("agent", text);
},
onInterrupt: () => {
anamClient?.interruptPersona();
agentAudioInputStream.endSequence();
},
onDisconnect: () => setConnected(false),
onError: () => showError("Connection error"),
});
}

Each callback handles a different event:
- onAudio receives base64-encoded audio chunks from ElevenLabs and forwards them to Anam for lip-sync
- onAgentResponse fires when the agent finishes speaking, so we call endSequence() to signal completion
- onInterrupt fires when the user speaks over the agent (barge-in), so we stop the avatar mid-speech
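For reference, the callbacks object has a small shape that can be inferred from how client.ts uses it. The interface below is a reconstruction, not necessarily the demo's exact definition:

// Inferred from usage in client.ts; names and signatures are assumptions
interface ElevenLabsCallbacks {
  onReady: () => void;
  onAudio: (audioBase64: string) => void; // base64-encoded PCM chunk
  onUserTranscript: (text: string) => void;
  onAgentResponse: (text: string) => void;
  onInterrupt: () => void;
  onDisconnect: () => void;
  onError: () => void;
}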
The ElevenLabs module
The src/elevenlabs.ts module handles the WebSocket connection and microphone capture. Let's look at the key parts.
Connecting to the WebSocket
ElevenLabs Conversational AI uses a WebSocket for bidirectional audio streaming:
import { MicrophoneCapture } from "chatdio";
let ws: WebSocket | null = null;
let microphone: MicrophoneCapture | null = null;
export async function connectElevenLabs(
agentId: string,
callbacks: ElevenLabsCallbacks
) {
ws = new WebSocket(
`wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${agentId}`
);
ws.onopen = async () => {
// Start microphone capture once connected
microphone = new MicrophoneCapture({
echoCancellation: true,
noiseSuppression: true,
autoGainControl: true,
sampleRate: 16000,
});
microphone.on("audio", (audioData: ArrayBuffer) => {
if (ws?.readyState === WebSocket.OPEN) {
const base64 = btoa(String.fromCharCode(...new Uint8Array(audioData)));
ws.send(JSON.stringify({ user_audio_chunk: base64 }));
}
});
await microphone.start();
};

The chatdio library provides MicrophoneCapture with echo cancellation built in. This prevents the avatar's audio from feeding back into the microphone.
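One caveat: spreading a Uint8Array into String.fromCharCode can hit argument-count limits on very large buffers. Microphone chunks are small enough that the demo's approach works, but if you adapt this code to larger buffers, a chunked encoder (a sketch, not part of the demo) is safer:

// Sketch: base64-encode an ArrayBuffer in 32 KiB chunks to avoid call-stack limits
function arrayBufferToBase64(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer);
  const chunkSize = 0x8000;
  let binary = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}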
Handling messages
The WebSocket receives different message types from ElevenLabs:
ws.onmessage = (event) => {
const message = JSON.parse(event.data);
switch (message.type) {
case "conversation_initiation_metadata":
callbacks.onReady();
break;
case "audio":
// Forward audio to Anam for lip-sync
callbacks.onAudio(message.audio_event.audio_base_64);
break;
case "agent_response":
callbacks.onAgentResponse(message.agent_response_event.agent_response);
break;
case "user_transcript":
callbacks.onUserTranscript(message.user_transcription_event.user_transcript);
break;
case "interruption":
callbacks.onInterrupt();
break;
case "ping":
ws?.send(JSON.stringify({ type: "pong" }));
break;
}
};

The audio messages contain base64-encoded PCM chunks. We pass these directly to Anam via the onAudio callback.
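The onDisconnect and onError callbacks from client.ts map naturally onto the socket's close and error handlers. A minimal sketch of that wiring (the demo's version may also update UI state):

// Sketch: propagate socket lifecycle events back to the caller
ws.onclose = () => {
  microphone?.stop();
  microphone = null;
  callbacks.onDisconnect();
};

ws.onerror = () => {
  callbacks.onError();
};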
Cleaning up
When the conversation ends, we close everything:
export function stopElevenLabs() {
microphone?.stop();
microphone = null;
if (ws?.readyState === WebSocket.OPEN) {
ws.close();
}
ws = null;
}

Handling interruptions
When users speak while the agent is talking (barge-in), ElevenLabs sends an interruption event. The client handles this by stopping the avatar immediately:
onInterrupt: () => {
anamClient?.interruptPersona();
agentAudioInputStream.endSequence();
},

The interruptPersona() method stops any queued audio and resets the avatar to its idle state. Without this, the avatar would continue lip-syncing to audio that's no longer playing.
Audio format requirements
ElevenLabs outputs PCM 16-bit signed little-endian audio at 16kHz mono. The Anam audio stream must be configured to match:
anamClient.createAgentAudioInputStream({
encoding: "pcm_s16le", // PCM 16-bit signed little-endian
sampleRate: 16000, // 16kHz
channels: 1, // Mono
});

If you're adapting this for a different voice provider, check their audio output format and adjust accordingly.
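For example, if a hypothetical provider streamed 24kHz mono PCM instead, only the sample rate would change (assuming the Anam stream accepts that rate):

// Hypothetical: match a provider that outputs 24kHz mono PCM 16-bit little-endian
anamClient.createAgentAudioInputStream({
  encoding: "pcm_s16le",
  sampleRate: 24000,
  channels: 1,
});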
Deploying to Cloudflare Workers
The demo is set up for Cloudflare Workers deployment:
bun run deploy

Before deploying, set your environment variables in Cloudflare:
wrangler secret put ANAM_API_KEY
wrangler secret put ANAM_AVATAR_ID
wrangler secret put ELEVENLABS_AGENT_ID

Troubleshooting
Avatar lips not syncing:
- Verify the audio format matches: pcm_s16le, 16kHz, mono
- Check that enableAudioPassthrough: true was set when creating the session token
- Make sure createAgentAudioInputStream() is called before sending audio
No audio from ElevenLabs:
- Verify your ElevenLabs agent ID is correct
- Check that the WebSocket is connected before sending microphone data
- Confirm your ElevenLabs account has Conversational AI access
Echo or feedback:
- The chatdio library should handle echo cancellation automatically
- Make sure you're using disableInputAudio: true on the Anam client so it doesn't also capture the microphone