March 25, 2026

Anam AI avatars with VideoSDK agents

sebvanleuven (Anam)

VideoSDK's AI agent framework lets you build voice assistants that answer questions, call function tools, and handle real-time conversation. The Anam AI Avatar plugin gives those agents a face: a lip-synced avatar that moves with the speech.

You can add Anam avatars to RealTimePipeline (low-latency, native audio like Gemini Live) or CascadingPipeline (modular STT -> LLM -> TTS). Either way, adding the avatar takes only a few lines of configuration.

The complete code is at examples/videosdk-anam-avatar.

What you'll build

A VideoSDK voice agent with an Anam avatar that:

  • Speaks with lip-synced facial animation
  • Works with either RealTimePipeline (e.g. Gemini Live) or CascadingPipeline (e.g. STT + LLM + TTS, in this case Deepgram + OpenAI + ElevenLabs)
  • Supports function tools (e.g. weather lookup)
  • Greets the user on join and says goodbye on exit

Prerequisites

You'll need a VideoSDK auth token, an Anam API key and avatar ID, and credentials for whichever model providers you use: Google for the RealTime example; Deepgram, OpenAI, and ElevenLabs for the Cascading example.

Project setup

If you want to follow along with the cookbook example, set up the project first:

git clone https://github.com/anam-org/anam-cookbook.git
cd anam-cookbook
cd examples/videosdk-anam-avatar
uv sync
cp .env.example .env

Edit .env with your credentials:

# VideoSDK (required to join rooms)
VIDEOSDK_AUTH_TOKEN=your_videosdk_auth_token

# Anam (required for avatar)
ANAM_API_KEY=your_anam_api_key
ANAM_AVATAR_ID=your_avatar_id

# RealTimePipeline (Gemini)
GOOGLE_API_KEY=your_google_api_key

# CascadingPipeline (STT, LLM, TTS)
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=your_openai_key
ELEVENLABS_API_KEY=your_elevenlabs_key

The agent reads VIDEOSDK_AUTH_TOKEN from the environment to authenticate with VideoSDK when joining rooms.

Never expose your API keys in client-side code. The VideoSDK agent runs server-side. Use environment variables or a secrets manager.
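Since every snippet below reads credentials with os.getenv, it can help to fail fast when one is missing rather than hit a confusing auth error later. A minimal helper sketch (require_env is our own convenience function, not part of the SDK):

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable, failing fast if it's missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Example: read the VideoSDK token the agent needs to join rooms.
# videosdk_token = require_env("VIDEOSDK_AUTH_TOKEN")
```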

Installation

If you're wiring this into an existing project, the package to install is:

uv add "videosdk-plugins-anam"

If you followed the project setup above, this dependency was already installed by uv sync.

Adding the Anam avatar

The plugin exposes AnamAvatar to create a new avatar instance. Use your API key and an Anam avatar ID, then pass it to the pipeline's avatar parameter:

import os
from videosdk.plugins.anam import AnamAvatar

anam_avatar = AnamAvatar(
    api_key=os.getenv("ANAM_API_KEY"),
    avatar_id=os.getenv("ANAM_AVATAR_ID"),
)

The avatar ID identifies which Anam avatar to use; you can browse and create avatars at lab.anam.ai/avatars. The plugin returns a synchronized audio and video stream of the avatar speaking.

CascadingPipeline (STT -> LLM -> TTS)

In the CascadingPipeline, the components run in sequence, one after another, and you can plug in your own STT, LLM, and TTS. Add the avatar via the pipeline's avatar parameter:

import os

from videosdk.agents import Agent, AgentSession, CascadingPipeline, ConversationFlow, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.anam import AnamAvatar
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model

pre_download_model()

stt = DeepgramSTT(model="nova-3", language="multi", api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLM(model="gpt-4o-mini", api_key=os.getenv("OPENAI_API_KEY"))
tts = ElevenLabsTTS(api_key=os.getenv("ELEVENLABS_API_KEY"), enable_streaming=True)
vad = SileroVAD()
turn_detector = TurnDetector(threshold=0.8)

anam_avatar = AnamAvatar(
    api_key=os.getenv("ANAM_API_KEY"),
    avatar_id=os.getenv("ANAM_AVATAR_ID"),
)

pipeline = CascadingPipeline(
    stt=stt,
    llm=llm,
    tts=tts,
    vad=vad,
    turn_detector=turn_detector,
    avatar=anam_avatar,
)

In this example, the TTS output goes to the avatar, which returns a lip-synced video stream (synchronized audio/video) that is published in your VideoSDK room. Full example: Anam Cascading Example on GitHub.

RealTimePipeline (Gemini Live)

RealTimePipeline uses native audio models like Gemini Live. Add the Anam avatar alongside your model, and the model's audio is forwarded directly to Anam to render the avatar:

import os

from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.plugins.anam import AnamAvatar

model = GeminiRealtime(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    config=GeminiLiveConfig(
        voice="Leda",
        response_modalities=["AUDIO"],
    ),
)

anam_avatar = AnamAvatar(
    api_key=os.getenv("ANAM_API_KEY"),
    avatar_id=os.getenv("ANAM_AVATAR_ID"),
)

pipeline = RealTimePipeline(model=model, avatar=anam_avatar)

The model's audio drives the avatar; the avatar video streams to participants. Full example: Anam Realtime Example on GitHub.

RealTimePipeline with tool calling

Let's expand the RealTimePipeline to include a tool calling example.

We'll give the agent a basic capability: fetching the weather for a given location. Because the tool call can take a while to complete, we want to avoid the user interrupting it while wondering whether the session is still active.

To do that, we give the user immediate feedback indicating that the agent is checking the weather, then deliver the result afterwards. We can achieve this by tweaking the LLM prompt so it generates a short response before calling the tool:

class AnamVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful AI avatar assistant powered by VideoSDK and Anam. "
            "You have a visual avatar that speaks with you. Answer questions about weather and other tasks. "
            "You know how to provide real time weather information."
            "When the user asks about the weather, generate a short response first to indicate you are checking the weather."
            "Consider your initial response when providing the weather information afterwards."
            "Keep responses concise and conversational.",
            tools=[get_weather],
        )

The tool call can run any business logic that applies to your product; for this demo, we'll use the free Open-Meteo weather API.

import aiohttp
from videosdk.agents import function_tool

@function_tool
async def get_weather(location: str):
    """Called when the user asks about the weather. Returns the weather for the given location.

    Args:
        location: The location to get the weather for
    """
    # Resolve the location name to coordinates via Open-Meteo's geocoding API.
    geocode_url = (
        "https://geocoding-api.open-meteo.com/v1/search"
        f"?name={location}&count=1&language=en&format=json"
    )
    async with aiohttp.ClientSession() as session:
        async with session.get(geocode_url) as response:
            data = await response.json()
        results = data.get("results") or []
        if not results:
            raise Exception(f"Could not find coordinates for {location}")
        lat = results[0]["latitude"]
        lon = results[0]["longitude"]
        resolved_name = results[0]["name"]
        forecast_url = (
            "https://api.open-meteo.com/v1/forecast"
            f"?latitude={lat}&longitude={lon}&current=temperature_2m&timezone=auto"
        )
        async with session.get(forecast_url) as response:
            weather = await response.json()
    
    return {
        "location": resolved_name,
        "temperature": weather["current"]["temperature_2m"],
        "temperature_unit": "Celsius",
    }

The spoken preamble matters most when tool calls take a non-trivial amount of time (e.g. fetching data from a database). To simulate this, add a sleep inside the tool to delay the response:

await asyncio.sleep(10)

You should see the avatar respond immediately, stay idle while the tool call runs, and then seamlessly continue the conversation with the result.
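The interaction pattern, acknowledge first and deliver the tool result afterwards, can be sketched with plain asyncio. This is a simplified illustration: in the real system the LLM and pipeline drive the turn, and the city and temperature below are made up:

```python
import asyncio


async def slow_weather_tool() -> str:
    # Stand-in for a slow tool call (the demo above uses asyncio.sleep(10)).
    await asyncio.sleep(0.1)
    return "18 degrees in Ghent"


async def conversation_turn() -> list[str]:
    spoken: list[str] = []
    # 1. The prompt nudges the LLM to emit a short acknowledgement first,
    #    so the avatar speaks right away instead of sitting silent.
    spoken.append("Let me check the weather for you.")
    # 2. The tool runs; when it returns, the agent continues the turn.
    result = await slow_weather_tool()
    spoken.append(f"It's currently {result}.")
    return spoken


print(asyncio.run(conversation_turn()))
```

The key point is that the acknowledgement is spoken before the tool result exists, which is exactly what the prompt tweak above asks the LLM to do.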

Running the agent

The example includes both pipeline types. Run the RealTime (Gemini) agent:

uv run python realtime_agent.py

Or run the Cascading agent:

uv run python cascading_agent.py

When you run the agent, a playground URL is printed in the terminal. Open it in your browser to join the room and see the avatar. The agent auto-creates a room when room_id is omitted in RoomOptions. To use a specific room, create it via the Create Room API and pass the room_id.

Use cases

Typical fits: customer support agents that guide users through flows, sales demos and product FAQs, training and onboarding, accessibility (lip-synced video for users who prefer visual feedback), learning platforms, and meeting companions.

Docs: VideoSDK AI Agents, Anam.

Terminology

  • Avatar - The visual character (face, expressions, lip sync)
  • Persona - In Anam's ecosystem, the full AI character (avatar + voice + LLM + system prompt)
  • RealTimePipeline - Single-model pipeline with native audio (e.g. Gemini Live)
  • CascadingPipeline - Modular pipeline with separate STT, LLM, and TTS components

The plugin handles avatar rendering. You write the agent logic; the avatar follows.