Build real-time AI virtual avatars with Anam and VideoSDK AI Voice Agents
VideoSDK’s AI agent framework lets you build voice assistants that answer questions, call function tools, and handle real-time conversation. The Anam AI Avatar plugin gives those agents a face: a lip-synced avatar that moves with the speech. You can add Anam avatars to a RealTimePipeline (low-latency, native audio models like Gemini Live) or a CascadingPipeline (modular STT -> LLM -> TTS); either way, it takes only a few lines of config. The complete code is at examples/videosdk-anam-avatar.
The plugin exposes AnamAvatar to create a new avatar instance. Use your API key and an Anam avatar ID, then pass it to the pipeline’s avatar parameter:
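A minimal sketch of creating the avatar instance, mirroring the full example at the end of this post (the environment variable names `ANAM_API_KEY` and `ANAM_AVATAR_ID` are the ones used there):

```python
import os

from videosdk.plugins.anam import AnamAvatar

# Create the avatar instance; it will be passed to the pipeline's
# `avatar` parameter below.
anam_avatar = AnamAvatar(
    api_key=os.getenv("ANAM_API_KEY"),      # your Anam API key
    avatar_id=os.getenv("ANAM_AVATAR_ID"),  # an avatar ID from lab.anam.ai/avatars
)
```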
The avatar ID is the unique identifier for the avatar you want to use. You can browse and create avatars at lab.anam.ai/avatars. The plugin returns a synchronised audio and video stream of the avatar speaking.
In the CascadingPipeline, the components run in sequence, one after the other, and you can plug in your own STT, LLM, and TTS. Add the avatar as part of the CascadingPipeline:
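A sketch of the cascading setup; the Deepgram, OpenAI, and ElevenLabs plugins here are illustrative choices, not requirements — swap in any STT, LLM, or TTS that the framework supports:

```python
import os

from videosdk.agents import CascadingPipeline
from videosdk.plugins.anam import AnamAvatar
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS

# STT -> LLM -> TTS, with the TTS output rendered by the Anam avatar.
pipeline = CascadingPipeline(
    stt=DeepgramSTT(),
    llm=OpenAILLM(),
    tts=ElevenLabsTTS(),
    avatar=AnamAvatar(
        api_key=os.getenv("ANAM_API_KEY"),
        avatar_id=os.getenv("ANAM_AVATAR_ID"),
    ),
)
```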
In this example, the TTS output goes to the avatar, which returns a lip-synced video stream (synchronized audio/video) that is published in your VideoSDK room. Full example: Anam Cascading Example on GitHub.
RealTimePipeline uses native audio models like Gemini Live. Add the Anam avatar alongside your model and the audio is forwarded directly to Anam to render the avatar:
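The wiring is the same as in the full example at the end of this post, condensed to the relevant lines:

```python
import os

from videosdk.agents import RealTimePipeline
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.plugins.anam import AnamAvatar

# Native-audio model: Gemini Live produces speech directly.
model = GeminiRealtime(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"]),
)

# The model's audio is forwarded to Anam, which renders the avatar.
pipeline = RealTimePipeline(
    model=model,
    avatar=AnamAvatar(
        api_key=os.getenv("ANAM_API_KEY"),
        avatar_id=os.getenv("ANAM_AVATAR_ID"),
    ),
)
```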
Let’s expand the RealTimePipeline example to include tool calling. We’ll add basic functionality for the agent to get the weather for a given location. Because the tool call might take some time to complete, we want to avoid the user interrupting it while wondering whether the session is still active. Therefore, we give the user immediate feedback indicating that the agent is checking the weather, and the result of the tool call is added afterwards. We can achieve this by tweaking the LLM prompt (generate a short response first):
```python
class AnamVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a helpful AI avatar assistant powered by VideoSDK and Anam. "
                "You have a visual avatar that speaks with you. Answer questions about "
                "weather and other tasks. "
                "You know how to provide real time weather information. "
                "When the user asks about the weather, generate a short response first "
                "to indicate you are checking the weather. "
                "Consider your initial response when providing the weather information "
                "afterwards. "
                "Keep responses concise and conversational."
            ),
            tools=[get_weather],
        )
```
The tool call can be any business logic that applies to your product, but for this demo we’ll use a simple weather API.
```python
@function_tool
async def get_weather(location: str):
    """Called when the user asks about the weather.

    Returns the weather for the given location.

    Args:
        location: The location to get the weather for
    """
    # Resolve the location name to coordinates first.
    geocode_url = (
        "https://geocoding-api.open-meteo.com/v1/search"
        f"?name={location}&count=1&language=en&format=json"
    )
    async with aiohttp.ClientSession() as session:
        async with session.get(geocode_url) as response:
            data = await response.json()
        results = data.get("results") or []
        if not results:
            raise Exception(f"Could not find coordinates for {location}")
        lat = results[0]["latitude"]
        lon = results[0]["longitude"]
        resolved_name = results[0]["name"]

        # Then fetch the current temperature for those coordinates.
        forecast_url = (
            "https://api.open-meteo.com/v1/forecast"
            f"?latitude={lat}&longitude={lon}&current=temperature_2m&timezone=auto"
        )
        async with session.get(forecast_url) as response:
            weather = await response.json()

    return {
        "location": resolved_name,
        "temperature": weather["current"]["temperature_2m"],
        "temperature_unit": "Celsius",
    }
```
The tool-call response preamble is most useful when tool calls take a non-trivial amount of time (e.g. fetching data from a database). To simulate this, add a sleep inside the tool call to delay the response:
```python
await asyncio.sleep(10)
```
You should see the avatar respond immediately, stay idle during the tool call, and then seamlessly continue the conversation with the result.
The example includes both pipeline types. Run the RealTime (Gemini) agent:
```shell
uv run python realtime_agent.py
```
Or run the Cascading agent:
```shell
uv run python cascading_agent.py
```
When you run the agent, a playground URL is printed in the terminal. Open it in your browser to join the room and see the avatar. The agent auto-creates a room when room_id is omitted in RoomOptions. To use a specific room, create it via the Create Room API and pass the room_id.
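A sketch of both room setups, assuming `playground=True` is what makes the playground URL appear in the terminal (the full example below uses `playground=False` with an explicit room):

```python
from videosdk.agents import JobContext, RoomOptions

# Auto-created room: omit room_id and open the printed playground URL.
def make_playground_context() -> JobContext:
    return JobContext(room_options=RoomOptions(
        name="Anam Avatar Agent",
        playground=True,
    ))

# Specific room: create it via the Create Room API first, then pass its ID.
def make_room_context(room_id: str) -> JobContext:
    return JobContext(room_options=RoomOptions(
        room_id=room_id,
        name="Anam Avatar Agent",
        playground=False,
    ))
```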
Typical fits: customer support agents that guide users through flows, sales demos and product FAQs, training and onboarding, accessibility (lip-synced video for users who prefer visual feedback), learning platforms, and meeting companions.

Docs: VideoSDK AI Agents, Anam.
```python
import aiohttp
import logging
import os

from videosdk.agents import (
    Agent,
    AgentSession,
    RealTimePipeline,
    function_tool,
    JobContext,
    RoomOptions,
    WorkerJob,
)
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.plugins.anam import AnamAvatar

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)


@function_tool
async def get_weather(latitude: str, longitude: str):
    """Called when the user asks about the weather.

    This function will return the weather for the given location. When given a
    location, please estimate the latitude and longitude of the location and do
    not ask the user for them.

    Args:
        latitude: The latitude of the location
        longitude: The longitude of the location
    """
    print("###Getting weather for", latitude, longitude)
    url = (
        "https://api.open-meteo.com/v1/forecast"
        f"?latitude={latitude}&longitude={longitude}&current=temperature_2m"
    )
    weather_data = {}
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            if response.status == 200:
                data = await response.json()
                print("###Weather data", data)
                weather_data = {
                    "temperature": data["current"]["temperature_2m"],
                    "temperature_unit": "Celsius",
                }
            else:
                raise Exception(
                    f"Failed to get weather data, status code: {response.status}"
                )
    return weather_data


class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are VideoSDK's AI Avatar Voice Agent with real-time "
                "capabilities. You are a helpful virtual assistant with a visual "
                "avatar that can answer questions about weather and help with "
                "other tasks in real-time."
            ),
            tools=[get_weather],
        )

    async def on_enter(self) -> None:
        await self.session.say(
            "Hello! I'm your real-time AI avatar assistant powered by VideoSDK. "
            "How can I help you today?"
        )

    async def on_exit(self) -> None:
        await self.session.say("Goodbye! It was great talking with you!")


async def start_session(context: JobContext):
    # Initialize Gemini Realtime model
    model = GeminiRealtime(
        model="gemini-2.5-flash-native-audio-preview-12-2025",
        # When GOOGLE_API_KEY is set in .env - DON'T pass api_key parameter
        # api_key="AIXXXXXXXXXXXXXXXXXXXX",
        config=GeminiLiveConfig(
            voice="Leda",  # Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
            response_modalities=["AUDIO"],
        ),
    )

    # Initialize Anam Avatar
    anam_avatar = AnamAvatar(
        api_key=os.getenv("ANAM_API_KEY"),
        avatar_id=os.getenv("ANAM_AVATAR_ID"),
    )

    # Create pipeline with avatar
    pipeline = RealTimePipeline(model=model, avatar=anam_avatar)

    session = AgentSession(agent=MyVoiceAgent(), pipeline=pipeline)
    await session.start(wait_for_participant=True, run_until_shutdown=True)


def make_context() -> JobContext:
    room_options = RoomOptions(
        room_id="<room_id>",
        name="Anam Avatar Realtime Agent",
        playground=False,
    )
    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
```