Build real-time AI virtual avatars with Anam and VideoSDK AI Voice Agents
VideoSDK’s AI agent framework lets you build voice assistants that answer questions, call function tools, and handle real-time conversation. The Anam AI Avatar plugin gives those agents a face: a lip-synced avatar that moves with the speech. You can add Anam avatars to a RealTimePipeline (low-latency, native audio models like Gemini Live) or a CascadingPipeline (modular STT -> LLM -> TTS); either way, it takes only a few lines of config. The complete code is at examples/videosdk-anam-avatar.
The plugin exposes AnamAvatar to create a new avatar instance. Use your API key and an Anam avatar ID, then pass it to the pipeline’s avatar parameter:
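A minimal sketch of creating the avatar instance, mirroring the full example at the end of this post (the environment variable names `ANAM_API_KEY` and `ANAM_AVATAR_ID` are the ones used there):

```python
import os

from videosdk.plugins.anam import AnamAvatar

# Create the avatar instance; it will be passed to the pipeline's
# `avatar` parameter below.
anam_avatar = AnamAvatar(
    api_key=os.getenv("ANAM_API_KEY"),      # your Anam API key
    avatar_id=os.getenv("ANAM_AVATAR_ID"),  # an avatar ID from lab.anam.ai/avatars
)
```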
The avatar ID is the unique identifier for the avatar you want to use. You can browse and create avatars at lab.anam.ai/avatars. The plugin returns a synchronised audio and video stream of the avatar speaking.
In the CascadingPipeline, the components run in sequence, one after the other, and you can plug in your own STT, LLM, and TTS. Add the avatar as part of the CascadingPipeline:
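A sketch of the cascading setup; the Deepgram, OpenAI, and ElevenLabs plugins here are illustrative choices, not requirements — swap in any STT, LLM, or TTS that the framework supports:

```python
import os

from videosdk.agents import CascadingPipeline
from videosdk.plugins.anam import AnamAvatar
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS

# STT -> LLM -> TTS, with the TTS output rendered by the Anam avatar.
pipeline = CascadingPipeline(
    stt=DeepgramSTT(),
    llm=OpenAILLM(),
    tts=ElevenLabsTTS(),
    avatar=AnamAvatar(
        api_key=os.getenv("ANAM_API_KEY"),
        avatar_id=os.getenv("ANAM_AVATAR_ID"),
    ),
)
```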
In this example, the TTS output goes to the avatar, which returns a lip-synced video stream (synchronized audio/video) that is published in your VideoSDK room. Full example: Anam Cascading Example on GitHub.
RealTimePipeline uses native audio models like Gemini Live. Add the Anam avatar alongside your model and the audio is forwarded directly to Anam to render the avatar:
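The wiring is the same as in the full example at the end of this post, condensed to the relevant lines:

```python
import os

from videosdk.agents import RealTimePipeline
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.plugins.anam import AnamAvatar

# Native-audio model: Gemini Live produces speech directly.
model = GeminiRealtime(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"]),
)

# The model's audio is forwarded to Anam, which renders the avatar.
pipeline = RealTimePipeline(
    model=model,
    avatar=AnamAvatar(
        api_key=os.getenv("ANAM_API_KEY"),
        avatar_id=os.getenv("ANAM_AVATAR_ID"),
    ),
)
```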
Let’s expand the RealTimePipeline example to include tool calling. We’ll add basic functionality for the agent to get the weather for a given location. Because the tool call might take some time to complete, we want to avoid the user interrupting it while wondering whether the session is still active. Therefore, we give the user immediate feedback indicating that the agent is checking the weather, and the result of the tool call is added afterwards. We can achieve this by tweaking the LLM prompt (generate a short response first):
```python
class AnamVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a helpful AI avatar assistant powered by VideoSDK and Anam. "
                "You have a visual avatar that speaks with you. Answer questions about "
                "weather and other tasks. "
                "You know how to provide real time weather information. "
                "When the user asks about the weather, generate a short response first "
                "to indicate you are checking the weather. "
                "Consider your initial response when providing the weather information "
                "afterwards. "
                "Keep responses concise and conversational."
            ),
            tools=[get_weather],
        )
```
The tool call can be any business logic that applies to your product, but for this demo we’ll use a simple weather API.
```python
@function_tool
async def get_weather(location: str):
    """Called when the user asks about the weather.

    Returns the weather for the given location.

    Args:
        location: The location to get the weather for
    """
    # Resolve the location name to coordinates first.
    geocode_url = (
        "https://geocoding-api.open-meteo.com/v1/search"
        f"?name={location}&count=1&language=en&format=json"
    )
    async with aiohttp.ClientSession() as session:
        async with session.get(geocode_url) as response:
            data = await response.json()
        results = data.get("results") or []
        if not results:
            raise Exception(f"Could not find coordinates for {location}")
        lat = results[0]["latitude"]
        lon = results[0]["longitude"]
        resolved_name = results[0]["name"]

        # Then fetch the current temperature for those coordinates.
        forecast_url = (
            "https://api.open-meteo.com/v1/forecast"
            f"?latitude={lat}&longitude={lon}&current=temperature_2m&timezone=auto"
        )
        async with session.get(forecast_url) as response:
            weather = await response.json()

    return {
        "location": resolved_name,
        "temperature": weather["current"]["temperature_2m"],
        "temperature_unit": "Celsius",
    }
```
The tool-call response preamble is most useful when tool calls take a non-trivial amount of time (e.g. fetching data from a database). To simulate this, add a sleep inside the tool call to delay the response:
```python
await asyncio.sleep(10)
```
You should see the avatar respond immediately, stay idle during the tool call, and then seamlessly continue the conversation with the result.
The example includes both pipeline types. Run the RealTime (Gemini) agent:
```shell
uv run python realtime_agent.py
```
Or run the Cascading agent:
```shell
uv run python cascading_agent.py
```
When you run the agent, a playground URL is printed in the terminal. Open it in your browser to join the room and see the avatar. The agent auto-creates a room when room_id is omitted in RoomOptions. To use a specific room, create it via the Create Room API and pass the room_id.
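A sketch of both room setups, assuming `playground=True` is what makes the playground URL appear in the terminal (the full example below uses `playground=False` with an explicit room):

```python
from videosdk.agents import JobContext, RoomOptions

# Auto-created room: omit room_id and open the printed playground URL.
def make_playground_context() -> JobContext:
    return JobContext(room_options=RoomOptions(
        name="Anam Avatar Agent",
        playground=True,
    ))

# Specific room: create it via the Create Room API first, then pass its ID.
def make_room_context(room_id: str) -> JobContext:
    return JobContext(room_options=RoomOptions(
        room_id=room_id,
        name="Anam Avatar Agent",
        playground=False,
    ))
```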
Typical fits: customer support agents that guide users through flows, sales demos and product FAQs, training and onboarding, accessibility (lip-synced video for users who prefer visual feedback), learning platforms, and meeting companions.

Docs: VideoSDK AI Agents, Anam.
```python
import aiohttp
import logging
import os

from videosdk.agents import (
    Agent,
    AgentSession,
    RealTimePipeline,
    function_tool,
    JobContext,
    RoomOptions,
    WorkerJob,
)
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.plugins.anam import AnamAvatar

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)


@function_tool
async def get_weather(latitude: str, longitude: str):
    """Called when the user asks about the weather.

    This function will return the weather for the given location. When given a
    location, please estimate the latitude and longitude of the location and do
    not ask the user for them.

    Args:
        latitude: The latitude of the location
        longitude: The longitude of the location
    """
    print("###Getting weather for", latitude, longitude)
    url = (
        "https://api.open-meteo.com/v1/forecast"
        f"?latitude={latitude}&longitude={longitude}&current=temperature_2m"
    )
    weather_data = {}
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            if response.status == 200:
                data = await response.json()
                print("###Weather data", data)
                weather_data = {
                    "temperature": data["current"]["temperature_2m"],
                    "temperature_unit": "Celsius",
                }
            else:
                raise Exception(
                    f"Failed to get weather data, status code: {response.status}"
                )
    return weather_data


class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are VideoSDK's AI Avatar Voice Agent with real-time "
                "capabilities. You are a helpful virtual assistant with a visual "
                "avatar that can answer questions about weather and help with "
                "other tasks in real-time."
            ),
            tools=[get_weather],
        )

    async def on_enter(self) -> None:
        await self.session.say(
            "Hello! I'm your real-time AI avatar assistant powered by VideoSDK. "
            "How can I help you today?"
        )

    async def on_exit(self) -> None:
        await self.session.say("Goodbye! It was great talking with you!")


async def start_session(context: JobContext):
    # Initialize Gemini Realtime model
    model = GeminiRealtime(
        model="gemini-2.5-flash-native-audio-preview-12-2025",
        # When GOOGLE_API_KEY is set in .env - DON'T pass api_key parameter
        # api_key="AIXXXXXXXXXXXXXXXXXXXX",
        config=GeminiLiveConfig(
            voice="Leda",  # Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr
            response_modalities=["AUDIO"],
        ),
    )

    # Initialize Anam Avatar
    anam_avatar = AnamAvatar(
        api_key=os.getenv("ANAM_API_KEY"),
        avatar_id=os.getenv("ANAM_AVATAR_ID"),
    )

    # Create pipeline with avatar
    pipeline = RealTimePipeline(model=model, avatar=anam_avatar)

    session = AgentSession(agent=MyVoiceAgent(), pipeline=pipeline)
    await session.start(wait_for_participant=True, run_until_shutdown=True)


def make_context() -> JobContext:
    room_options = RoomOptions(
        room_id="<room_id>",
        name="Anam Avatar Realtime Agent",
        playground=False,
    )
    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
```