## Installation

```shell
pip install livekit-plugins-anam
```
## Environment variables

| Service | Where to get it |
| --- | --- |
| Anam | lab.anam.ai |
| LiveKit | LiveKit Cloud or self-hosted |
| LLM providers | DeepGram, ElevenLabs, OpenAI, Google AI Studio, etc. |

```shell
ANAM_API_KEY=your_anam_api_key
ANAM_AVATAR_ID=your_avatar_id
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
OPENAI_API_KEY=your_openai_api_key
# or
GEMINI_API_KEY=your_gemini_api_key
```
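A missing variable typically surfaces later as an opaque connection error, so a quick startup check can help. This is a minimal sketch; the `missing_vars` helper and the variable list are illustrative, not part of the plugin (pick the LLM key that matches your provider):

```python
import os

# Variables the examples in this guide rely on (the LLM key depends on your provider).
REQUIRED_VARS = [
    "ANAM_API_KEY",
    "ANAM_AVATAR_ID",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
]

def missing_vars(env=os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_vars()  # e.g. [] when everything is configured
```

Calling this at the top of your entrypoint and failing fast with the list of missing names is usually easier to debug than a mid-session connection failure.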
## PersonaConfig

Configure the avatar identity:

```python
persona_config = anam.PersonaConfig(
    name="Maya",           # Display name for the avatar
    avatarId="uuid-here",  # Avatar appearance ID
)
```

**name** (string): Display name for the avatar. Used in logs and debugging.
## AvatarSession

```python
avatar = anam.AvatarSession(
    persona_config=anam.PersonaConfig(...),
    api_key="your_api_key",
    api_url="https://api.anam.ai",  # Optional
)
```

**persona_config** (PersonaConfig): Configuration for the avatar's identity and appearance.

**api_url** (string, default `"https://api.anam.ai"`): Anam API endpoint. Override for staging or self-hosted deployments.
## start()

Starts the avatar session and connects it to the LiveKit room.

```python
await avatar.start(session, room=ctx.room)
```

**session**: The LiveKit agent session to connect the avatar to.

**room**: The LiveKit room instance from the job context.
## Advanced examples

### Gemini with Vision

Use Gemini Live for multimodal conversations with screen share analysis:

```python
import os

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.agents.voice import VoiceActivityVideoSampler, room_io
from livekit.plugins import anam, google


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    llm = google.realtime.RealtimeModel(
        model="gemini-2.0-flash-exp",
        api_key=os.getenv("GEMINI_API_KEY"),
        voice="Aoede",
        instructions="You are a helpful assistant that can see the user's screen.",
    )

    avatar = anam.AvatarSession(
        persona_config=anam.PersonaConfig(
            name="Maya",
            avatarId=os.getenv("ANAM_AVATAR_ID"),
        ),
        api_key=os.getenv("ANAM_API_KEY"),
    )

    session = AgentSession(
        llm=llm,
        video_sampler=VoiceActivityVideoSampler(
            speaking_fps=0.2,
            silent_fps=0.1,
        ),
    )

    await avatar.start(session, room=ctx.room)
    await session.start(
        agent=Agent(instructions="Help the user with what you see on their screen."),
        room=ctx.room,
        room_input_options=room_io.RoomInputOptions(video_enabled=True),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
### Custom tools

Extend your agent with custom tools:

```python
from livekit.agents import function_tool


@function_tool
async def fill_form_field(field_name: str, value: str) -> str:
    """Fill in a form field on the user's screen.

    Args:
        field_name: The name of the field to fill
        value: The value to enter

    Returns:
        Confirmation message
    """
    await send_command_to_frontend("fill_field", {"field": field_name, "value": value})
    return "Field filled successfully"


session = AgentSession(
    llm=llm,
    tools=[fill_form_field],
)
```
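The `send_command_to_frontend` helper above is not part of the plugin. One way to implement it is over LiveKit's data channel; this is a sketch assuming your frontend subscribes to an `"agent-commands"` topic and that the tool has access to the room (e.g. captured in a closure inside your entrypoint, since the call above doesn't pass it):

```python
import json


def encode_command(command: str, payload: dict) -> bytes:
    """Serialize a command for the frontend as a JSON data-channel message."""
    return json.dumps({"command": command, "payload": payload}).encode("utf-8")


async def send_command_to_frontend(room, command: str, payload: dict) -> None:
    # publish_data sends bytes to the other participants over LiveKit's
    # data channel; the frontend listens on the topic and applies the command.
    await room.local_participant.publish_data(
        encode_command(command, payload),
        topic="agent-commands",
    )
```

Keeping the serialization in a separate pure function makes the wire format easy to unit-test without a live room.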
## Running your agent

The agent connects to your LiveKit server and automatically joins rooms when participants connect. Deploy using Docker, Kubernetes, or your preferred container platform. See the LiveKit Agents deployment guide for details.
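For local runs, the standard LiveKit Agents CLI modes apply (assuming your entrypoint lives in a file named `agent.py`; the filename is illustrative):

```shell
python agent.py dev    # development mode
python agent.py start  # production mode
```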
## Troubleshooting

### Agent won't connect to LiveKit

- Verify `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET` are correct
- Check that your LiveKit server is accessible
- Ensure WebSocket connections aren't blocked by a firewall
- Test connectivity at meet.livekit.io

### Avatar doesn't appear

- Verify your `ANAM_API_KEY` is valid
- Check that `ANAM_AVATAR_ID` matches an existing avatar
- Review agent logs for Anam connection errors
- Ensure the avatar session starts before the agent session

### Agent doesn't respond

- Check your LLM API key is valid (OpenAI, Gemini, etc.)
- Verify microphone permissions in the browser
- Look for API errors in the agent logs
- Confirm the agent is receiving audio tracks

### High latency or choppy audio

- Check your network connection stability
- Consider using LiveKit Cloud for optimized routing
- Reduce video sampling frequency if CPU-bound
- Monitor your LLM API response times
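For the CPU-bound case, the sampling rates from the Gemini example can be lowered. The values below are illustrative; fewer frames per second means less vision context reaches the LLM:

```python
from livekit.agents.voice import VoiceActivityVideoSampler

video_sampler = VoiceActivityVideoSampler(
    speaking_fps=0.1,   # roughly one frame every 10 s while the user speaks
    silent_fps=0.05,    # roughly one frame every 20 s while silent
)
```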