# LiveKit Configuration

> Configuration options, advanced examples, and API reference for the Anam LiveKit plugin

## Installation

```bash theme={"system"}
pip install livekit-plugins-anam
```

## Environment variables

| Service           | Where to get it                                      |
| ----------------- | ---------------------------------------------------- |
| **Anam**          | [lab.anam.ai](https://lab.anam.ai)                   |
| **LiveKit**       | [LiveKit Cloud](https://livekit.io) or self-hosted   |
| **AI providers**  | Deepgram, ElevenLabs, OpenAI, Google AI Studio, etc. |

```bash .env theme={"system"}
ANAM_API_KEY=your_anam_api_key
ANAM_AVATAR_ID=your_avatar_id

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret

OPENAI_API_KEY=your_openai_api_key
# or
GEMINI_API_KEY=your_gemini_api_key
```
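A missing variable usually surfaces later as a confusing connection error, so it can help to check the environment when the agent starts. The following is a minimal sketch that assumes the variable names from the `.env` example above and that at least one of the two LLM keys is needed. The function name `missing_env_vars` is illustrative and not part of any plugin.

```python
import os
from typing import Mapping

# Variable names taken from the .env example; adjust to the providers you use.
REQUIRED_VARS = (
    "ANAM_API_KEY",
    "ANAM_AVATAR_ID",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)
# Assumption: one of these is enough, mirroring the "# or" in the .env example.
LLM_KEY_VARS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of unset or empty variables; an empty list means all are set."""
    problems = [name for name in REQUIRED_VARS if not env.get(name)]
    if not any(env.get(name) for name in LLM_KEY_VARS):
        problems.append(" or ".join(LLM_KEY_VARS))
    return problems


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```

Calling this at the top of the agent entrypoint and failing fast when the list is non-empty keeps configuration problems separate from runtime errors.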

## PersonaConfig

Configure the avatar identity and model:

```python theme={"system"}
persona_config = anam.PersonaConfig(
    name="Maya",             # Display name for the avatar
    avatarId="uuid-here",    # Avatar appearance ID
    avatarModel="cara-4",    # Avatar model version
)
```

<ParamField body="name" type="string" required>
  Display name for the avatar. Used in logs and debugging.
</ParamField>

<ParamField body="avatarId" type="string" required>
  UUID of the avatar to use. Get this from the [Avatar Gallery](/resources/avatar-gallery) or [Anam Lab](https://lab.anam.ai/avatars).
</ParamField>

<ParamField body="avatarModel" type="string">
  Avatar model version to use when rendering the avatar. Use `cara-4` for new production avatars.
</ParamField>
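Because `avatarId` is described as a UUID, a placeholder such as `"uuid-here"` can be caught before a session is created. This sketch uses only the standard library. The helper name `is_valid_avatar_id` is hypothetical, and it only checks that the string parses as a UUID, not that the avatar exists in your account.

```python
import uuid


def is_valid_avatar_id(value: str) -> bool:
    """Return True if value parses as a UUID string."""
    try:
        uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        # Malformed strings raise ValueError; non-string input raises the others.
        return False
    return True


if __name__ == "__main__":
    print(is_valid_avatar_id("uuid-here"))
    print(is_valid_avatar_id("123e4567-e89b-12d3-a456-426614174000"))
```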

## AvatarSession

```python theme={"system"}
avatar = anam.AvatarSession(
    persona_config=anam.PersonaConfig(...),
    api_key="your_api_key",
    api_url="https://api.anam.ai",  # Optional
)
```

<ParamField body="persona_config" type="PersonaConfig" required>
  Configuration for the avatar's identity and appearance.
</ParamField>

<ParamField body="api_key" type="string" required>
  Your Anam API key.
</ParamField>

<ParamField body="api_url" default="https://api.anam.ai" type="string">
  Anam API endpoint. Override for staging or self-hosted deployments.
</ParamField>

### start()

Starts the avatar session and connects it to the LiveKit room.

```python theme={"system"}
await avatar.start(session, room=ctx.room)
```

<ParamField body="session" type="AgentSession" required>
  The LiveKit agent session to connect the avatar to.
</ParamField>

<ParamField body="room" type="rtc.Room" required>
  The LiveKit room instance from the job context.
</ParamField>

## Director Notes cues

Inline **cues** let the avatar shift emotion or delivery mid-turn (see [Director Notes](/personas/director-notes) for the cue list and preset styles). Your LLM writes cues as square-bracketed tags in the spoken text, for example `[warm] Hello there. [curious] What brings you here?`.
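To see how a tagged reply breaks down, for example when logging what the LLM produced, the text can be split into (cue, spoken text) pairs. This is only an illustration of the inline format and not how the Anam engine parses cues. The lowercase-word pattern and the function name `split_cues` are assumptions based on the example tags above.

```python
import re
from typing import Optional

# Assumed tag shape: a lowercase word in square brackets, such as "[warm]".
CUE_PATTERN = re.compile(r"\[([a-z]+)\]")


def split_cues(text: str) -> list[tuple[Optional[str], str]]:
    """Split text into (cue, spoken_text) pairs; cue is None before the first tag."""
    segments: list[tuple[Optional[str], str]] = []
    current_cue: Optional[str] = None
    position = 0
    for match in CUE_PATTERN.finditer(text):
        spoken = text[position:match.start()].strip()
        if spoken:
            segments.append((current_cue, spoken))
        current_cue = match.group(1)
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append((current_cue, tail))
    return segments


if __name__ == "__main__":
    print(split_cues("[warm] Hello there. [curious] What brings you here?"))
    # [('warm', 'Hello there.'), ('curious', 'What brings you here?')]
```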

Director Notes require a Cara 4 avatar. They work best with a voice that matches the intended performance and a neutral-expression avatar source image; an overly smiley, sad, or angry source image can limit how far cues can move the performance.

In a LiveKit agent, Anam reads those cues from the agent's **TTS-aligned transcript**, which you publish to the room. Wiring this up takes two things:

1. **`use_tts_aligned_transcript=True`** on the `AgentSession`, so the agent produces a transcript with per-word timings.
2. A **JSON transcription sink** scoped to the agent's own identity, so that timed transcript (cue tags included) is published to the room on the `lk.transcription` stream, where the Anam engine reads it to drive per-utterance expression.

<Warning>
  This path leaves the `[cue]` tags in the text sent to your TTS, so it only works with a TTS that does **not** read square-bracketed cues aloud — for example Cartesia `sonic-3.5`. Providers that speak the brackets (such as ElevenLabs) are not supported here.
</Warning>

```python theme={"system"}
import os
from livekit.agents import AgentSession
from livekit.agents.voice.room_io import _ParticipantTranscriptionOutput
from livekit.plugins import anam, cartesia

session = AgentSession(
    # ... stt / llm / vad / turn_detection ...
    tts=cartesia.TTS(model="sonic-3.5"),  # does not vocalise "[cue]" tags
    use_tts_aligned_transcript=True,
)

avatar = anam.AvatarSession(
    persona_config=anam.PersonaConfig(
        name="Cara",
        avatarId=os.getenv("ANAM_AVATAR_ID"),
        avatarModel="cara-4",
    ),
    api_key=os.getenv("ANAM_API_KEY"),
)

# Order matters: avatar.start sets the audio sink, then attach the transcription
# sink, then session.start — both sinks stay in place.
await avatar.start(session, room=ctx.room)

# Publish the timed, cue-tagged transcript so the engine can resolve cues against it.
# `participant` MUST be the agent's own identity (it is the transcript's sender);
# leave it unset and every chunk is silently dropped.
session.output.transcription = _ParticipantTranscriptionOutput(
    room=ctx.room,
    is_delta_stream=True,
    participant=ctx.room.local_participant.identity,
    json_format=True,
)

await session.start(agent=Assistant(), room=ctx.room)
```

<Note>
  `_ParticipantTranscriptionOutput` is currently an internal LiveKit Agents API, so it may change between releases. Set the transcription sink **after** `avatar.start()` and **before** `session.start()`.
</Note>

### Prompt your LLM to emit cues

The transcript sink only forwards cues that are already in the agent's replies — nothing generates them for you. Your LLM has to write the `[cue]` tags inline, so add a short instruction to the agent's system prompt:

```text theme={"system"}
Use Anam cue tags sparingly and naturally to mark emotional shifts in your spoken replies.
Available cue tags are: [happy], [warm], [playful], [curious], [supportive], [concerned], [sad], [surprised], [angry], [distressed], and [laughter].
Place tags inline before the words they should affect.
Only use a tag when the delivery should noticeably change. Do not explain the tags.
```

See [Prompting an LLM to use cues](/personas/director-notes#prompting-an-llm-to-use-cues) for more detail.
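LLMs occasionally invent tags that are not in the list. A small check like the one below can flag them during development. It is a sketch that assumes the cue names from the prompt above are the complete set (check the Director Notes page for the current list), and `unknown_cues` is an illustrative name.

```python
import re

# Cue names copied from the example prompt; treat the list as an assumption.
KNOWN_CUES = frozenset({
    "happy", "warm", "playful", "curious", "supportive", "concerned",
    "sad", "surprised", "angry", "distressed", "laughter",
})

# Any non-empty run of characters between square brackets.
BRACKET_TAG = re.compile(r"\[([^\[\]]+)\]")


def unknown_cues(reply: str) -> list[str]:
    """Return bracketed tags in reply that are not in KNOWN_CUES, in order of appearance."""
    return [tag for tag in BRACKET_TAG.findall(reply) if tag not in KNOWN_CUES]


if __name__ == "__main__":
    print(unknown_cues("[warm] Hi. [sarcastic] Sure."))  # ['sarcastic']
```

The comparison is case-sensitive on purpose, so a tag such as `[Warm]` is reported rather than silently accepted.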

## Advanced examples

### Gemini with Vision

Use Gemini Live for multimodal conversations with screen share analysis:

```python theme={"system"}
import os
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.agents.voice import VoiceActivityVideoSampler, room_io
from livekit.plugins import anam, google

async def entrypoint(ctx: JobContext):
    await ctx.connect()

    llm = google.realtime.RealtimeModel(
        model="gemini-2.0-flash-exp",
        api_key=os.getenv("GEMINI_API_KEY"),
        voice="Aoede",
        instructions="You are a helpful assistant that can see the user's screen.",
    )

    avatar = anam.AvatarSession(
        persona_config=anam.PersonaConfig(
            name="Maya",
            avatarId=os.getenv("ANAM_AVATAR_ID"),
            avatarModel="cara-4",
        ),
        api_key=os.getenv("ANAM_API_KEY"),
    )

    session = AgentSession(
        llm=llm,
        video_sampler=VoiceActivityVideoSampler(
            speaking_fps=0.2,
            silent_fps=0.1,
        ),
    )

    await avatar.start(session, room=ctx.room)
    await session.start(
        agent=Agent(instructions="Help the user with what you see on their screen."),
        room=ctx.room,
        room_input_options=room_io.RoomInputOptions(video_enabled=True),
    )

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
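The `speaking_fps` and `silent_fps` values are easier to reason about as intervals. Assuming each value is the number of video frames sampled per second in that state, the gap between frames is simply its reciprocal. The helper below is illustrative and not part of LiveKit.

```python
def seconds_between_frames(fps: float) -> float:
    """Convert a sampling rate in frames per second to the gap between frames in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1.0 / fps


if __name__ == "__main__":
    # Values from the example above.
    print(seconds_between_frames(0.2))  # 5.0  (about one frame every 5 s while speaking)
    print(seconds_between_frames(0.1))  # 10.0 (about one frame every 10 s while silent)
```

Lower rates reduce the number of frames sent to the model, at the cost of the model seeing screen changes later.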

### Function tools

Extend your agent with custom tools:

```python theme={"system"}
from livekit.agents import AgentSession, function_tool

@function_tool
async def fill_form_field(field_name: str, value: str) -> str:
    """Fill in a form field on the user's screen.

    Args:
        field_name: The name of the field to fill
        value: The value to enter

    Returns:
        Confirmation message
    """
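    # send_command_to_frontend is your own helper for messaging the browser; it is not part of LiveKit or Anam
    await send_command_to_frontend("fill_field", {"field": field_name, "value": value})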
    return "Field filled successfully"

session = AgentSession(
    llm=llm,
    tools=[fill_form_field],
)
```

## Running your agent

<Tabs>
  <Tab title="Development">
    ```bash theme={"system"}
    python agent.py dev
    ```

    Connects to your LiveKit server and automatically joins rooms when participants connect.
  </Tab>

  <Tab title="Production">
    ```bash theme={"system"}
    python agent.py start
    ```

    Deploy using Docker, Kubernetes, or your preferred container platform. See the [LiveKit Agents deployment guide](https://docs.livekit.io/agents/deployment) for details.
  </Tab>
</Tabs>

## Troubleshooting

<AccordionGroup>
  <Accordion title="Agent won't connect to LiveKit">
    * Verify `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET` are correct
    * Check that your LiveKit server is accessible
    * Ensure WebSocket connections aren't blocked by a firewall
    * Test connectivity at [meet.livekit.io](https://meet.livekit.io)
  </Accordion>

  <Accordion title="Avatar not appearing">
    * Verify your `ANAM_API_KEY` is valid
    * Check that `ANAM_AVATAR_ID` matches an existing avatar
    * Review agent logs for Anam connection errors
    * Ensure the avatar session starts before the agent session
  </Accordion>

  <Accordion title="No voice response">
    * Check that your LLM API key is valid (OpenAI, Gemini, etc.)
    * Verify microphone permissions in the browser
    * Look for API errors in the agent logs
    * Confirm the agent is receiving audio tracks
  </Accordion>

  <Accordion title="High latency or choppy audio">
    * Check your network connection stability
    * Consider using LiveKit Cloud for optimized routing
    * Reduce video sampling frequency if CPU-bound
    * Monitor your LLM API response times
  </Accordion>
</AccordionGroup>