February 24, 2026
Python BYO LLM with Anam TTS and avatar
When you run your own LLM, you need Anam to handle only TTS and avatar, not the full pipeline. Set llm_id="CUSTOMER_CLIENT_V1" to disable Anam's LLM in the orchestration layer. You send your LLM's output via talk_stream.send(), and Anam converts it to speech and renders the avatar.
This recipe focuses on PersonaConfig with llm_id=CUSTOMER_CLIENT_V1, sending example LLM output via create_talk_stream() and talk_stream.send(), and interruption handling with TALK_STREAM_INTERRUPTED callback.
The complete code is at examples/python-byo-llm.
What you'll build
A Python script that:
- Uses PersonaConfig with llm_id="CUSTOMER_CLIENT_V1" (disables Anam's LLM)
- Connects with connect_async() and disables session recordings
- Sends your LLM's output via create_talk_stream() and talk_stream.send() on the TalkMessageStream
- Handles interruptions with the TALK_STREAM_INTERRUPTED callback
- Displays the avatar and plays audio
The script reads LLM output from a file, one text chunk per line. It adds a 450ms delay between chunks to simulate real-time LLM streaming.
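The file-reading and pacing logic can be sketched in plain Python. Note that load_chunks and replay are illustrative names, not part of the Anam SDK, and send stands in for talk_stream.send():

```python
import asyncio
from pathlib import Path

CHUNK_DELAY_S = 0.45  # the 450ms pause between chunks described above


def load_chunks(path: str) -> list[str]:
    """Read LLM output from a file, one text chunk per line, skipping blanks."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


async def replay(chunks: list[str], send) -> None:
    """Feed chunks to `send`, pausing between them to simulate LLM streaming."""
    for i, text in enumerate(chunks):
        await send(text, end_of_speech=(i == len(chunks) - 1))
        if i < len(chunks) - 1:
            await asyncio.sleep(CHUNK_DELAY_S)
```

In the real script, send would be talk_stream.send on a stream created after SESSION_READY, as shown in the sections below.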
Prerequisites
- Python 3.10+
- uv
- Anam API key from lab.anam.ai
- Avatar and voice IDs from lab.anam.ai
Disabling Anam's LLM: CUSTOMER_CLIENT_V1
To use your own LLM, you must disable Anam's built-in LLM. Set llm_id="CUSTOMER_CLIENT_V1" in PersonaConfig. This tells Anam's orchestration layer that the LLM is provided by the customer—Anam will not run its own LLM. You send your LLM's output via talk_stream.send(), which goes directly to TTS.
from anam.types import PersonaConfig
persona_config = PersonaConfig(
    avatar_id="your-avatar-id",
    voice_id="your-voice-id",
    llm_id="CUSTOMER_CLIENT_V1",  # Required: disables Anam's LLM
    enable_audio_passthrough=False,
)

Why this is required: without CUSTOMER_CLIENT_V1, Anam runs its own LLM, which produces a second stream of LLM output and additional TTS segments. That competing output interferes with your LLM's output and the conversation context, resulting in a poor user experience.
Connecting with connect_async and disabling session recordings
Use connect_async() instead of connect() when you need to pass session options. Set enable_session_replay=False to disable session recordings.
from anam import AnamClient, AnamEvent, ClientOptions
from anam.types import SessionOptions
client = AnamClient(
    api_key=api_key,
    persona_config=persona_config,
    options=ClientOptions(),
)
session_options = SessionOptions(enable_session_replay=False)
session = await client.connect_async(session_options=session_options)
try:
    # ... use session
finally:
    await session.close()

Sending your LLM's output
Wait for SESSION_READY before sending chunks. Create a TalkMessageStream with create_talk_stream() and send text chunks with talk_stream.send(). The stream manages correlation IDs internally for interruption handling. Set end_of_speech=True on the final chunk:
talk_stream = session.create_talk_stream()
for i, text in enumerate(chunks):
    await talk_stream.send(text, end_of_speech=(i == len(chunks) - 1))

If your LLM streams chunks without a clear "last chunk" signal (e.g. when consuming async iterators), call talk_stream.end() when done to signal end of speech.
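The iterator-draining pattern can be sketched as follows. FakeTalkStream is a stand-in used here only so the example runs without the SDK; whether the SDK's end() is awaitable is an assumption, and the fake awaits it for symmetry with send():

```python
import asyncio


class FakeTalkStream:
    """Stand-in for the SDK's TalkMessageStream (illustration only)."""

    def __init__(self) -> None:
        self.sent: list[str] = []
        self.ended = False

    async def send(self, text: str, end_of_speech: bool = False) -> None:
        self.sent.append(text)

    async def end(self) -> None:
        self.ended = True


async def llm_chunks():
    # Simulates an LLM token stream with no "last chunk" marker.
    for piece in ["Hello", ", ", "world", "!"]:
        yield piece


async def forward(stream) -> FakeTalkStream:
    talk = FakeTalkStream()
    async for chunk in stream:
        await talk.send(chunk)  # no end_of_speech flag mid-stream
    await talk.end()  # signal end of speech once the iterator is drained
    return talk
```

With the real SDK, forward would receive the TalkMessageStream from session.create_talk_stream() instead of constructing a fake.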
Register a TALK_STREAM_INTERRUPTED callback to handle interruption events. When the user interrupts, flush any remaining text in the buffer and create a new TalkMessageStream. A new TalkMessageStream is required because it generates a fresh correlation_id, so subsequent LLM output is mapped to the new turn.
@client.on(AnamEvent.TALK_STREAM_INTERRUPTED)
async def on_talk_stream_interrupted(correlation_id: str | None) -> None:
    print(f"Application-level talk stream interruption handling for: {correlation_id}")
    global talk_stream
    # Flush the LLM output buffer to avoid sending stale output
    llm_output_buffer.clear()
    # Create a new talk stream for the new turn
    talk_stream = session.create_talk_stream()
    follow_up = "Okay, interrupted. What else can I help you with today?"
    await talk_stream.send(follow_up, end_of_speech=True)

For a single message, you can use session.send_talk_stream(content) as a convenience: it creates a stream, sends the text, and ends the stream in one call. This is discouraged for streaming LLM output, however, because of the overhead and the added complexity around interruption handling.
Project setup
git clone https://github.com/anam-org/anam-cookbook.git
cd anam-cookbook/examples/python-byo-llm
uv sync
cp .env.example .env

Edit .env:
ANAM_API_KEY=your_key
ANAM_AVATAR_ID=your_avatar_id
ANAM_VOICE_ID=your_voice_id

Running the script
uv run python main.py # uses llm_output_sample.txt
uv run python main.py path/to/chunks.txt  # custom file (one text chunk per line)

Press q in the video window to quit, i to interrupt the avatar.
Terminology
- Avatar – the visual character only
- TTS – the text-to-speech engine
- LLM – the large language model
With CUSTOMER_CLIENT_V1, you provide the LLM. Anam provides TTS and avatar—a single pipeline from your text to lip-synced video.