Built-in models
| LLM ID | Model | Best for |
|---|---|---|
| 0934d97d-0c3a-4f33-91b0-5e136a0ef466 | OpenAI GPT-4.1 Mini | Recommended for most projects |
| a7cf662c-2ace-4de1-a21e-ef0fbf144bb7 | GPT OSS 120B | High throughput reasoning, great at tool calling |
| 27cbd128-f1e6-4b67-8ab3-9123659be08c | Gemini 3 Flash Preview | Fast reasoning with predictable tool calling |
| 9d8900ee-257d-4401-8817-ba9c835e9d36 | Gemini 2.5 Flash | Our fastest model |
| 88190a76-3e87-4935-ab39-f4f73038815a | Kimi K2 | Great at agentic tasks |
| ANAM_LLAMA_v3_3_70B_V1 | Llama 3.3 70B | Open-source preference, larger context |
Using a built-in LLM
Set the `llmId` field in your persona configuration to the ID of the model you want to use.
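A minimal sketch of a persona configuration, assuming a plain JSON-style config object; `llmId` is the documented field, while the other field names (`name`, `systemPrompt`) are illustrative assumptions:

```typescript
// Sketch of a persona configuration. Only `llmId` is documented above;
// `name` and `systemPrompt` are assumed fields for illustration.
const personaConfig = {
  name: "my-assistant", // assumed field
  systemPrompt: "You are a friendly, concise assistant.", // assumed field
  // ID for OpenAI GPT-4.1 Mini, from the built-in models table above
  llmId: "0934d97d-0c3a-4f33-91b0-5e136a0ef466",
};

export { personaConfig };
```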
Choosing a model
For most use cases, GPT-4.1 Mini is a good starting point — it balances speed, cost, and quality. If your persona uses tools heavily, consider GPT OSS 120B or Gemini 3 Flash Preview for more reliable tool calling. If latency is your top priority, Gemini 2.5 Flash is the fastest option.
Greeting behavior
When using a built-in LLM, the persona greets the user with an opening message when the session starts. The content of this greeting is controlled by the system prompt. To skip the greeting entirely, set `skipGreeting` to `true`.
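For example, a configuration that suppresses the opening message might look like the following sketch; `skipGreeting` is the documented flag, and the remaining fields are assumptions carried over for context:

```typescript
// Sketch: disabling the opening greeting. `skipGreeting` is the documented
// flag; `systemPrompt` is an assumed field for illustration.
const personaConfig = {
  systemPrompt: "You are a friendly, concise assistant.", // assumed field
  llmId: "0934d97d-0c3a-4f33-91b0-5e136a0ef466", // OpenAI GPT-4.1 Mini
  skipGreeting: true, // persona stays silent until the user speaks first
};

export { personaConfig };
```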
Bring your own LLM
If the built-in models don’t fit your needs, you can connect your own:

- Server-side custom LLMs — Register your model with Anam and we call it from our servers, keeping latency low.
- Client-side custom LLMs — Handle LLM calls yourself in your client code using `CUSTOMER_CLIENT_V1` as the LLM ID.
- LiveKit — Use Anam as a face layer in your existing LiveKit agent pipeline with any LLM.
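For the client-side option, the only documented requirement is the special LLM ID; a minimal sketch, assuming the same JSON-style config shape as above (how you then feed your own model's responses to the persona depends on your client SDK and is not shown here):

```typescript
// Sketch: opting out of built-in LLMs. `CUSTOMER_CLIENT_V1` is the
// documented ID that tells Anam your client code will supply LLM output.
const personaConfig = {
  name: "my-assistant", // assumed field
  // No system prompt needed here: your own LLM pipeline owns the prompting.
  llmId: "CUSTOMER_CLIENT_V1",
};

export { personaConfig };
```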

