Director Notes

Director Notes is in beta. Behavior, supported presets, cues, and provider support may change as we continue tuning it.

Director Notes let you guide how an avatar performs a conversation, not just what it says. Use a baseline style to set the avatar’s default presence for the session, then use inline cues to shift emotion or delivery during a specific turn.

Director Notes work best for broad performance direction, such as warm, supportive, playful, angry, or distressed. They are not exact gesture controls and do not guarantee a specific facial expression, head movement, or body pose on every word.

Director Notes also work best when the selected voice matches the intended performance. For inline cues, use an expressive voice that supports the same kind of inline tag control, so that the voice and the avatar can follow the same cues. In Lab, these voices are labelled expressive.

Use a neutral-expression avatar image where possible. Director Notes can add expression and performance direction, but the model cannot reliably turn a strongly smiling, sad, or angry source image into a different or opposite expression.

Styles

A style is the avatar’s baseline direction for the session. It controls the general expression, energy, gaze, and movement style the avatar returns to between cues. Set a style when you create the session token:

const response = await fetch("https://api.anam.ai/v1/auth/session-token", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.ANAM_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    personaConfig: {
      name: "Cara",
      avatarId: "30fa96d0-26c4-4e55-94a0-517025942e18",
      avatarModel: "cara-4",
      voiceId: "6bfbe25a-979d-40f3-a92b-5394170af54b",
      llmId: "0934d97d-0c3a-4f33-91b0-5e136a0ef466",
      systemPrompt: "You are a helpful assistant.",
      directorNotes: {
        presetStyle: "warm",
        expressivity: 0.5,
      },
    },
  }),
});

expressivity is optional. If you omit it, Anam uses the recommended default for the selected style.

Director Notes require a Cara 4 avatar. Set avatarModel to cara-4 explicitly when you need predictable model behavior. Older avatar models ignore the directorNotes configuration.

Preset styles

Use presetStyle when one of the supported public styles matches the role you are building. Omit presetStyle for the default neutral behavior.

Preset	Suggested use cases
`happy`	Positive assistant, upbeat host, friendly guide
`warm`	Virtual assistant, onboarding guide, brand spokesperson
`playful`	AI friend, companion avatar, casual host
`supportive`	Support agent, tutor, healthcare guide
`sad`	Sad character, low-energy roleplay, empathy testing
`angry`	Irate customer, frustrated employee, escalation roleplay character
`distressed`	Distressed patient, anxious customer, crisis roleplay character

Custom styles

Use customStylePrompt to provide your own short performance direction.

directorNotes: {
  customStylePrompt:
    "Keep steady eye contact. Stay composed and attentive. Use small nods to emphasise your speech.",
  expressivity: 0.5,
}

Custom styles are experimental. The avatar may not follow custom wording as strongly or consistently as Anam-owned presets, especially if the instruction is long, contradictory, or asks for precise gestures.

Use either presetStyle or customStylePrompt, not both.
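If you build the session-token request in code, a small helper can enforce the one-style-source rule before the request is sent. The following is a sketch rather than part of the Anam SDK. The DirectorNotes type and the buildDirectorNotes helper are hypothetical names, and the preset list copies the table above.

```typescript
// Hypothetical helper, not part of the Anam SDK: builds the directorNotes
// object for personaConfig and rejects configs that set both style sources.
type PresetStyle =
  | "happy"
  | "warm"
  | "playful"
  | "supportive"
  | "sad"
  | "angry"
  | "distressed";

interface DirectorNotes {
  presetStyle?: PresetStyle;
  customStylePrompt?: string;
  expressivity?: number;
}

function buildDirectorNotes(options: DirectorNotes): DirectorNotes {
  const { presetStyle, customStylePrompt, expressivity } = options;
  if (presetStyle !== undefined && customStylePrompt !== undefined) {
    throw new Error("Use either presetStyle or customStylePrompt, not both.");
  }
  // Copy only the fields that were set. With neither style field set, the
  // avatar uses the default neutral behavior; without expressivity, it uses
  // the recommended default for the selected style.
  const notes: DirectorNotes = {};
  if (presetStyle !== undefined) notes.presetStyle = presetStyle;
  if (customStylePrompt !== undefined) notes.customStylePrompt = customStylePrompt;
  if (expressivity !== undefined) notes.expressivity = expressivity;
  return notes;
}

const directorNotes = buildDirectorNotes({ presetStyle: "warm", expressivity: 0.5 });
console.log(JSON.stringify(directorNotes)); // {"presetStyle":"warm","expressivity":0.5}
```

Pass the returned object as personaConfig.directorNotes in the session-token request body.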

Expressivity

expressivity is a dial from 0 to 1 that controls how strongly the avatar follows the active style or cue. Higher values produce a stronger performance and more movement overall, including speech-driven articulation such as mouth movement and head motion. Lower values give steadier, more restrained motion. Start with the default by omitting expressivity, or set 0.5 when you want to be explicit, then adjust from there. Very high values can look exaggerated or unstable.
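The value only makes sense between 0 and 1, so you may want to check it on your side and change it in small steps while you tune a persona. The sketch below assumes you prefer to reject out-of-range values before sending them. It does not describe how the API itself treats such values, and checkExpressivity and nudgeExpressivity are hypothetical names.

```typescript
// Hypothetical client-side check for the 0 to 1 expressivity dial.
function checkExpressivity(value: number): number {
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new RangeError(`expressivity must be between 0 and 1, got ${value}`);
  }
  return value;
}

// Adjust gradually: move by a small step and clamp to the valid range.
function nudgeExpressivity(current: number, step: number): number {
  const next = Math.min(1, Math.max(0, current + step));
  // Round to two decimals to hide floating-point noise in the result.
  return Math.round(next * 100) / 100;
}

console.log(nudgeExpressivity(0.5, 0.1)); // 0.6
console.log(nudgeExpressivity(0.95, 0.1)); // 1
```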

Cues

Cues temporarily override the baseline style for the current persona speech turn. Use inline cue tags when Anam receives the persona’s speech text, including Turnkey sessions and text passed to talk() or createTalkMessageStream():

await anamClient.talk(
  "[warm] Come closer, traveler. [curious] Wait, why is your shadow moving first? [laughter] Ha, that's impossible. [concerned] No, stand behind me."
);
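You can also compose talk() text from typed segments to keep hand-written cue tags free of typos. The following is a minimal sketch. The CueTag type copies the supported beta cue tags in the table below, and composeTurn is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper: builds inline-cue text for talk() from typed segments,
// so a misspelled tag becomes a type error instead of an ignored cue.
type CueTag =
  | "happy" | "warm" | "playful" | "curious" | "supportive" | "concerned"
  | "sad" | "surprised" | "angry" | "distressed" | "laughter";

interface Segment {
  cue?: CueTag; // omit to keep the current delivery
  text: string;
}

function composeTurn(segments: Segment[]): string {
  return segments
    .map(({ cue, text }) => (cue ? `[${cue}] ${text}` : text))
    .join(" ");
}

const turn = composeTurn([
  { cue: "warm", text: "Come closer, traveler." },
  { cue: "curious", text: "Wait, why is your shadow moving first?" },
]);
// turn is "[warm] Come closer, traveler. [curious] Wait, why is your shadow moving first?"
// await anamClient.talk(turn);
```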

A cue applies from the tag until the next cue tag or the end of the current turn. When the turn ends, the avatar returns to the session’s baseline style. Supported beta cue tags:

Cue	Typical effect
`[happy]`	Brighter, more positive delivery
`[warm]`	Friendly, affectionate delivery
`[playful]`	Light, playful delivery
`[curious]`	Attentive, questioning delivery
`[supportive]`	Calm, teaching or reassuring delivery
`[concerned]`	Sympathetic delivery
`[sad]`	Lower-energy, sad delivery
`[surprised]`	More animated, surprised delivery
`[angry]`	Firm, stern delivery
`[distressed]`	Panicked or distressed delivery
`[laughter]`	Laughter in supported TTS providers; playful avatar direction
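To reason about which cue is active for each part of a turn, you can split the text at its cue tags. The sketch below models only the scoping rule described above. A cue runs until the next tag or the end of the turn, and text before the first tag uses the baseline style. It illustrates that rule and does not reflect how Anam parses cues internally. splitByCue is a hypothetical name.

```typescript
// Hypothetical illustration of cue scoping within one persona speech turn.
// A segment with cue === null uses the session's baseline style.
interface CueSegment {
  cue: string | null;
  text: string;
}

function splitByCue(turnText: string): CueSegment[] {
  const segments: CueSegment[] = [];
  const tagPattern = /\[([a-z_]+)\]/g;
  let activeCue: string | null = null; // baseline until the first tag
  let lastIndex = 0;
  let match: RegExpExecArray | null;

  while ((match = tagPattern.exec(turnText)) !== null) {
    const text = turnText.slice(lastIndex, match.index).trim();
    if (text) segments.push({ cue: activeCue, text });
    activeCue = match[1] ?? null; // this cue applies until the next tag
    lastIndex = tagPattern.lastIndex;
  }
  const rest = turnText.slice(lastIndex).trim();
  if (rest) segments.push({ cue: activeCue, text: rest });
  // Nothing carries over: the next turn starts from the baseline style again.
  return segments;
}

const segments = splitByCue("Hello. [warm] Come closer, traveler. [laughter] Ha, that's impossible.");
// [{ cue: null, text: "Hello." }, { cue: "warm", ... }, { cue: "laughter", ... }]
```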

Cue support is best effort. When both the TTS provider and avatar model support a cue, it is applied to both voice and avatar direction. For voice changes to follow the same inline tags, use an expressive, tag-capable voice such as a Cartesia Sonic 3.5 voice. If a provider does not support a cue, that side ignores it. For example, a cue may affect the avatar while being ignored by ElevenLabs TTS, or it may affect Cartesia TTS while the avatar falls back to the baseline style. Inline cue tags are preserved in text output so you can see where the LLM applied them. They are not intended to be spoken aloud by the avatar.
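Cue tags stay in the text output, so you may want to remove them before showing a transcript to end users and keep the raw text for debugging. Here is a minimal sketch. stripCueTags is a hypothetical helper, and it removes any bracketed lowercase word, so adjust the pattern if your persona legitimately speaks bracketed text.

```typescript
// Hypothetical display helper: removes inline cue tags such as "[warm]" and
// collapses the extra whitespace they leave behind.
function stripCueTags(text: string): string {
  return text
    .replace(/\[[a-z_]+\]/g, "")
    .replace(/\s{2,}/g, " ")
    .trim();
}

console.log(stripCueTags("[warm] Come closer, traveler. [curious] Wait, why?"));
// Come closer, traveler. Wait, why?
```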

Send cues over the data channel

Use data-channel cue messages when Anam does not receive the persona’s speech text, such as in audio passthrough sessions where your own TTS provider generates the audio. Turnkey sessions should continue to generate cues inline in the persona speech text, either through your prompt or the ADD CUES toggle in Lab. After the streaming session is connected, send a director_note_cue message over the WebRTC data channel:

anamClient.sendDataMessage(
  JSON.stringify({
    message_type: "director_note_cue",
    cue: { tag: "happy" },
    at_seconds: 0.4,
  }),
);

Use the cue tag name without brackets. For example, send "happy", not "[happy]". If you are not using the JavaScript SDK, send the same JSON payload over the session WebRTC data channel.
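Cue names may come from elsewhere in your pipeline, for example from an LLM or from your own TTS markup. In that case, normalizing them before sending avoids the bracket mistake described above. The sketch below uses a hypothetical buildCueMessage helper that only produces the JSON string. You would pass the result to anamClient.sendDataMessage() or write it to the session data channel yourself.

```typescript
// Hypothetical helper: produces the director_note_cue JSON payload.
// Accepts "happy" or "[happy]" and always sends the bare tag name.
type CueTiming = { at_seconds: number } | { in_seconds: number };

function buildCueMessage(tag: string, timing: CueTiming): string {
  const bareTag = tag.trim().replace(/^\[(.*)\]$/, "$1");
  return JSON.stringify({
    message_type: "director_note_cue",
    cue: { tag: bareTag },
    ...timing,
  });
}

const payload = buildCueMessage("[happy]", { at_seconds: 0.4 });
// payload is {"message_type":"director_note_cue","cue":{"tag":"happy"},"at_seconds":0.4}
// anamClient.sendDataMessage(payload);
```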

Timing options

Include one timing field per message to make your intent explicit:

at_seconds (number)

Absolute offset, in seconds, from the start of the current persona speech turn. Use this when your TTS provider gives word or audio timing data. If your TTS provider returns timing for the cue tag, use the tag’s timestamp. If it only returns timing for spoken words, use the timestamp of the word immediately after the tag. If no persona turn is active yet, Anam can queue the cue briefly for the next turn.

in_seconds (number)

Relative delay, in seconds, from when Anam receives the cue. Use this for immediate or near-immediate changes during an active persona speech turn. If no turn is active, Anam rejects the cue.

Do not send both fields in the same message. If you omit both fields, Anam treats the cue as in_seconds: 0, but sending the timing field explicitly is clearer. Timing values must be finite and non-negative. Invalid messages, unknown tags, empty tags, and tags longer than 64 characters are dropped. For audio passthrough, prefer at_seconds when you can align cues to your generated audio:

function sendDirectorNoteCue(tag: string, atSeconds: number) {
  anamClient.sendDataMessage(
    JSON.stringify({
      message_type: "director_note_cue",
      cue: { tag },
      at_seconds: atSeconds,
    }),
  );
}

sendDirectorNoteCue("warm", 0.0);
sendDirectorNoteCue("surprised", 1.25);
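The offsets passed to sendDirectorNoteCue above are hard-coded. If your TTS provider returns timing only for spoken words, you can apply the at_seconds rule in code by using the start of the word immediately after each tag. This sketch makes three assumptions. First, it uses a hypothetical WordTiming shape whose start times are measured from the start of the turn's audio. Second, cue tags in your text are separated by whitespace. Third, the timing list matches the non-tag words in order. Real providers report timing in their own formats, so adapt the mapping to yours.

```typescript
// Hypothetical word-timing shape; adapt it to your TTS provider's response.
interface WordTiming {
  word: string;
  startSeconds: number; // offset from the start of this persona turn's audio
}

interface TimedCue {
  tag: string;
  atSeconds: number;
}

// Maps inline cue tags to at_seconds using the start of the word that follows
// each tag. Assumes whitespace-separated tags and that `words` lists the
// remaining (non-tag) tokens in the same order as the text.
function cuesFromWordTimings(text: string, words: WordTiming[]): TimedCue[] {
  const cues: TimedCue[] = [];
  const pendingTags: string[] = [];
  let wordIndex = 0;

  for (const token of text.split(/\s+/).filter(Boolean)) {
    if (/^\[[a-z_]+\]$/.test(token)) {
      pendingTags.push(token.slice(1, -1)); // strip the brackets
      continue;
    }
    const timing = words[wordIndex++];
    if (!timing) break; // more words than timings: stop rather than guess
    for (const tag of pendingTags) cues.push({ tag, atSeconds: timing.startSeconds });
    pendingTags.length = 0;
  }
  // Tags with no following word are dropped: there is nothing to align them to.
  return cues;
}

// Illustrative input only, not real provider output.
const timedCues = cuesFromWordTimings("[warm] Hello there. [surprised] Oh wow!", [
  { word: "Hello", startSeconds: 0.0 },
  { word: "there.", startSeconds: 0.35 },
  { word: "Oh", startSeconds: 1.25 },
  { word: "wow!", startSeconds: 1.5 },
]);
// timedCues: [{ tag: "warm", atSeconds: 0 }, { tag: "surprised", atSeconds: 1.25 }]
// for (const { tag, atSeconds } of timedCues) sendDirectorNoteCue(tag, atSeconds);
```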

Use in_seconds when the cue should apply relative to the current moment in an active response:

anamClient.sendDataMessage(
  JSON.stringify({
    message_type: "director_note_cue",
    cue: { tag: "concerned" },
    in_seconds: 0,
  }),
);
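Invalid cue messages are dropped rather than reported, so it can help to check them before sending. The sketch below mirrors the rules listed in this section: one timing field, finite non-negative timing values, and a non-empty known tag of at most 64 characters sent without brackets. It is a hypothetical client-side check, not Anam's own validation. Its list of known tags is taken from the supported beta cue tags table. It returns a list of problems instead of throwing.

```typescript
// Hypothetical pre-send check that mirrors the documented drop rules.
const DIRECTOR_CUE_TAGS = new Set([
  "happy", "warm", "playful", "curious", "supportive", "concerned",
  "sad", "surprised", "angry", "distressed", "laughter",
]);

interface DirectorNoteCueMessage {
  message_type: "director_note_cue";
  cue: { tag: string };
  at_seconds?: number;
  in_seconds?: number;
}

function cueMessageProblems(msg: DirectorNoteCueMessage): string[] {
  const problems: string[] = [];
  const { tag } = msg.cue;
  if (tag.length === 0) problems.push("tag is empty");
  if (tag.length > 64) problems.push("tag is longer than 64 characters");
  if (tag.startsWith("[")) {
    problems.push("send the tag without brackets");
  } else if (tag.length > 0 && !DIRECTOR_CUE_TAGS.has(tag)) {
    problems.push(`unknown tag "${tag}"`);
  }

  // Omitting both timing fields is allowed (treated as in_seconds: 0).
  const timings = [msg.at_seconds, msg.in_seconds].filter((t): t is number => t !== undefined);
  if (timings.length > 1) problems.push("send at_seconds or in_seconds, not both");
  for (const t of timings) {
    if (!Number.isFinite(t) || t < 0) problems.push("timing must be finite and non-negative");
  }
  return problems;
}
```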

Add cues in Lab

In Anam Lab, the ADD CUES toggle is off by default. Turning it on appends a hidden # STYLE CUES section to the persona system prompt without editing the visible prompt. The current Lab version of that section is:

# STYLE CUES
Stay in character, but use inline style cues naturally when your mood shifts.

You may use these cue tags: [happy], [warm], [playful], [curious], [supportive], [concerned], [sad], [surprised], [angry], [distressed].

Place cue tags directly before the phrase or sentence they should affect. Use them sparingly but often enough to show clear emotional changes during the conversation. Do not explain the tags. Do not mention that you are using cues. Unknown tags are not allowed.

Example style:
"[curious] Oh... that's unusual. But I like unusual. Tell me exactly what happened."

Occasionally use disfluencies "um"

Do laughter like [laughter], don't say haha or hehe.

When you get angry make sure to use loads of exclamation marks! or !! Keep lowercase though.

Also for suprise, use exclamation marks

Use ADD CUES when you want the LLM to choose cue tags as the conversation changes, instead of manually placing every cue in your application code.

Prompting an LLM to use cues

If your LLM writes the avatar’s responses, add a short instruction to the system prompt:

Use Anam cue tags sparingly and naturally to mark emotional shifts in your spoken replies.
Available cue tags are: [happy], [warm], [playful], [curious], [supportive], [concerned], [sad], [surprised], [angry], [distressed], and [laughter].
Place tags inline before the words they should affect.
Only use a tag when the delivery should noticeably change. Do not explain the tags.

The cue tags remain visible in the text transcript, which can help you confirm the LLM is applying them in the right places.
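To check where the LLM placed cues, and whether it invented any, you can scan the transcript for bracketed tags. Here is a minimal sketch with hypothetical names. It treats any bracketed lowercase word as an attempted cue and compares it with the supported beta cue tags.

```typescript
// Hypothetical transcript audit: counts the cue tags the LLM used and flags
// any that are not in the supported beta list.
const SUPPORTED_INLINE_CUES = new Set([
  "happy", "warm", "playful", "curious", "supportive", "concerned",
  "sad", "surprised", "angry", "distressed", "laughter",
]);

interface CueAudit {
  counts: Record<string, number>; // how often each tag appears
  unknown: string[]; // tags outside the supported list, in first-seen order
}

function auditCueTags(transcript: string): CueAudit {
  const counts: Record<string, number> = {};
  const unknown: string[] = [];
  const tagPattern = /\[([a-z_]+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(transcript)) !== null) {
    const tag = match[1] ?? "";
    counts[tag] = (counts[tag] ?? 0) + 1;
    if (!SUPPORTED_INLINE_CUES.has(tag) && !unknown.includes(tag)) unknown.push(tag);
  }
  return { counts, unknown };
}

const audit = auditCueTags("[warm] Hi there! [excited] Big news. [warm] Let's start.");
// audit.counts: { warm: 2, excited: 1 }; audit.unknown: ["excited"]
```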

Limitations

Director Notes require a Cara 4 avatar.
Preset and cue behavior may change as we continue tuning the model.
Director Notes guide performance style; they do not provide deterministic gesture, gaze, or pose control.
Custom styles are experimental and may be followed less consistently than presets.
Cues only apply within the turn where they appear. Use directorNotes.presetStyle or directorNotes.customStylePrompt for persistent session behavior.
Data-channel cue messages are mainly for audio passthrough sessions. In Turnkey sessions, prefer inline cue tags in the persona speech text.
TTS cue support varies by provider and voice. Use an expressive voice with inline tag support when you want cue tags to affect the voice as well as the avatar.
Very high expressivity can make motion look exaggerated or unstable. Start with the default and adjust gradually.
