Director Notes is in beta. Behavior, supported presets, cues, and provider support may change as we continue tuning it.
Styles
A style is the avatar’s baseline direction for the session. It controls the general expression, energy, gaze, and movement style the avatar returns to between cues. Set a style when you create the session token.
expressivity is optional. If you omit it, Anam uses the recommended default for the selected style.
Director Notes require a Cara 4 avatar. Set avatarModel to cara-4 when you need predictable model behavior. Older avatar models ignore Director Notes configuration.
Preset styles
Use presetStyle when one of the supported public styles matches the role you are building. Omit presetStyle for the default neutral behavior.
| Preset | Suggested use cases |
|---|---|
| happy | Positive assistant, upbeat host, friendly guide |
| warm | Virtual assistant, onboarding guide, brand spokesperson |
| playful | AI friend, companion avatar, casual host |
| supportive | Support agent, tutor, healthcare guide |
| sad | Sad character, low-energy roleplay, empathy testing |
| angry | Irate customer, frustrated employee, escalation roleplay character |
| distressed | Distressed patient, anxious customer, crisis roleplay character |
Custom styles
Use customStylePrompt to provide your own short performance direction.
Set either presetStyle or customStylePrompt, not both.
Expressivity
expressivity is a dial from 0 to 1 that controls how strongly the avatar follows the active style or current cue.
Higher values also increase movement, including speech-driven articulation such as mouth movement and head motion.
Start with the default by omitting expressivity, or set 0.5 when you want to be explicit. Increase it when you want a stronger performance, and lower it when you want steadier, more restrained motion. Very high values can look exaggerated or unstable.
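The style and expressivity rules above can be checked on the client before you request a session token. The following is a minimal sketch, not the SDK's own API: the buildDirectorNotes helper is hypothetical, and nesting expressivity inside a directorNotes object next to avatarModel is an assumption about the request shape. Only the field names presetStyle, customStylePrompt, expressivity, and avatarModel come from this page.

```typescript
// Preset names taken from the preset table on this page.
type PresetStyle =
  | "happy" | "warm" | "playful" | "supportive"
  | "sad" | "angry" | "distressed";

// Assumed shape: whether expressivity sits inside directorNotes is a guess.
interface DirectorNotes {
  presetStyle?: PresetStyle;
  customStylePrompt?: string;
  expressivity?: number;
}

// Hypothetical helper: enforces "preset or custom, not both" and the 0 to 1 range.
function buildDirectorNotes(input: DirectorNotes): DirectorNotes {
  if (input.presetStyle !== undefined && input.customStylePrompt !== undefined) {
    throw new Error("Set presetStyle or customStylePrompt, not both");
  }
  const notes: DirectorNotes = {};
  if (input.presetStyle !== undefined) notes.presetStyle = input.presetStyle;
  if (input.customStylePrompt !== undefined) {
    notes.customStylePrompt = input.customStylePrompt;
  }
  if (input.expressivity !== undefined) {
    const e = input.expressivity;
    if (!Number.isFinite(e) || e < 0 || e > 1) {
      throw new Error("expressivity must be between 0 and 1");
    }
    notes.expressivity = e;
  }
  // When expressivity is omitted, it is left out so the service default applies.
  return notes;
}

// Example persona fragment (the surrounding object layout is an assumption).
const personaConfig = {
  avatarModel: "cara-4",
  directorNotes: buildDirectorNotes({ presetStyle: "warm", expressivity: 0.5 }),
};
```

Leaving expressivity out of the returned object, rather than filling in 0.5, keeps the recommended default for the selected style in effect.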
Cues
Cues temporarily override the baseline style for the current persona speech turn. Use inline cue tags when Anam receives the persona’s speech text, including Turnkey sessions and text passed to talk() or createTalkMessageStream():
| Cue | Typical effect |
|---|---|
| [happy] | Brighter, more positive delivery |
| [warm] | Friendly, affectionate delivery |
| [playful] | Light, playful delivery |
| [curious] | Attentive, questioning delivery |
| [supportive] | Calm, teaching or reassuring delivery |
| [concerned] | Sympathetic delivery |
| [sad] | Lower-energy, sad delivery |
| [surprised] | More animated, surprised delivery |
| [angry] | Firm, stern delivery |
| [distressed] | Panicked or distressed delivery |
| [laughter] | Laughter in supported TTS providers; playful avatar direction |
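
To illustrate what inline-tagged speech text can look like, here is a small sketch that assembles one persona turn from tagged segments. The renderTurn helper and the Segment type are hypothetical, and placing each tag directly before the text it should affect is an assumption. The cue names come from the table above.

```typescript
// Cue names taken from the cue table on this page.
type Cue =
  | "happy" | "warm" | "playful" | "curious" | "supportive" | "concerned"
  | "sad" | "surprised" | "angry" | "distressed" | "laughter";

// Hypothetical segment type: a piece of speech text with an optional cue.
interface Segment {
  cue?: Cue;
  text: string;
}

// Joins segments into one string, prefixing tagged segments with "[cue] ".
function renderTurn(segments: Segment[]): string {
  return segments
    .map((s) => (s.cue !== undefined ? `[${s.cue}] ${s.text}` : s.text))
    .join(" ");
}

const turnText = renderTurn([
  { cue: "concerned", text: "I'm sorry the delivery was late." },
  { cue: "supportive", text: "Let's sort it out together." },
]);
// turnText is:
// "[concerned] I'm sorry the delivery was late. [supportive] Let's sort it out together."
```

The resulting string is what you would pass as the persona's speech text, for example to talk().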
Send cues over the data channel
Use data-channel cue messages when Anam does not receive the persona’s speech text, such as in audio passthrough sessions where your own TTS provider generates the audio. Turnkey sessions should continue to generate cues inline in the persona speech text, either through your prompt or the ADD CUES toggle in Lab.
After the streaming session is connected, send a director_note_cue message over the WebRTC data channel.
Use the tag name without brackets: "happy", not "[happy]".
If you are not using the JavaScript SDK, send the same JSON payload over the session WebRTC data channel.
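If you send the payload yourself, the sketch below shows one way to serialize and send a cue. The JSON shape is an assumption: this page names the director_note_cue message, the bare tag, and the at_seconds and in_seconds timing fields, but the "type" key and the flat layout are illustrative guesses, so check them against the payload your SDK version sends. CueChannel is a hypothetical minimal interface. A browser RTCDataChannel satisfies it because it has a send method that accepts a string.

```typescript
// Minimal structural interface; RTCDataChannel.send(string) matches it.
interface CueChannel {
  send(data: string): void;
}

// Exactly one timing field per cue, as recommended on this page.
type CueTiming = { at_seconds: number } | { in_seconds: number };

// Hypothetical helper. The "type" key and flat layout are assumptions.
function sendCue(channel: CueChannel, tag: string, timing: CueTiming): string {
  const payload = JSON.stringify({
    type: "director_note_cue", // assumed discriminator field name
    tag, // bare tag name, e.g. "happy" rather than "[happy]"
    ...timing,
  });
  channel.send(payload);
  return payload;
}

// Usage with an open data channel (not executed here):
//   sendCue(dataChannel, "happy", { at_seconds: 1.25 });
```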
Timing options
Include one timing field for clear intent:
- at_seconds: Absolute offset, in seconds, from the start of the current persona speech turn. Use this when your TTS provider gives word or audio timing data. If your TTS provider returns timing for the cue tag, use the tag’s timestamp. If it only returns timing for spoken words, use the timestamp of the word immediately after the tag. If no persona turn is active yet, Anam can queue the cue briefly for the next turn.
- in_seconds: Relative delay, in seconds, from when Anam receives the cue. Use this for immediate or near-immediate changes during an active persona speech turn. If no turn is active, Anam rejects the cue.
Omitting both timing fields is treated like in_seconds: 0, but sending the timing field explicitly is clearer.
Timing values must be finite and non-negative. Invalid messages, unknown tags, empty tags, and tags longer than 64 characters are dropped.
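Because invalid cues are dropped rather than reported, it can help to run the same checks locally before sending. This is a minimal sketch under two assumptions: the cue table on this page is the complete set of known tags, and the 64-character limit can be approximated by JavaScript string length (which counts UTF-16 code units). The cueProblems helper and the CueMessage type are hypothetical.

```typescript
// Assumed to be the full tag list; taken from the cue table on this page.
const KNOWN_TAGS: ReadonlySet<string> = new Set([
  "happy", "warm", "playful", "curious", "supportive", "concerned",
  "sad", "surprised", "angry", "distressed", "laughter",
]);

interface CueMessage {
  tag: string;
  at_seconds?: number;
  in_seconds?: number;
}

// Returns a list of reasons the cue would likely be dropped (empty if none).
function cueProblems(msg: CueMessage): string[] {
  const problems: string[] = [];
  if (msg.tag.length === 0) {
    problems.push("empty tag");
  } else if (msg.tag.length > 64) {
    problems.push("tag longer than 64 characters");
  } else if (!KNOWN_TAGS.has(msg.tag)) {
    // Also catches bracketed tags such as "[happy]".
    problems.push(`unknown tag: ${msg.tag}`);
  }
  for (const field of ["at_seconds", "in_seconds"] as const) {
    const value = msg[field];
    if (value !== undefined && (!Number.isFinite(value) || value < 0)) {
      problems.push(`${field} must be finite and non-negative`);
    }
  }
  return problems;
}
```

For example, cueProblems({ tag: "[happy]", in_seconds: 0 }) reports an unknown tag, which points at the bracket mistake before the cue is silently dropped.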
For audio passthrough, prefer at_seconds when you can align cues to your generated audio:
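One way to derive at_seconds values is to walk the tagged script alongside the word timings your TTS provider returns, and give each cue the start time of the word immediately after its tag. The sketch below assumes the provider returns one timing entry per whitespace-separated spoken word, in order, and no entries for the tags themselves. WordTiming, TimedCue, and cuesFromWordTimings are hypothetical names, not part of any SDK.

```typescript
// Assumed provider output: one entry per spoken word, start time in seconds.
interface WordTiming {
  word: string;
  start: number;
}

interface TimedCue {
  tag: string;
  at_seconds: number;
}

// Maps each inline [tag] to the start time of the next spoken word.
function cuesFromWordTimings(script: string, timings: WordTiming[]): TimedCue[] {
  const cues: TimedCue[] = [];
  let pending: string[] = [];
  let wordIndex = 0;
  const tokens = script.split(/\s+/).filter((t) => t.length > 0);
  for (const token of tokens) {
    const tag = /^\[([a-z]+)\]$/.exec(token)?.[1];
    if (tag !== undefined) {
      pending.push(tag); // remember the tag until the next spoken word
      continue;
    }
    const timing = timings[wordIndex];
    wordIndex += 1;
    if (timing === undefined) break; // ran out of timing data
    for (const p of pending) {
      cues.push({ tag: p, at_seconds: timing.start });
    }
    pending = [];
  }
  // Tags with no spoken word after them are not emitted.
  return cues;
}

const alignedCues = cuesFromWordTimings("[happy] Hello there [concerned] sorry", [
  { word: "Hello", start: 0 },
  { word: "there", start: 0.4 },
  { word: "sorry", start: 1.1 },
]);
// alignedCues: [{ tag: "happy", at_seconds: 0 }, { tag: "concerned", at_seconds: 1.1 }]
```

Strip the tags from the text before sending it to your own TTS provider if that provider does not understand them. Otherwise the word indexes used here no longer line up.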
Use in_seconds when the cue should apply relative to the current moment in an active response.
Add cues in Lab
In Anam Lab, the ADD CUES toggle is off by default. Turning it on appends a hidden # STYLE CUES section to the persona system prompt; it does not edit the visible prompt.
This is the current Lab prompt section:
Prompting an LLM to use cues
If your LLM writes the avatar’s responses, add a short instruction to the system prompt.
Limitations
- Director Notes require a Cara 4 avatar.
- Preset and cue behavior may change as we continue tuning the model.
- Director Notes guide performance style; they do not provide deterministic gesture, gaze, or pose control.
- Custom styles are experimental and may be followed less consistently than presets.
- Cues only apply within the turn where they appear. Use directorNotes.presetStyle or directorNotes.customStylePrompt for persistent session behavior.
- Data-channel cue messages are mainly for audio passthrough sessions. In Turnkey sessions, prefer inline cue tags in the persona speech text.
- TTS cue support varies by provider and voice. Use an expressive voice with inline tag support when you want cue tags to affect the voice as well as the avatar.
- Very high expressivity can make motion look exaggerated or unstable. Start with the default and adjust gradually.

