Fish Audio - complete in-text control reference (the "[ ]" system)

Updated Jun 25, 2026 · Affirmology_FishVoice_Markers_Reference_v1.md

Summary. Authoritative, sourced. Fish does NOT shape voice with sliders the way ElevenLabs does. It shapes voice with markers and controls embedded in the input text, plus a small set of API params. This is the full public catalog (sources at the bottom).

1. Two marker dialects

S2 (current, what we use) - square brackets [marker]. Crucially, S2 is open-domain: it is NOT limited to the fixed list. You can write free-form descriptions and modifiers and it interprets them:
[whispers sweetly], [laughing nervously], [slightly sad], [very excited], [extremely gentle]
S1 (legacy) - parentheses (marker), fixed set only, no custom tags.
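
For legacy scripts written in the S1 dialect, the bracket swap can be sketched as below. This is a minimal illustration, not a Fish Audio tool: the function name `s1_to_s2` and the `KNOWN_MARKERS` subset are hypothetical, and it assumes a fixed-set marker means the same thing in both dialects. Only whitelisted names are converted, so ordinary parenthetical text is left alone.

```python
import re

# Illustrative subset of the fixed marker names listed in this reference.
# Extend it with the full catalog before using it on real scripts.
KNOWN_MARKERS = {
    "happy", "sad", "calm", "excited", "grateful",
    "whispering", "soft tone", "laughing",
}


def s1_to_s2(text: str, known: set[str] = KNOWN_MARKERS) -> str:
    """Rewrite S1-style (marker) tags as S2-style [marker] tags.

    Parentheses whose content is not a known marker are left untouched.
    """
    def swap(match: re.Match) -> str:
        name = match.group(1).strip()
        if name.lower() in known:
            return f"[{name}]"
        return match.group(0)  # ordinary parenthetical, keep as written

    return re.sub(r"\(([^()]+)\)", swap, text)


print(s1_to_s2("(calm) Welcome back (we missed you)."))
```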

2. Placement & combining rules

Emotion markers go at the beginning of a sentence; treat this as a requirement, not just a best practice.
Tone markers and sound effects can go anywhere in the text.
Combine at most ~3 emotions per sentence. S2 can layer, e.g. [soft tone][calm][grateful].
Don't mix conflicting emotions in one sentence.
All 13 supported languages can use emotion markers.
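
The stacking limit above can be checked before a script is rendered. The sketch below is a hypothetical pre-flight linter, not part of any Fish Audio SDK: it counts the markers stacked at the start of each sentence and warns when there are more than three. It uses a naive sentence splitter and does not distinguish emotion markers from tone markers, so treat its output as a hint.

```python
import re

# S2 dialect: a marker is anything inside one pair of square brackets.
MARKER_RE = re.compile(r"\[([^\[\]]+)\]")
MAX_MARKERS_PER_SENTENCE = 3  # the "~3 per sentence" guideline


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def leading_markers(sentence: str) -> list[str]:
    """Return the markers stacked at the very start of a sentence."""
    markers = []
    rest = sentence.lstrip()
    while True:
        m = MARKER_RE.match(rest)
        if not m:
            break
        markers.append(m.group(1))
        rest = rest[m.end():].lstrip()
    return markers


def check_script(text: str) -> list[str]:
    """Return one warning per sentence that stacks too many leading markers."""
    warnings = []
    for i, sentence in enumerate(split_sentences(text), start=1):
        count = len(leading_markers(sentence))
        if count > MAX_MARKERS_PER_SENTENCE:
            warnings.append(
                f"sentence {i}: {count} leading markers "
                f"(max {MAX_MARKERS_PER_SENTENCE})"
            )
    return warnings


print(check_script("[calm][relaxed][grateful][hopeful] Breathe in."))
```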

3. Emotion markers (S2 `[ ]` / S1 `( )`)

Basic (24): happy, sad, angry, excited, calm, nervous, confident, surprised, satisfied, delighted, scared, worried, upset, frustrated, depressed, empathetic, embarrassed, disgusted, moved, proud, relaxed, grateful, curious, sarcastic
Advanced (25): disdainful, unhappy, anxious, hysterical, indifferent, uncertain, doubtful, confused, disappointed, regretful, guilty, ashamed, jealous, envious, hopeful, optimistic, pessimistic, nostalgic, lonely, bored, contemptuous, sympathetic, compassionate, determined, resigned
S2 also accepts modifiers on any of these: [slightly nostalgic], [very calm].

4. Tone markers (5)

[in a hurry tone] [shouting] [screaming] [whispering] [soft tone]

5. Sound / audio effects (10)

[laughing] [chuckling] [sobbing] [crying loudly] [sighing] [groaning] [panting] [gasping] [yawning] [snoring]
Plus crowd effects: [audience laughing] [background laughter] [crowd laughing]

6. Pauses & breathing (the "delay tags")

[break] - short pause. Empirically confirmed on s2: one [break] adds a real silence; two [break][break] produced a ~0.94s gap in our test (not spoken aloud).
[long-break] - extended pause.
[breath] - audible inhale. ([inhale] / [exhale] also referenced.)
These paralanguage effects are documented as the V1.6 Control Model set and marked experimental; brackets work on s2 in practice.

7. Pause words (natural rhythm)

Inserting filler words like "um", "uh" (or natural laughter written as "Ha,ha,ha") controls rhythm/realism without any tag. Use sparingly for a meditation voice.

8. Phoneme / pronunciation control ← important for names

Force exact pronunciation with: <|phoneme_start|>PHONEMES<|phoneme_end|>
English: CMU Arpabet (per word)
Chinese: tone-number pinyin
Japanese: OpenJTalk romaji with pitch-accent digits
This is how we make sure a person's name (e.g. an unusual spelling) is voiced correctly in their Soul Song.
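
Pinning a name can be automated with a small substitution pass. This is a sketch under stated assumptions: the helper names `phoneme_tag` and `pin_pronunciations` are hypothetical, the Arpabet string for "Siobhan" is only an illustrative transcription, and writing the phonemes space-separated inside the tags is an assumption to confirm against a test render.

```python
import re

PHONEME_START = "<|phoneme_start|>"
PHONEME_END = "<|phoneme_end|>"


def phoneme_tag(phonemes: str) -> str:
    """Wrap one word's phoneme string in the phoneme control tags."""
    return f"{PHONEME_START}{phonemes.strip()}{PHONEME_END}"


def pin_pronunciations(text: str, overrides: dict[str, str]) -> str:
    """Replace whole-word occurrences of each key with its tagged phonemes.

    Matching is case-sensitive and per word, in line with the
    "per word" note for English Arpabet.
    """
    for word, phonemes in overrides.items():
        pattern = re.compile(rf"\b{re.escape(word)}\b")
        # A function replacement avoids backslash/group surprises in re.sub.
        text = pattern.sub(lambda _m, p=phonemes: phoneme_tag(p), text)
    return text


# Illustrative override only; verify the transcription by listening.
overrides = {"Siobhan": "SH IH0 V AO1 N"}
print(pin_pronunciations("[calm] Welcome, Siobhan.", overrides))
```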

9. API params (not in-text)

prosody: { "speed": <float>, "volume": <int> }. We render at speed 0.80 to 0.88.
Caveat (measured): the speed knob is coarse. 0.88 vs 0.93 produced near-identical length. For real pacing change, move in bigger steps or add [break]/[long-break] between sentences (which we now do).
Model is chosen by the request model header: s2 (we use this) or s1.
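
As a rough sketch of how these params fit together, the helper below assembles the header and body for one render without sending anything. Only the prosody shape and the model header come from this reference. The function name `build_tts_request`, the "text" body key, and the default volume of 0 are assumptions to check against the Fish Audio API docs.

```python
def build_tts_request(
    text: str,
    speed: float = 0.84,   # inside the 0.80 to 0.88 range we render at
    volume: int = 0,       # assumed default, not confirmed here
    model: str = "s2",
) -> tuple[dict, dict]:
    """Return (headers, body) for a TTS request; no network call is made."""
    if model not in ("s1", "s2"):
        raise ValueError(f"unknown model: {model!r}")
    if speed <= 0:
        raise ValueError("speed must be positive")

    headers = {"model": model}  # the model is selected via a request header
    body = {
        "text": text,  # assumed key name for the marked-up input text
        "prosody": {"speed": speed, "volume": volume},
    }
    return headers, body


headers, body = build_tts_request("[calm] Welcome. [break] [calm] Breathe in.")
print(headers)
print(body)
```

Because the speed knob is coarse, pacing is better handled in the text with [break] and [long-break] than by nudging the speed argument in small steps.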

Meditation / Soul Song cheat-sheet

Lead sentences with [calm], [relaxed], [grateful], [hopeful], [compassionate], [nostalgic].
Texture with [soft tone] / [whispering] (sparingly).
Space it out with [break] between sentences and [long-break] at section breaks.
Pin tricky names with <|phoneme_start|>...<|phoneme_end|>.
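
The cheat-sheet can be applied mechanically when assembling a script. The sketch below is a hypothetical helper (`build_script` is not a Fish Audio function): it leads every sentence with one emotion marker, joins sentences with [break], and joins sections with [long-break]. Using the same lead marker for every sentence is a simplification.

```python
def build_script(sections: list[list[str]], lead_marker: str = "calm") -> str:
    """Assemble marked-up text from sections, each a list of sentences."""
    rendered = []
    for sentences in sections:
        # Lead each non-empty sentence with the emotion marker.
        tagged = [f"[{lead_marker}] {s.strip()}" for s in sentences if s.strip()]
        if tagged:
            rendered.append(" [break] ".join(tagged))
    # Longer pause at section boundaries.
    return " [long-break] ".join(rendered)


script = build_script(
    [
        ["Welcome.", "Breathe in."],
        ["Rest now."],
    ]
)
print(script)
```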

Sources

Emotion Control - Fish Audio docs
Emotion & Expression Control (best practices)
Fine-grained Control (phonemes, pause words, paralanguage)
Text to Speech Guide
Models Overview / pricing
Machine-readable index of all docs: https://docs.fish.audio/llms.txt

Verified in-engine 2026-06-25: Fish s2 honors [break] as inserted silence; all 1-min A/B renders passed audio_qc.py.
