
Field Briefing Audio (C15) - cost memo (v1, 2026-06-20)

Updated Jun 20, 2026 · Affirmology_FieldBriefingAudio_CostMemo_v1.md

The question: is turning a Hermes chat reading into voiced audio a feature just for Jeff + Sol, viable for any membership tier, or for the dev/testers too? Answer: it can serve all three. The cost is small. The only real levers are which text-to-speech (TTS) engine you use and whether you cap volume. Voice rendering is effectively the whole cost: the reading text already exists from the chat, the sound bed is a pre-made file, and a small reformat pass costs pennies.

The unit

A "field briefing" = a chat reading turned into spoken form: voice plus a light bed, no full hypnotic structure. Typical length is ~1,000 words, which is ~6,000 characters and ~6 minutes of audio. TTS APIs bill per character or per UTF-8 byte; for plain ASCII English the two counts are the same, while curly quotes and other non-ASCII characters count as more than one byte.
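To make the character-versus-byte distinction concrete, here is a minimal sketch that counts both for a script. The helper name `billable_units` is hypothetical; it only uses Python's built-in string handling and does not call any TTS API.

```python
def billable_units(text: str) -> tuple[int, int]:
    """Return (characters, UTF-8 bytes) for a spoken script."""
    return len(text), len(text.encode("utf-8"))


# Plain ASCII: both counts match, so per-char and per-byte billing agree.
print(billable_units("field briefing"))  # (14, 14)

# A curly apostrophe (U+2019) is one character but three UTF-8 bytes.
print(billable_units("it\u2019s"))       # (4, 6)
```

If the reformat pass normalizes curly quotes and similar punctuation to ASCII, the two billing models should stay close for English scripts.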

Current per-briefing TTS cost (≈6,000 characters, verified June 2026)

| Engine | Rate | Cost per 6k-char briefing | Notes |
|---|---|---|---|
| Fish Audio (s2-pro) | $15 / 1M UTF-8 bytes | ~$0.09 | Cheapest, commercial use OK, API-first |
| OpenAI tts-1 | $15 / 1M chars | ~$0.09 | Same price, simplest integration |
| OpenAI tts-1-hd | $30 / 1M chars | ~$0.18 | Higher fidelity |
| ElevenLabs | ~$0.12 - $0.30 / 1k chars (tier-dependent) | ~$0.70 - $1.80 | Premium voice, your Sacred Audio engine |
| Higgsfield Audio | credit-metered (~$5 / 100 top-up credits) | opaque, ~$0.10 - $0.50+ | Bundled in a video suite, not API-first |

Add a Haiku reformat pass (reading -> spoken script): ~$0.01. Add the bed mix: ~$0 (local, pre-rendered). So all-in per briefing:

- BUDGET path (Fish Audio or OpenAI tts-1): ~$0.10
- PREMIUM path (ElevenLabs, your brand voice): ~$1
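The all-in figures can be reproduced with a few lines of arithmetic. The sketch below uses the rates quoted in this memo, converted to USD per million characters; the dictionary keys and the function name `briefing_cost` are illustrative labels, not real API identifiers, and the ElevenLabs entries are just the low and high ends of the quoted range.

```python
# Rates from this memo, in USD per 1M characters (ElevenLabs converted
# from ~$0.12 - $0.30 per 1k chars). Keys are illustrative labels only.
RATES_PER_MILLION_CHARS = {
    "fish-audio-s2-pro": 15.00,
    "openai-tts-1": 15.00,
    "openai-tts-1-hd": 30.00,
    "elevenlabs-low": 120.00,
    "elevenlabs-high": 300.00,
}
REFORMAT_PASS_USD = 0.01  # Haiku reformat pass, per the memo
BED_MIX_USD = 0.00        # local, pre-rendered bed


def briefing_cost(engine: str, chars: int = 6000) -> float:
    """All-in cost of one briefing: TTS + reformat pass + bed mix."""
    tts = RATES_PER_MILLION_CHARS[engine] * chars / 1_000_000
    return round(tts + REFORMAT_PASS_USD + BED_MIX_USD, 4)


for engine in RATES_PER_MILLION_CHARS:
    print(f"{engine:20s} ${briefing_cost(engine):.2f}")
```

Under these assumptions the budget engines land at about $0.10 and ElevenLabs between roughly $0.73 and $1.81, which is where the rounded "~$0.10" and "~$1" figures come from.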

The three scenarios

  1. JEFF + SOL (internal). Even 30 briefings a month on the premium voice is ~$30/mo; on the budget voice ~$3/mo. Noise. Just turn it on with the ElevenLabs voice. No reason to economize internally.
  2. MEMBERSHIP TIERS (NOT everyone). Jeff's call 2026-06-20: do NOT offer it to all members, because some users would rack up large bills. Correct, and the fix is a HARD cap enforced server-side, not trust. The API key is ours, so the bill is ours, so our system must enforce the limit. Three layers: (a) GATE the feature to chosen tiers only (keep it off the free/entry tier); (b) HARD monthly quota per member, enforced server-side, with the render refused once the quota is hit (not a soft warning); (c) a per-member render counter that the quota check reads. This makes maximum exposure a fixed, knowable number: members x cap x per-briefing cost. On the budget voice, a 20/month cap is a ~$2/member monthly ceiling (20 x ~$0.10). Recommendation: launch for Jeff + Sol + the top tier only, hard-capped, on the budget voice; full Sacred Audio (ElevenLabs) stays reserved for a higher tier still.
  3. DEV TEAM / TESTERS. Pure internal volume. Hundreds of test renders on the budget voice is single-digit dollars. Use the budget engine for automated/CI renders, premium only for final QA listens.
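The three layers in scenario 2 (tier gate, hard quota, per-member counter) and the exposure formula can be sketched as follows. This is a minimal in-memory illustration, not the production design: the class and exception names are hypothetical, the tier label "top" is a placeholder, and a real deployment would keep the counter in a database with an atomic increment.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class TierNotAllowed(Exception):
    """Raised when the member's tier is not gated in."""


class QuotaExceeded(Exception):
    """Raised when the member has used the monthly cap."""


@dataclass
class RenderGate:
    allowed_tiers: frozenset
    monthly_cap: int
    # (member_id, "YYYY-MM") -> renders used; in-memory stand-in for a DB.
    _counts: dict = field(default_factory=lambda: defaultdict(int))

    def authorize(self, member_id: str, tier: str, month: str) -> int:
        """Approve one render or raise; returns renders left this month."""
        if tier not in self.allowed_tiers:           # layer (a): tier gate
            raise TierNotAllowed(tier)
        key = (member_id, month)
        if self._counts[key] >= self.monthly_cap:    # layer (b): hard refusal
            raise QuotaExceeded(member_id)
        self._counts[key] += 1                       # layer (c): counter
        return self.monthly_cap - self._counts[key]


def max_monthly_exposure(members: int, cap: int, cost_per_briefing: float) -> float:
    """Worst-case monthly bill: members x cap x per-briefing cost."""
    return members * cap * cost_per_briefing


gate = RenderGate(allowed_tiers=frozenset({"top"}), monthly_cap=20)
print(gate.authorize("member-1", "top", "2026-06"))  # 19 renders left
print(max_monthly_exposure(100, 20, 0.10))           # about 200.0 USD
```

The check runs before any TTS call is made, so a refused render costs nothing; the member count of 100 in the usage line is an arbitrary example.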

Delay

Not a real problem if the UX is right. TTS renders a ~6-minute audio in roughly 10 - 40 seconds (faster than realtime). Don't make it a live in-chat wait. Make it a BACKGROUND render that drops into a library/inbox tab with a "ready" ping. That removes the latency concern entirely.
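The background-render pattern described above might look like this in outline. It is a sketch under stated assumptions: `fake_tts_render` is a stand-in for the real engine call, the in-memory `library` dict stands in for the library/inbox store, and the "ready" ping is reduced to a status flip.

```python
import asyncio
import uuid

# In-memory stand-in for the member's library/inbox tab.
library = {}


async def fake_tts_render(script: str) -> bytes:
    """Placeholder for the real TTS call (roughly 10 - 40 s in practice)."""
    await asyncio.sleep(0.01)
    return script.encode("utf-8")


async def render_job(job_id: str, script: str) -> None:
    audio = await fake_tts_render(script)
    # Flipping the status is where a "ready" ping would be sent.
    library[job_id].update(status="ready", audio=audio)


def submit_briefing(script: str):
    """Queue a render and return at once; must run inside an event loop."""
    job_id = uuid.uuid4().hex
    library[job_id] = {"status": "rendering", "audio": None}
    task = asyncio.create_task(render_job(job_id, script))
    return job_id, task


async def demo():
    job_id, task = submit_briefing("Your field briefing for today.")
    status_at_submit = library[job_id]["status"]  # chat is not blocked here
    await task                                    # only the demo waits
    return status_at_submit, library[job_id]["status"]


print(asyncio.run(demo()))  # ('rendering', 'ready')
```

The chat handler only calls `submit_briefing` and moves on; the member sees the item appear in the library once its status is "ready".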

Engine recommendation

Bottom line

Build it for all three, with member access gated to chosen tiers. Use the budget TTS engine as the default field-briefing voice (~$0.10 each) with a hard, server-enforced per-month cap on member tiers, reserve ElevenLabs for full Sacred Audio, and deliver via a background render into a library tab so there is no in-chat wait. Internally for you and Sol it is effectively free; as a capped member feature the economics clearly work; for testers it is negligible.

Sources