Field Briefing Audio (C15) - cost memo (v1, 2026-06-20)
Updated Jun 20, 2026 · Affirmology_FieldBriefingAudio_CostMemo_v1.md
The question: should turning a Hermes chat reading into voiced audio be a feature just for Jeff + Sol, for any membership tier, or for the dev/testers too? Answer: it can be all three. The cost is small. The only real levers are which text-to-speech engine you use and whether you cap volume. Voice rendering (TTS) is the whole cost: the reading text already exists from the chat, the sound bed is a pre-made file, and the small reformat pass costs about a cent.
The unit
A "field briefing" = a chat reading turned into spoken form: voice plus a light bed, no full hypnotic structure. Typical length is ~1,000 words (~6 minutes of audio), or ~6,000 characters. TTS APIs bill per character or per UTF-8 byte; for plain ASCII English the two counts are identical, while curly quotes, dashes, and accented letters take 2 to 3 bytes each.
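To make the character-versus-byte distinction concrete, here is a minimal sketch that counts both for a script. The function name `billable_units` is hypothetical and not part of any TTS SDK; it only uses Python's built-in string handling.

```python
def billable_units(text: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for a spoken script.

    Hypothetical helper: per-character engines bill on the first number,
    per-byte engines on the second.
    """
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    # Plain ASCII: both counts match.
    print(billable_units("Breathe in. Hold. Release."))
    # One accented letter (U+00E9) adds one extra byte.
    print(billable_units("caf\u00e9"))  # (4, 5)
```

For a typical English reading the gap is a handful of bytes, so the ~6,000 figure works for either billing unit.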
Current per-briefing TTS cost (≈6,000 characters, verified June 2026)
| Engine | Rate | Cost per 6k-char briefing | Notes |
| --- | --- | --- | --- |
| Fish Audio (s2-pro) | $15 / 1M UTF-8 bytes | ~$0.09 | Cheapest, commercial use OK, API-first |
| OpenAI tts-1 | $15 / 1M chars | ~$0.09 | Same price, simplest integration |
| OpenAI tts-1-hd | $30 / 1M chars | ~$0.18 | Higher fidelity |
| ElevenLabs | ~$0.12 - $0.30 / 1k chars (tier-dependent) | ~$0.70 - $1.80 | Premium voice, your Sacred Audio engine |
| Higgsfield Audio | credit-metered (~$5 / 100 top-up credits) | opaque, ~$0.10 - $0.50+ | Bundled in a video suite, not API-first |
Add a Haiku reformat pass (reading -> spoken script): ~$0.01. Add the bed mix: ~$0 (local, pre-rendered). So all-in per briefing:
- BUDGET path (Fish Audio or OpenAI tts-1): ~$0.10
- PREMIUM path (ElevenLabs, your brand voice): ~$1
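The per-briefing arithmetic can be reproduced with a short sketch. The rates are the ones quoted in this memo, converted to USD per 1M characters; the engine keys and the `briefing_cost` function are illustrative names, not real API identifiers, and the ElevenLabs entries are just the low and high ends of the quoted tier range.

```python
# USD per 1M characters, taken from the rates quoted in this memo.
# Dictionary keys are illustrative labels, not vendor model IDs.
RATES_PER_MILLION_CHARS = {
    "fish-audio": 15.0,
    "openai-tts-1": 15.0,
    "openai-tts-1-hd": 30.0,
    "elevenlabs-low": 120.0,   # $0.12 per 1k chars
    "elevenlabs-high": 300.0,  # $0.30 per 1k chars
}

REFORMAT_PASS_USD = 0.01  # Haiku reformat pass, per the memo
BED_MIX_USD = 0.0         # local, pre-rendered bed


def briefing_cost(engine: str, chars: int = 6000) -> float:
    """All-in cost of one briefing: TTS + reformat pass + bed mix."""
    tts = chars * RATES_PER_MILLION_CHARS[engine] / 1_000_000
    return tts + REFORMAT_PASS_USD + BED_MIX_USD


if __name__ == "__main__":
    for name in RATES_PER_MILLION_CHARS:
        print(f"{name:16s} ${briefing_cost(name):.2f}")
```

The budget engines land at about $0.10 and ElevenLabs at roughly $0.73 to $1.81, which is where the ~$0.10 and ~$1 path figures come from.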
The three scenarios
- JEFF + SOL (internal). Even 30 briefings a month on the premium voice is ~$30/mo; on the budget voice ~$3/mo. Noise. Just turn it on with the ElevenLabs voice. No reason to economize internally.
- MEMBERSHIP TIERS (NOT everyone). Jeff's call 2026-06-20: do NOT offer it to all members, because some users would rack up large bills. Correct, and the fix is a HARD cap enforced server-side, not trust. The API key is ours, so the bill is ours, so our system must enforce the limit. Three layers: (a) GATE the feature to chosen tiers only (keep it off the free/entry tier); (b) a HARD monthly quota per member, enforced server-side, with the render refused once the quota is hit (not a soft warning); (c) a per-member render counter that the quota check reads. This makes maximum exposure a fixed, knowable number: members × cap × per-briefing cost. On the budget voice (~$0.10 per briefing) a 20/month cap is a ~$2/member monthly ceiling. Recommendation: launch for Jeff + Sol + the top tier only, hard-capped, on the budget voice, with full Sacred Audio (ElevenLabs) reserved for an even higher tier.
- DEV TEAM / TESTERS. Pure internal volume. Hundreds of test renders on the budget voice is single-digit dollars. Use the budget engine for automated/CI renders, premium only for final QA listens.
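The three-layer cap described for membership tiers (tier gate, hard monthly quota, per-member counter) could look roughly like the sketch below. This is an in-memory illustration under stated assumptions: the class and function names are hypothetical, tier names are placeholders, and a real deployment would keep the counter in a database and update it atomically.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when a member has used up the monthly render cap."""


@dataclass
class RenderQuota:
    """Server-side gate: tier check, hard cap, per-member counter."""
    allowed_tiers: frozenset
    monthly_cap: int
    counts: dict = field(default_factory=dict)  # (member_id, month) -> renders

    def authorize(self, member_id: str, tier: str, month: str) -> int:
        """Reserve one render or refuse. Returns renders left this month."""
        if tier not in self.allowed_tiers:          # layer (a): tier gate
            raise PermissionError(f"tier {tier!r} has no briefing audio")
        key = (member_id, month)
        used = self.counts.get(key, 0)              # layer (c): counter
        if used >= self.monthly_cap:                # layer (b): hard refusal
            raise QuotaExceeded(f"{member_id} hit the cap of {self.monthly_cap}")
        self.counts[key] = used + 1
        return self.monthly_cap - (used + 1)


def max_monthly_exposure(members: int, cap: int, cost_per_briefing: float) -> float:
    """Worst-case monthly bill: members x cap x per-briefing cost."""
    return members * cap * cost_per_briefing


if __name__ == "__main__":
    quota = RenderQuota(allowed_tiers=frozenset({"top"}), monthly_cap=20)
    print(quota.authorize("member-1", "top", "2026-06"))   # 19 renders left
    print(round(max_monthly_exposure(1, 20, 0.10), 2))     # 2.0 per member
```

The key design choice is that `authorize` runs before the TTS call is made, so a refused render never reaches the paid API.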
Delay
Not a real problem if the UX is right. TTS renders ~6 minutes of audio in roughly 10 - 40 seconds (faster than realtime). Don't make it a live in-chat wait. Make it a BACKGROUND render that drops into a library/inbox tab with a "ready" ping. That removes the latency concern entirely.
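One way to sketch the background-render pattern is a small worker pool that accepts a request, returns immediately, and flips a status flag when the audio is done. Everything here is an assumption for illustration: `BriefingLibrary` is a hypothetical name, `render_fn` stands in for whichever TTS call is used, and the statuses live in memory rather than in the app's real library store.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional


class BriefingLibrary:
    """Queues renders on worker threads and tracks their status."""

    def __init__(self, render_fn: Callable[[str], bytes], workers: int = 2) -> None:
        self._render_fn = render_fn  # stand-in for the real TTS call
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._status: Dict[str, str] = {}
        self._audio: Dict[str, bytes] = {}
        self._futures: Dict[str, Future] = {}

    def request(self, briefing_id: str, script: str) -> None:
        """Return immediately; the render runs on a worker thread."""
        with self._lock:
            self._status[briefing_id] = "rendering"
            self._futures[briefing_id] = self._pool.submit(
                self._run, briefing_id, script
            )

    def _run(self, briefing_id: str, script: str) -> None:
        try:
            audio = self._render_fn(script)
        except Exception:
            with self._lock:
                self._status[briefing_id] = "failed"
            return
        with self._lock:
            self._audio[briefing_id] = audio
            self._status[briefing_id] = "ready"  # a "ready" ping would fire here

    def status(self, briefing_id: str) -> Optional[str]:
        with self._lock:
            return self._status.get(briefing_id)

    def audio(self, briefing_id: str) -> Optional[bytes]:
        with self._lock:
            return self._audio.get(briefing_id)

    def wait(self, briefing_id: str, timeout: Optional[float] = None) -> None:
        """Block until the render finishes (for scripts and tests, not chat)."""
        with self._lock:
            future = self._futures[briefing_id]
        future.result(timeout=timeout)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

The chat path only ever calls `request` and `status`, so the 10 - 40 second render time never blocks the conversation.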
Engine recommendation
- ENGINE PATH (updated 2026-06-20: Jeff cancelled OpenAI, do not rely on it). START on ElevenLabs (already wired, no new account), hard-capped per user, to test usability now. THEN move the default read-aloud voice to a LOCAL open TTS (Kokoro / Piper, on the Mac mini) for ~$0 marginal cost on unlimited internal + capped-user use. Fish Audio (~$15 / 1M UTF-8 bytes, needs an account + a little money) is the cheap hosted fallback. ElevenLabs stays the premium Sacred voice regardless. The casual "read it to me" mode does not need Sacred quality, so a local/cheap voice is fine.
- KEEP ElevenLabs for full Sacred Audio (the locked demo voice and premium tiers). Do not change the demo.
- HIGGSFIELD: it CAN do TTS. "Higgsfield Audio" is their text-to-speech with 21 presets plus voice cloning and video translation/lip-sync. That is the "protocol" it used to voice your script video: its own in-platform TTS, metered in credits. But it is built for video production, credit-priced and not API-first, so it is the wrong engine for a high-volume, programmatic, in-app briefing feature. Use Higgsfield where you are already in it making video; use a direct TTS API for C15.
Bottom line
Build it for all three. Use the budget TTS engine as the default field-briefing voice (~$0.10 each) with a hard, server-enforced per-month cap on member tiers, reserve ElevenLabs for full Sacred Audio, and deliver via a background render into a library tab so there is no wait. Internally for you and Sol it is effectively free; as a member feature the economics clearly work; for testers it is negligible.
Sources
- ElevenLabs pricing: https://elevenlabs.io/pricing and https://texttolab.com/blog/elevenlabs-pricing
- Fish Audio pricing/docs: https://docs.fish.audio/developer-guide/models-pricing/pricing-and-rate-limits and https://smallest.ai/blog/fish-audio-pricing-plans-api-billing-commercial-use-in-2026
- OpenAI TTS pricing: https://texttolab.com/blog/openai-tts-pricing and https://costgoat.com/pricing/openai-tts
- Higgsfield Audio + pricing: https://higgsfield.ai/blog/higgsfield-audio-ai-voice-tools and https://www.imagine.art/blogs/higgsfield-ai-pricing