Field Briefing Audio (C15) - cost memo (v1, 2026-06-20)

Updated Jun 20, 2026 · Affirmology_FieldBriefingAudio_CostMemo_v1.md

The question: is turning a Hermes chat reading into voiced audio a feature just for Jeff + Sol, viable for any membership tier, or for the dev/testers too? Answer: it can be all three. The cost is small. The only real levers are which text-to-speech engine you use and whether you cap volume. Voice rendering (TTS) is essentially the whole cost: the reading text already exists from the chat, the sound bed is a pre-made file, and the small reformat pass costs about a cent.

The unit

A "field briefing" = a chat reading turned into spoken form: voice plus a light bed, no full hypnotic structure. Typical length ~1,000 words (~6 minutes of audio) ≈ ~6,000 characters. TTS APIs bill per character (or per UTF-8 byte, which is the same count for plain ASCII English text).
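As a rough illustration of the character vs. UTF-8 byte point, the sketch below counts both for a script. The function name `billable_units` and the sample strings are made up for this example; the only claim is that ASCII text gives identical counts while typographic characters (such as a curly apostrophe) take more than one byte each.

```python
def billable_units(text: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for a spoken script."""
    return len(text), len(text.encode("utf-8"))


# Plain ASCII English: characters and bytes are the same number.
plain = "Your field is steady today."

# A curly apostrophe (U+2019) is one character but three UTF-8 bytes,
# so byte-billed engines count two extra units for it.
curly = "Today\u2019s field is steady."

for label, script in (("plain", plain), ("curly", curly)):
    chars, utf8_bytes = billable_units(script)
    print(f"{label}: {chars} chars, {utf8_bytes} bytes")
```

For a ~6,000-character briefing the difference is a handful of bytes at most, so it does not move the per-briefing cost.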

Current per-briefing TTS cost (≈6,000 characters, verified June 2026)

Engine	Rate	Cost per 6k-char briefing	Notes
Fish Audio (s2-pro)	$15 / 1M UTF-8 bytes	~$0.09	Cheapest, commercial use OK, API-first
OpenAI tts-1	$15 / 1M chars	~$0.09	Same price, simplest integration
OpenAI tts-1-hd	$30 / 1M chars	~$0.18	Higher fidelity
ElevenLabs	~$0.12 - $0.30 / 1k chars (tier-dependent)	~$0.70 - $1.80	Premium voice, your Sacred Audio engine
Higgsfield Audio	credit-metered (~$5 / 100 top-up credits)	opaque, ~$0.10 - $0.50+	Bundled in a video suite, not API-first

Add a Haiku reformat pass (reading -> spoken script): ~$0.01. Add the bed mix: ~$0 (local, pre-rendered). So all-in per briefing:
- BUDGET path (Fish Audio or OpenAI tts-1): ~$0.10
- PREMIUM path (ElevenLabs, your brand voice): ~$1
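The arithmetic behind these figures can be sketched as below. The rates are the ones quoted in the table above, converted to dollars per million characters (so $0.12 / 1k chars becomes $120 / 1M); the function and constant names are illustrative only.

```python
BRIEFING_CHARS = 6_000       # ~1,000 words, ~6 minutes of audio
REFORMAT_PASS_USD = 0.01     # Haiku reformat pass, per the memo
BED_MIX_USD = 0.0            # local, pre-rendered bed


def tts_cost(chars: int, usd_per_million: float) -> float:
    """TTS cost in USD for a script billed per character."""
    return chars * usd_per_million / 1_000_000


def all_in_cost(chars: int, usd_per_million: float) -> float:
    """TTS plus the fixed per-briefing extras."""
    return tts_cost(chars, usd_per_million) + REFORMAT_PASS_USD + BED_MIX_USD


budget = all_in_cost(BRIEFING_CHARS, 15.0)          # Fish Audio / tts-1
premium_low = all_in_cost(BRIEFING_CHARS, 120.0)    # ElevenLabs, low end
premium_high = all_in_cost(BRIEFING_CHARS, 300.0)   # ElevenLabs, high end

print(f"budget:  ${budget:.2f}")
print(f"premium: ${premium_low:.2f} to ${premium_high:.2f}")
```

The premium range works out to roughly $0.73 to $1.81, which is where the "~$1" shorthand comes from.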

The three scenarios

JEFF + SOL (internal). Even 30 briefings a month on the premium voice is ~$30/mo; on the budget voice ~$3/mo. Noise. Just turn it on with the ElevenLabs voice. No reason to economize internally.
MEMBERSHIP TIERS (NOT everyone). Jeff's call 2026-06-20: do NOT offer it to all members, because some users would rack up large bills. Correct, and the fix is a HARD cap enforced server-side, not trust. The API key is ours, so the bill is ours, so our system must enforce the limit. Three layers: (a) GATE the feature to chosen tiers only (keep it off the free/entry tier); (b) HARD monthly quota per member, enforced server-side, with the render refused once the quota is hit (not a soft warning); (c) a per-member render counter to track usage against that quota. This makes maximum exposure a fixed, knowable number: members x cap x per-briefing cost. On the budget voice (~$0.10 per briefing) a 20/month cap is a ~$2/member monthly ceiling. Recommendation: launch for Jeff + Sol + the top tier only, hard-capped, on the budget voice, with full Sacred Audio (ElevenLabs) reserved for a higher level still.
DEV TEAM / TESTERS. Pure internal volume. At ~$0.10 per full-length briefing, a few hundred test renders on the budget voice is tens of dollars at most, and less with short test scripts. Use the budget engine for automated/CI renders, premium only for final QA listens.
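The three layers described for membership tiers (tier gate, hard monthly quota, per-member counter) and the exposure formula can be sketched roughly as follows. This is a minimal in-memory illustration, not the production design: the class and function names are hypothetical, and a real deployment would need the counter in a durable store with an atomic increment so that concurrent requests cannot slip past the cap.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when a member has used up the monthly render quota."""


@dataclass
class RenderGate:
    allowed_tiers: frozenset[str]   # (a) tiers that may use the feature
    monthly_cap: int                # (b) hard per-member monthly quota
    # (c) per-member render counter, keyed by (member_id, "YYYY-MM")
    _counts: dict[tuple[str, str], int] = field(default_factory=dict)

    def authorize(self, member_id: str, tier: str, month: str) -> int:
        """Count one render and return renders remaining, or refuse."""
        if tier not in self.allowed_tiers:
            raise PermissionError(f"tier {tier!r} does not include briefings")
        key = (member_id, month)
        used = self._counts.get(key, 0)
        if used >= self.monthly_cap:
            # Hard stop: the render is refused, not merely warned about.
            raise QuotaExceeded(f"{member_id} hit the cap of {self.monthly_cap}")
        self._counts[key] = used + 1
        return self.monthly_cap - (used + 1)


def max_monthly_exposure(members: int, cap: int, cost_per_briefing: float) -> float:
    """Worst-case monthly bill: members x cap x per-briefing cost."""
    return members * cap * cost_per_briefing
```

With these assumptions, 100 capped members at 20 renders a month on the budget voice gives `max_monthly_exposure(100, 20, 0.10)`, about $200 a month as the absolute ceiling.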

Delay

Not a real problem if the UX is right. TTS renders a ~6-minute briefing in roughly 10 - 40 seconds (faster than realtime). Don't make it a live in-chat wait. Make it a BACKGROUND render that drops into a library/inbox tab with a "ready" ping. That removes the latency concern entirely.
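One way to picture the background-render flow is the sketch below, using Python's standard `concurrent.futures`. Everything app-specific here is an assumption for illustration: `render_audio` is a stand-in for the real TTS call, and the `library` dict and `notifications` list stand in for the library/inbox tab and the "ready" ping.

```python
from concurrent.futures import Future, ThreadPoolExecutor

library: dict[str, bytes] = {}    # stand-in for the library/inbox tab
notifications: list[str] = []     # stand-in for the "ready" ping


def render_audio(script: str) -> bytes:
    """Placeholder for the real TTS call; returns fake audio bytes."""
    return script.encode("utf-8")


def submit_briefing(pool: ThreadPoolExecutor, briefing_id: str, script: str) -> Future:
    """Queue a render and return immediately; the chat does not wait."""
    future = pool.submit(render_audio, script)

    def on_done(done: Future) -> None:
        # Runs when the render finishes: file the audio, then ping.
        library[briefing_id] = done.result()
        notifications.append(f"{briefing_id} ready")

    future.add_done_callback(on_done)
    return future


with ThreadPoolExecutor(max_workers=2) as pool:
    submit_briefing(pool, "briefing-001", "Your reading, in spoken form.")
    # A request handler would return to the user here, before the render ends.

print(notifications)
```

In a real service the worker would more likely be a job queue than an in-process thread pool, but the shape is the same: submit, return, notify on completion.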

Engine recommendation

ENGINE PATH (updated 2026-06-20: Jeff cancelled OpenAI, do not rely on it). START on ElevenLabs (already wired, no new account), hard-capped per user, to test usability now. THEN move the default read-aloud voice to a LOCAL open TTS (Kokoro / Piper, on the Mac mini) for ~$0 marginal cost on unlimited internal + capped-user use. Fish Audio (~$15 / 1M UTF-8 bytes, needs an account + a little money) is the cheap hosted fallback. ElevenLabs stays the premium Sacred voice regardless. The casual "read it to me" mode does not need Sacred quality, so a local/cheap voice is fine.
KEEP ElevenLabs for full Sacred Audio (the locked demo voice and premium tiers). Do not change the demo.
HIGGSFIELD: it CAN do TTS. "Higgsfield Audio" is their text-to-speech with 21 presets plus voice cloning and video translation/lip-sync. That is the "protocol" it used to voice your script video: its own in-platform TTS, metered in credits. But it is built for video production, credit-priced and not API-first, so it is the wrong engine for a high-volume, programmatic, in-app briefing feature. Use Higgsfield where you are already in it making video; use a direct TTS API for C15.
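The end state of the engine path above (after the move off the initial ElevenLabs-only start) could be expressed as a small routing rule. This is only a sketch: the mode names `"sacred"` and `"casual"`, the engine labels, and the `local_available` flag are invented for the example and do not correspond to any existing configuration.

```python
def pick_engine(mode: str, local_available: bool) -> str:
    """Choose a TTS engine label for a render request.

    mode: "sacred" for full Sacred Audio, "casual" for read-it-to-me.
    local_available: whether the local TTS (Kokoro / Piper) is reachable.
    """
    if mode == "sacred":
        # Premium voice stays on ElevenLabs regardless of other choices.
        return "elevenlabs"
    if mode == "casual":
        # Prefer the ~$0 local voice; fall back to the cheap hosted engine.
        return "local" if local_available else "fish_audio"
    raise ValueError(f"unknown mode: {mode!r}")


print(pick_engine("casual", local_available=False))
```

Higgsfield is deliberately absent from the routing, since it is credit-metered and not API-first.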

Bottom line

Build it for all three, with member access gated to chosen tiers. Use the budget TTS engine as the default field-briefing voice (~$0.10 each) with a hard, server-enforced per-month cap on member tiers, reserve ElevenLabs for full Sacred Audio, and deliver via a background render into a library tab so there is no wait. Internally for you and Sol it is effectively free; as a capped member feature the economics clearly work; for testers it is negligible.

Sources

ElevenLabs pricing: https://elevenlabs.io/pricing and https://texttolab.com/blog/elevenlabs-pricing
Fish Audio pricing/docs: https://docs.fish.audio/developer-guide/models-pricing/pricing-and-rate-limits and https://smallest.ai/blog/fish-audio-pricing-plans-api-billing-commercial-use-in-2026
OpenAI TTS pricing: https://texttolab.com/blog/openai-tts-pricing and https://costgoat.com/pricing/openai-tts
Higgsfield Audio + pricing: https://higgsfield.ai/blog/higgsfield-audio-ai-voice-tools and https://www.imagine.art/blogs/higgsfield-ai-pricing
