C15 - Casual Read-Aloud / Field Briefing Audio (Hermes voices a chat conversation) - build brief (v1, 2026-06-20)
Updated Jun 20, 2026 · Affirmology_FieldBriefingAudio_C15_BuildBrief_v1.md
PRIORITY CORRECTION (Jeff 2026-06-20, "get it straight"): THIS doc is the LOW-priority, DEFERRED feature, a quick voiced readout of a chat conversation ("tell me about that abundance thing while I make lunch"). It is NOT the Sacred Audio render. Hermes rendering a FULL Sacred Audio from chat is C18, which is HIGH priority and uses the existing proven pipeline (no cheap-TTS cost analysis). Do not conflate the two. Build C18 first; this casual read-aloud waits.
Pairs with the cost memo (Affirmology_FieldBriefingAudio_CostMemo_v1.md) and the action layer (C16 push, C17 send-to-Studio). Jeff 2026-06-20: "Hermes should be able to deliver me a custom audio in this chatbot, and ask me if I want to make a Studio record of it as a shareable audio." Plus the lighter, primary use case: "tell me about that abundance thing you just researched while I'm cooking lunch." So the flow is: render -> play inline -> (optionally) offer to persist + share.
Two modes (the casual one leads)
- QUICK READ-ALOUD (default, primary). A plain voice readout of a chat message, hands-free, no sound bed, no save step. This is the "read me the conversation while I cook" mode. Low production bar: clear, pleasant, natural. NOT Sacred Audio (we already have that). Most uses will be this mode.
- FIELD BRIEFING (optional upgrade). Voice + a light bed, and after delivery Hermes offers to save it to the Studio as a shareable record (via C17/C16). Use when the user wants a keepable, shareable piece, not a throwaway readout.
The flow
- TRIGGER (user-initiated): the user asks Hermes for an audio of a reading/briefing ("make me an audio of this", or taps a "Make this an audio" control on a Hermes message).
- RENDER (background): take the reading text, lightly reformat it into a spoken script (Haiku), run TTS on it (engine per tier, see below), and lay a light sound bed under it -> an MP3. Background job, ~10-40s for a few minutes of audio; show a "composing your audio" state, not a frozen wait.
- DELIVER INLINE: return the finished audio as an inline player inside the Hermes chat message. The user can listen right there.
- OFFER TO PERSIST + SHARE: right after delivery, Hermes asks "Want me to save this to the Studio as a shareable audio?" If yes -> create a Studio record via the C17 send_to_studio path (status saved/rendered, source "hermes", with the audio attached), and surface a share option: push it to a circle member (C16) and/or a shareable link (depends on C10, public media serving).
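The four steps above can be sketched as one small function. This is a minimal illustration, not the real implementation: `run_briefing_flow`, `BriefingResult`, and the three callables passed in are hypothetical names standing in for the render job, the chat prompt, and the C17 send_to_studio path.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BriefingResult:
    audio_url: str                          # what the inline player points at
    studio_record_id: Optional[str] = None  # set only if the user says yes to saving


def run_briefing_flow(
    reading_text: str,
    render: Callable[[str], str],          # hypothetical: text -> MP3 URL (reformat + TTS + bed)
    ask_user: Callable[[str], bool],       # hypothetical: shows a prompt, returns the user's yes/no
    save_to_studio: Callable[[str], str],  # hypothetical stand-in for the C17 send_to_studio path
) -> BriefingResult:
    """The trigger has already happened (the user asked). Render, deliver, then offer to persist."""
    audio_url = render(reading_text)               # a background job in the real system
    result = BriefingResult(audio_url=audio_url)   # delivered inline before any save question
    # Persisting needs its own explicit yes; nothing is saved by default.
    if ask_user("Want me to save this to the Studio as a shareable audio?"):
        result.studio_record_id = save_to_studio(audio_url)
    return result


# Usage with stub callables: the user declines, so no Studio record is created.
declined = run_briefing_flow(
    "reading text", lambda t: "mem://briefing.mp3", lambda q: False, lambda u: "rec-1"
)
```

Keeping the save step behind a separate callable keeps the "never auto-save" rule visible in the control flow.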
Engine + cost controls (from the cost memo, Jeff's constraints 2026-06-20)
- ENGINE PATH (Jeff 2026-06-20: OpenAI sub cancelled, do not rely on it). START on ElevenLabs (already wired into the engine, no new account), hard-capped per user, to test usability fast. THEN drop in a LOCAL open voice (Kokoro or Piper, runs on the Mac mini) to make internal + capped-user read-aloud effectively $0. Fish Audio is the cheap hosted fallback if we prefer not to self-host. The read-aloud mode does not need Sacred quality, so a good local/cheap voice is fine.
- NOT for everyone. Gate to chosen tiers (off the free/entry tier).
- HARD monthly cap per member, enforced SERVER-SIDE: a per-member render counter; refuse the render once the cap is hit (not a soft warning). Max exposure = members x cap x per-briefing cost, a fixed knowable number. Internally (Jeff + Sol) effectively uncapped/free.
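A minimal sketch of the hard cap and the exposure arithmetic follows. Several things here are assumptions: the class and function names are hypothetical, the counter is held in memory (a real server would persist it), and the cycle is taken to be a calendar month. The numbers in the usage lines are illustrative only, not figures from the cost memo.

```python
from collections import defaultdict
from datetime import date


class MonthlyRenderCap:
    """Per-member render counter. In-memory for illustration; persist it server-side in practice."""

    def __init__(self, cap: int):
        self.cap = cap
        self._counts = defaultdict(int)  # (member_id, "YYYY-MM") -> renders used

    @staticmethod
    def _cycle(today: date) -> str:
        # Assumption: the cap cycle is the calendar month.
        return f"{today.year:04d}-{today.month:02d}"

    def remaining(self, member_id: str, today: date) -> int:
        return max(0, self.cap - self._counts[(member_id, self._cycle(today))])

    def try_consume(self, member_id: str, today: date) -> bool:
        """Count the render and return True, or return False (hard refusal) once the cap is hit."""
        key = (member_id, self._cycle(today))
        if self._counts[key] >= self.cap:
            return False
        self._counts[key] += 1
        return True


def max_monthly_exposure(members: int, cap: int, cost_per_briefing: float) -> float:
    """Max exposure = members x cap x per-briefing cost."""
    return members * cap * cost_per_briefing


# Illustrative numbers only: 100 capped members, 10 renders each, 0.5 per briefing.
exposure = max_monthly_exposure(100, 10, 0.5)
```

Because `try_consume` checks and increments in one call, the refusal cannot be bypassed by a client that ignores a warning. A persisted version would need the same check-and-increment to be atomic.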
API (api/main.py), bearer-auth
- POST /api/hermes/audio body {as_person, source_text, voice_tier?} -> enforce tier gate + per-member monthly cap -> reformat -> TTS -> bed mix -> store MP3 (media store / R2) -> {ok, audio_url, audio_ref, remaining_quota}. Refuse with a clear message if the cap is hit or the tier is not entitled.
- Reuse the C17 POST /api/studio/draft (or a saved variant) to persist as a Studio record, attaching audio_ref.
- make_audio tool {source_text, voice_tier?} -> calls the audio endpoint, returns the player + the "save to Studio?" offer. Fires only on an explicit user request. After delivering, Hermes asks the persist/share question (it does not auto-save or auto-share).
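The request handling for POST /api/hermes/audio might look like the framework-free sketch below, which runs the checks in the order listed: tier gate, then cap, then render. The success fields come from the brief. The rest is assumed for illustration: the tier names, the `error` field on refusals, the `render_and_store` callable, and passing the member's tier and usage in directly instead of looking them up from as_person.

```python
from typing import Callable, Optional, Tuple

ENTITLED_TIERS = {"plus", "pro"}  # hypothetical tier names; the brief only says "chosen tiers"


def handle_hermes_audio(
    body: dict,
    member_tier: str,
    renders_used: int,
    monthly_cap: int,
    render_and_store: Callable[[str, Optional[str]], Tuple[str, str]],  # hypothetical: -> (audio_url, audio_ref)
) -> dict:
    """Sketch of the handler body, run after bearer auth has already passed."""
    # 1. Tier gate: non-entitled tiers never reach the render step.
    if member_tier not in ENTITLED_TIERS:
        return {"ok": False, "error": "Audio is not included in your tier."}
    # 2. Hard monthly cap: refuse outright, no soft warning.
    if renders_used >= monthly_cap:
        return {"ok": False, "error": "Monthly audio limit reached.", "remaining_quota": 0}
    # 3. Reformat -> TTS -> bed mix -> store, all hidden behind one callable here.
    audio_url, audio_ref = render_and_store(body["source_text"], body.get("voice_tier"))
    return {
        "ok": True,
        "audio_url": audio_url,
        "audio_ref": audio_ref,
        "remaining_quota": monthly_cap - renders_used - 1,
    }


# Usage with a stub renderer standing in for the real pipeline.
def fake_render(text: str, voice_tier: Optional[str]) -> Tuple[str, str]:
    return ("https://example.invalid/a.mp3", "ref-1")


response = handle_hermes_audio(
    {"as_person": "p1", "source_text": "hello"}, "plus", 0, 3, fake_render
)
```

Both refusals return before `render_and_store` is called, so a refused request incurs no TTS cost.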
UI (web/app.js)
- A "Make this an audio" control on a Hermes message; an inline <audio> player in the returned message; a follow-up prompt with "Save to Studio" / "Share" actions wired to C17 / C16.
- Clean copy, no em dashes.
Dependencies / sequencing
- Build AFTER C16 + C17 (the action layer): C15 reuses send_to_studio (persist) and send_to_person (share-in-app).
- TRUE public shareable LINK depends on C10 (R2 media public serving, currently 403). In-app share (push to Sol) works without it; a public link does not. Flag C10 as the blocker for public links; in-app sharing ships first.
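The C10 dependency can be expressed as a small gate on which share options Hermes surfaces. This is a sketch under assumptions: the function name and the option labels are hypothetical, and the flag would be set by however the team tracks whether public media serving works.

```python
from typing import List


def available_share_options(public_media_serving_ok: bool) -> List[str]:
    """In-app push (C16) is offered regardless; a public link needs C10 to be serving media."""
    options = ["push_to_circle_member"]   # hypothetical label for the C16 in-app path
    if public_media_serving_ok:
        options.append("public_link")     # hypothetical label; unavailable while R2 returns 403
    return options


# While C10 is still blocked, only in-app sharing is offered.
options_today = available_share_options(public_media_serving_ok=False)
```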
Guardrails
- USER-INITIATED ONLY (render, save, and share each require an explicit user yes; Hermes never auto-renders, auto-saves, or auto-shares).
- TIER-GATED + HARD-CAPPED server-side (cost protection is not optional).
- DEMO UNTOUCHED. Chart-driven preserved (the audio is from the person's own reading). No em dashes.
Acceptance test
- Ask Hermes for an audio of a reading. It renders in the background and returns an inline player that actually plays.
- Hermes then asks whether to save it to the Studio as a shareable audio. Saying yes creates a Studio record with the audio attached.
- The per-member cap: after the cap is hit, a further request is refused with a clear message; the counter resets next cycle.
- A non-entitled tier is refused the feature.