Affirmology Corpus - Diagnosis & Weekend Sprint to "Strong" v1

Updated Jun 18, 2026 · Affirmology_Corpus_Diagnosis_And_Sprint_v1.md

Measured directly from corpus.db (12,940 records, 3,866 docs, 88 sources, all tier A/B, tier wall intact). The honest read: the headline counts hid the real problems. One tradition is strong, one is corrupted, four are thin, and depth is missing almost everywhere.

The diagnosis (real numbers)

| Tradition | Records | Distinct elements | Depth (recs/elem) | Median text | Verdict |
|---|---|---|---|---|---|
| human_design | 7,863 | 1,386 | 5.7 | 288 | Strong (over-weighted) |
| transits | 2,525 | 1,984 | 1.3 | 260 | Broad but shallow + dated noise |
| gene_keys | 1,531 | 1,111 | 1.4 | 275 | CORRUPTED - not actually Gene Keys |
| western_astrology | 558 | 395 | 1.4 | 189 | Thin (and it's the most-used) |
| vedic_astrology | 362 | 328 | 1.1 | 164 | Thin + shallow |
| numerology | 101 | 72 | 1.4 | 208 | Very thin + quality noise |
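The depth column is simply records divided by distinct elements, per tradition. A minimal sketch of how those columns could be recomputed is below. The `records` table and its `tradition`, `element_key`, and `text` columns are assumptions for illustration (the real corpus.db schema may differ), and median text is assumed to be measured in characters.

```python
import sqlite3
from statistics import median

def tradition_stats(conn):
    """Return {tradition: (records, distinct_elements, depth, median_text_len)}."""
    rows = conn.execute(
        "SELECT tradition, element_key, text FROM records"
    ).fetchall()
    grouped = {}
    for tradition, element_key, text in rows:
        grouped.setdefault(tradition, []).append((element_key, text))
    stats = {}
    for tradition, items in grouped.items():
        n_records = len(items)
        n_elements = len({key for key, _ in items})
        depth = round(n_records / n_elements, 1)  # records per distinct element
        med = median(len(text) for _, text in items)
        stats[tradition] = (n_records, n_elements, depth, med)
    return stats

# Tiny in-memory stand-in for corpus.db (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (tradition TEXT, element_key TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("human_design", "Gate_1", "a" * 300),
        ("human_design", "Gate_1", "b" * 280),
        ("human_design", "Gate_2", "c" * 290),
        ("numerology", "Life_Path_7", "d" * 200),
    ],
)
stats = tradition_stats(conn)
```

A depth near 1.0 means almost every element rests on a single record, which is the pattern the table shows everywhere except human_design.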

Three problems:

  1. Gene Keys is mislabeled. Records tagged gene_keys carry the element_types western_astrology (836), transit (362), planet_in_aspect, and planet_in_house; sample keys are Pallas_in_3rd_House, Pluto_Return, Lilith_Conjunct_Descendant. Real Gene Keys content (64 keys × shadow/gift/siddhi, the lines, the sequences) is essentially absent. A core system is effectively missing.
  2. Western (most-used) is thin: 395 distinct elements, ~1 source each, short text. Vedic and numerology are worse; numerology has invalid keys (Path_23, Number_32), which indicates mis-extraction.
  3. No depth outside HD: only Human Design has multi-source richness (5.7/elem); everything else sits at about 1.1 to 1.4 (effectively single-source). There is also dated event noise (e.g. a 1690 conjunction, transits dated 2014) that can't ground a personal audio.
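The three problem types can be flagged mechanically per record. The sketch below is an assumption-laden illustration, not the existing pipeline: the function name, the problem labels, and the idea that dated records carry a four-digit year in their key are all hypothetical. Only `Path_N` keys are checked for numerology, because the valid range for `Number_N` keys is not stated here.

```python
import re

# Life Path values treated as valid: 1-9 plus the master numbers.
VALID_LIFE_PATHS = set(range(1, 10)) | {11, 22, 33}

# element_types reported on records that are tagged gene_keys but are astrology.
ASTROLOGY_ELEMENT_TYPES = {
    "western_astrology", "transit", "planet_in_aspect", "planet_in_house",
}

# Four-digit year between 1000 and 2099 (assumed marker of a dated event).
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def audit_record(tradition, element_type, element_key):
    """Return a list of problem labels for one record (empty list = clean)."""
    problems = []
    if tradition == "gene_keys" and element_type in ASTROLOGY_ELEMENT_TYPES:
        problems.append("mislabeled_tradition")
    if tradition == "numerology":
        match = re.fullmatch(r"Path_(\d+)", element_key)
        if match and int(match.group(1)) not in VALID_LIFE_PATHS:
            problems.append("invalid_key")
    # Underscores count as word characters, so swap them for spaces first.
    if YEAR.search(element_key.replace("_", " ")):
        problems.append("dated")
    return problems

mislabeled = audit_record("gene_keys", "planet_in_house", "Pallas_in_3rd_House")
invalid = audit_record("numerology", "life_path", "Path_23")
clean = audit_record("numerology", "life_path", "Path_11")
dated = audit_record("transits", "transit", "Saturn_Jupiter_Conjunction_1690")
```

Returning labels rather than deleting anything keeps the audit reversible: flagged records can be re-tagged or quarantined in a later step.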

What "10/10" actually means (the target, so we can measure honestly)

A canonical element map per tradition, each element covered by 2-3 quality sources, with full (not 1-2 sentence) interpretations, and no dated/event noise:

  - Western: ~14 bodies (planets + Asc/MC/Nodes/Chiron) × 12 signs (~168), × 12 houses (~168), the major aspect pairs, sign generals, house generals. Evergreen only.
  - Vedic: 9 grahas × 12 rasis (108) and × 12 bhavas (108), 27 nakshatras, the major yogas, Vimshottari dasha periods.
  - Gene Keys: 64 keys × shadow/gift/siddhi (192), the 6 lines, and the sequence spheres (Activation, Venus, Pearl).
  - Numerology: Life Path 1-9 plus 11/22/33, Expression, Soul Urge, Personality, Personal Year, Karmic Debt (13/14/16/19), Pinnacles/Challenges.
  - Human Design: already strong; maintain, prune any noise.

A 10/10 is high coverage of these canonical sets with real depth. That is an ongoing build (it's exactly what the nightly self-improving crawler is for), not a literal one-weekend finish, so let's be honest about the weekend goal below.
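The countable parts of these target sets are plain cross products, so they can be generated rather than hand-listed. The following is a rough sketch under stated assumptions: the key formats (`Sun_in_Aries`, `Sun_in_House_1`, `Key_1_shadow`, and so on) are hypothetical and would need to match whatever the corpus actually uses, and aspects, nakshatras, yogas, dashas, lines, and the remaining numerology families are left out for brevity.

```python
from itertools import product

SIGNS = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
         "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]
# 10 planets (luminaries included) plus Asc/MC/Nodes/Chiron = 14 bodies.
WESTERN_BODIES = ["Sun", "Moon", "Mercury", "Venus", "Mars", "Jupiter", "Saturn",
                  "Uranus", "Neptune", "Pluto", "Ascendant", "Midheaven",
                  "Nodes", "Chiron"]
GRAHAS = ["Sun", "Moon", "Mars", "Mercury", "Jupiter", "Venus", "Saturn",
          "Rahu", "Ketu"]
LIFE_PATHS = list(range(1, 10)) + [11, 22, 33]
LEVELS = ["shadow", "gift", "siddhi"]

def canonical_map():
    """Build {tradition: set of target element keys} (hypothetical key format)."""
    twelve = range(1, 13)  # houses, rasis, and bhavas are all numbered 1-12
    western = {f"{b}_in_{s}" for b, s in product(WESTERN_BODIES, SIGNS)}
    western |= {f"{b}_in_House_{h}" for b, h in product(WESTERN_BODIES, twelve)}
    vedic = {f"{g}_in_Rasi_{r}" for g, r in product(GRAHAS, twelve)}
    vedic |= {f"{g}_in_Bhava_{h}" for g, h in product(GRAHAS, twelve)}
    gene_keys = {f"Key_{k}_{lvl}" for k, lvl in product(range(1, 65), LEVELS)}
    numerology = {f"Life_Path_{n}" for n in LIFE_PATHS}
    return {
        "western_astrology": western,
        "vedic_astrology": vedic,
        "gene_keys": gene_keys,
        "numerology": numerology,
    }

targets = canonical_map()
sizes = {tradition: len(keys) for tradition, keys in targets.items()}
```

The sizes line up with the counts above: 168 + 168 for Western, 108 + 108 for Vedic, 192 for Gene Keys, and 12 Life Path values.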

The weekend sprint (realistic: take it from weak/lopsided to clean, balanced, and genuinely solid)

Haiku structuring is cheap (~$4 for hundreds of docs), so the limiter is good source material, not budget. And we just found the material: a recovery pass restored 1,652 unstructured docs from a compressed / failed-extraction state to clean readable text, concentrated largely in the thin traditions: numerology 492, vedic 384, western 372, human_design 263, gene_keys 141 (0 raw files missing, DB backed up, reversible). That "residue" wasn't junk; it was locked-up source text. It is the fuel for this sprint. Order:

  1. Fix Gene Keys (highest impact). Investigate why gene_keys records are astrology: re-tag the mislabeled ones to their true tradition, then source real Gene Keys interpretive material (the 64 keys' shadow/gift/siddhi and lines) and structure it properly. Turns a broken core system into a real one.
  2. Build the canonical coverage map + gap report. Encode the target element sets above; have status.py report coverage % and depth per tradition against them. Now "weak" is measurable, not a vibe.
  3. Prune the noise. Quarantine dated/event-specific records (historical conjunctions, year-stamped transits) and invalid keys (Path_23) so counts reflect evergreen, usable interpretations.
  4. Source + structure the thin traditions to fill gaps, prioritized: Western (most-used) → Gene Keys → Numerology → Vedic. Aim crawling at the specific missing elements from the gap report, using quality public-domain and reputable open sources.
  5. Add depth where it's thinnest. Target 2-3 sources per high-value element (especially Western core: luminaries, Asc, personal planets in signs/houses) so interpretations cross-reference instead of relying on one scrape.
  6. Lengthen/enrich the short records on the most-used elements so they actually carry an audio.
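Step 2's gap report boils down to comparing record keys against the canonical sets. Here is a minimal sketch of that comparison; the function name, the report fields, and the `(tradition, element_key)` input shape are assumptions for illustration and are not taken from the existing status.py.

```python
from collections import Counter

def gap_report(canonical, records):
    """Compare records against canonical target sets.

    canonical: {tradition: set of target element keys}
    records:   iterable of (tradition, element_key) pairs, one per record
    """
    counts = Counter(records)
    report = {}
    for tradition, targets in canonical.items():
        covered = {key for key in targets if counts[(tradition, key)] > 0}
        covered_records = sum(counts[(tradition, key)] for key in covered)
        # Records in this tradition whose key is not a canonical target.
        off_map = sum(
            n for (t, key), n in counts.items()
            if t == tradition and key not in targets
        )
        report[tradition] = {
            "canonical": len(targets),
            "coverage_pct": round(100 * len(covered) / len(targets), 1),
            "avg_depth": round(covered_records / len(covered), 2) if covered else 0.0,
            "missing": sorted(targets - covered),
            "off_map_records": off_map,
        }
    return report

# Toy data: four targets, two covered, one invalid key left in the corpus.
canonical = {"numerology": {f"Life_Path_{n}" for n in (1, 2, 3, 4)}}
records = [
    ("numerology", "Life_Path_1"),
    ("numerology", "Life_Path_1"),
    ("numerology", "Life_Path_2"),
    ("numerology", "Path_23"),
]
report = gap_report(canonical, records)["numerology"]
```

The `missing` list is what step 4 would aim the crawler at, and `off_map_records` gives a rough count of candidates for the step 3 quarantine.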

Honest expectation: by Sunday this can fix the Gene Keys corruption, give us a true coverage map, prune the noise, and meaningfully fill Western + Numerology + Gene Keys, a big jump from "lopsided with a broken tradition" toward balanced and solid. Reaching a sustained 10/10 across all canonical elements with full depth is the nightly crawler's continuing job; the weekend gets us most of the way and onto an honest, measurable track.

The techniques / craft knowledge base (the second ask, ongoing)

Separate from chart interpretations, build and continually enrich a techniques + craft corpus:

  - Structure the existing neuroscience/meditation research docs into a "techniques" tradition the engine retrieves like any other (the Techniques Library to-do), so Chiron pulls from structured records, not just the doc.
  - Mine the best visualization, meditation, and journeying audios (YouTube especially, via the listening engine) for their openers, closers, and imagery patterns: what the strongest ones actually do at the start, the turn, and the close. Catalog those as craft references for Orpheus and Chiron.
  - Keep enriching it nightly alongside the corpus crawl: new techniques, new opener/closer/imagery patterns, tagged and quality-gated.

How to verify (no vibes)

After the sprint, the gap report should show, per tradition: canonical elements, % covered, avg depth, and noise removed. Those figures are the honest corpus grade, and they are what we drive up over time.