Updated Jun 18, 2026 · Affirmology_Corpus_Diagnosis_And_Sprint_v1.md
Measured directly from corpus.db (12,940 records, 3,866 docs, 88 sources, all tier A/B, tier wall intact). The honest read: the headline counts hid the real problems. One tradition is strong, one is corrupted, four are thin, and depth is missing almost everywhere.
| Tradition | Records | Distinct elements | Depth (recs/elem) | Median text | Verdict |
|---|---|---|---|---|---|
| human_design | 7,863 | 1,386 | 5.7 | 288 | Strong (over-weighted) |
| transits | 2,525 | 1,984 | 1.3 | 260 | Broad but shallow + dated noise |
| gene_keys | 1,531 | 1,111 | 1.4 | 275 | CORRUPTED - not actually Gene Keys |
| western_astrology | 558 | 395 | 1.4 | 189 | Thin (and it's the most-used) |
| vedic_astrology | 362 | 328 | 1.1 | 164 | Thin + shallow |
| numerology | 101 | 72 | 1.4 | 208 | Very thin + quality noise |
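The table's columns can be recomputed from the database with a short script. The following is a minimal sketch, not the actual measurement code: it assumes a hypothetical `records` table with `tradition`, `element_key`, and `text` columns (the real corpus.db schema is not shown here) and assumes "Median text" means median length in characters.

```python
import sqlite3
from statistics import median


def tradition_stats(conn):
    """Return {tradition: (records, distinct_elements, depth, median_text_len)}.

    Assumes a hypothetical table records(tradition, element_key, text).
    """
    rows = conn.execute(
        "SELECT tradition, element_key, LENGTH(text) FROM records"
    ).fetchall()

    by_tradition = {}
    for tradition, element_key, text_len in rows:
        entry = by_tradition.setdefault(tradition, {"keys": set(), "lengths": []})
        entry["keys"].add(element_key)
        entry["lengths"].append(text_len)

    stats = {}
    for tradition, entry in by_tradition.items():
        n_records = len(entry["lengths"])
        n_elements = len(entry["keys"])
        stats[tradition] = (
            n_records,
            n_elements,
            round(n_records / n_elements, 1),  # depth = records per element
            median(entry["lengths"]),
        )
    return stats


def demo_connection():
    """Build a tiny in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (tradition TEXT, element_key TEXT, text TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [
            ("human_design", "Gate_1", "a" * 300),
            ("human_design", "Gate_1", "b" * 280),
            ("human_design", "Gate_2", "c" * 290),
            ("numerology", "Life_Path_7", "d" * 200),
        ],
    )
    return conn
```

Against the real database, `sqlite3.connect("corpus.db")` would replace `demo_connection()`, with table and column names adjusted to the actual schema.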
Three problems:
1. Gene Keys is mislabeled. Records tagged gene_keys carry the element_types western_astrology (836), transit (362), planet_in_aspect, and planet_in_house; sample keys are Pallas_in_3rd_House, Pluto_Return, and Lilith_Conjunct_Descendant. Real Gene Keys content (64 keys × shadow/gift/siddhi, the lines, the sequences) is essentially absent. A core system is effectively missing.
2. Western (most-used) is thin: 395 distinct elements, ~1 source each, short text. Vedic and numerology are worse; numerology has invalid keys (Path_23, Number_32), which indicates mis-extraction.
3. No depth outside HD: only Human Design has multi-source richness (5.7 records per element); everything else sits at 1.1 - 1.4, which is effectively single-source. On top of that, there is dated event noise (e.g. a 1690 conjunction, dated 2014 transits) that can't ground a personal audio.
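Two of the noise patterns named above (invalid numerology keys and dated event text) lend themselves to simple automated flags. The sketch below is illustrative only: the set of valid numbers, the trailing-number key convention, and the year regex are assumptions, and anything flagged would still need human review before pruning.

```python
import re

# Assumption: acceptable numerology numbers are 1-9, the master numbers
# 11/22/33, and the karmic debt numbers 13/14/16/19.
VALID_NUMEROLOGY_NUMBERS = set(range(1, 10)) | {11, 22, 33} | {13, 14, 16, 19}

# Assumption: a four-digit year between 1000 and 2099 in the text signals
# a dated event rather than an evergreen interpretation.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def numerology_key_is_valid(element_key):
    """Check the trailing number of a key such as 'Life_Path_7'."""
    match = re.search(r"_(\d+)$", element_key)
    if match is None:
        # No trailing number to check, so there is nothing to flag here.
        return True
    return int(match.group(1)) in VALID_NUMEROLOGY_NUMBERS


def looks_dated(text):
    """Return True if the text mentions a specific calendar year."""
    return YEAR_PATTERN.search(text) is not None
```

With these definitions, `numerology_key_is_valid("Path_23")` and `numerology_key_is_valid("Number_32")` both return `False`, matching the two invalid keys cited in the list.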
A canonical element map per tradition, each element covered by 2 - 3 quality sources, with full (not 1 - 2 sentence) interpretations, and no dated/event noise:

- Western: ~14 bodies (planets + Asc/MC/Nodes/Chiron) × 12 signs (~168), × 12 houses (~168), the major aspect pairs, sign generals, house generals. Evergreen only.
- Vedic: 9 grahas × 12 rasis (108) and × 12 bhavas (108), 27 nakshatras, the major yogas, Vimshottari dasha periods.
- Gene Keys: 64 keys × shadow/gift/siddhi (192), the 6 lines, and the sequence spheres (Activation, Venus, Pearl).
- Numerology: Life Path 1 - 9 + 11/22/33, Expression, Soul Urge, Personality, Personal Year, Karmic Debt (13/14/16/19), Pinnacles/Challenges.
- Human Design: already strong; maintain, prune any noise.

A 10/10 is high coverage of these canonical sets with real depth. That is an ongoing build (it's exactly what the nightly self-improving crawler is for), not a literal one-weekend finish, so let's be honest about the weekend goal below.
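The combinatorial parts of the canonical map can be generated rather than hand-listed, which also confirms the counts quoted above (168, 108, 192). This is a rough sketch under assumptions: the specific 14 Western bodies, the graha spellings, and the key formats are illustrative choices, not the engine's actual identifiers.

```python
from itertools import product

SIGNS = [
    "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
    "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces",
]

# Assumption: one reading of "~14 bodies" is ten planets plus four points.
WESTERN_BODIES = [
    "Sun", "Moon", "Mercury", "Venus", "Mars", "Jupiter", "Saturn",
    "Uranus", "Neptune", "Pluto", "Ascendant", "Midheaven", "North_Node", "Chiron",
]

# Assumption: one common transliteration of the nine grahas.
GRAHAS = [
    "Surya", "Chandra", "Mangala", "Budha", "Guru",
    "Shukra", "Shani", "Rahu", "Ketu",
]

GENE_KEY_LEVELS = ("shadow", "gift", "siddhi")


def canonical_map():
    """Build hypothetical canonical key sets for the grid-shaped elements."""
    twelve = range(1, 13)  # houses, rasis, and bhavas are numbered 1-12 here
    return {
        "western_body_in_sign": {
            f"{body}_in_{sign}" for body, sign in product(WESTERN_BODIES, SIGNS)
        },
        "western_body_in_house": {
            f"{body}_in_House_{n}" for body, n in product(WESTERN_BODIES, twelve)
        },
        "vedic_graha_in_rasi": {
            f"{graha}_in_Rasi_{n}" for graha, n in product(GRAHAS, twelve)
        },
        "vedic_graha_in_bhava": {
            f"{graha}_in_Bhava_{n}" for graha, n in product(GRAHAS, twelve)
        },
        "gene_keys_spectrum": {
            f"Key_{key}_{level}"
            for key, level in product(range(1, 65), GENE_KEY_LEVELS)
        },
    }
```

Non-grid elements (aspect pairs, nakshatras, yogas, dasha periods, the numerology sets) would need their own explicit lists and are left out of this sketch.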
Haiku structuring is cheap (~$4 for hundreds of docs), so the limiter is good source material, not budget. And we just found the material. A recovery pass restored 1,652 unstructured docs from compressed / failed-extraction state to clean readable text, concentrated mostly in the thin traditions: numerology 492, vedic 384, western 372, human_design 263, gene_keys 141 (0 raw files missing, DB backed up, reversible). That "residue" wasn't junk; it was locked-up source text. It is the fuel for this sprint. Order:
gene_keys records are astrology: re-tag the mislabeled ones to their true tradition, then source real Gene Keys interpretive material (the 64 keys' shadow/gift/siddhi and lines) and structure it properly. Turns a broken core system into a real one.

Path_23) so counts reflect evergreen, usable interpretations.

Honest expectation: by Sunday this can fix the Gene Keys corruption, give us a true coverage map, prune the noise, and meaningfully fill Western + Numerology + Gene Keys, a big jump from "lopsided with a broken tradition" toward balanced and solid. Reaching a sustained 10/10 across all canonical elements with full depth is the nightly crawler's continuing job; the weekend gets us most of the way and onto an honest, measurable track.
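The re-tagging step for mislabeled gene_keys records could be run as a dry-run proposal first, so nothing is changed until the mapping is reviewed. The sketch below is hypothetical throughout: the record shape (dicts with `id`, `tradition`, `element_type`) and the element_type-to-tradition mapping are assumptions, in particular the choice to send planet_in_aspect and planet_in_house to western_astrology.

```python
# Assumption: how each element_type found under gene_keys maps to its
# true tradition. This mapping should be reviewed before it is applied.
ELEMENT_TYPE_TO_TRADITION = {
    "western_astrology": "western_astrology",
    "transit": "transits",
    "planet_in_aspect": "western_astrology",
    "planet_in_house": "western_astrology",
}


def propose_retags(records):
    """Return (record_id, new_tradition) pairs without modifying any record."""
    proposals = []
    for record in records:
        if record["tradition"] != "gene_keys":
            continue  # only gene_keys-tagged records are in scope
        target = ELEMENT_TYPE_TO_TRADITION.get(record["element_type"])
        if target is not None:
            proposals.append((record["id"], target))
    return proposals
```

Returning proposals instead of updating in place keeps the change reviewable and reversible; records whose element_type is not in the mapping are left alone.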
Separate from chart interpretations, build and continually enrich a techniques + craft corpus:

- Structure the existing neuroscience/meditation research docs into a "techniques" tradition the engine retrieves like any other (the Techniques Library to-do), so Chiron pulls from structured records, not just the doc.
- Mine the best visualization, meditation, and journeying audios (YouTube especially, via the listening engine) for their openers, closers, and imagery patterns, what the strongest ones actually do at the start, the turn, and the close, and catalog those as craft references for Orpheus and Chiron.
- Keep enriching it nightly alongside the corpus crawl: new techniques, new opener/closer/imagery patterns, tagged and quality-gated.
After the sprint, the gap report should show, per tradition: canonical elements, % covered, avg depth, and noise removed. Taken together, those figures are the honest corpus grade, and they are what we drive up over time.
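One way the per-tradition gap report could be computed is sketched below. It is a simplified illustration, not the actual report: it assumes each record is represented by its element key, and it uses "records whose key is off the canonical map" as a stand-in for noise, which is only a proxy for "noise removed".

```python
def gap_report(canonical_keys, record_keys):
    """Summarize coverage of one tradition against its canonical key set.

    canonical_keys: iterable of canonical element keys (hypothetical format).
    record_keys: list with one element key per record in the corpus.
    """
    canonical = set(canonical_keys)
    in_scope = [key for key in record_keys if key in canonical]
    covered = set(in_scope)
    return {
        "canonical_elements": len(canonical),
        "pct_covered": round(100 * len(covered) / len(canonical), 1) if canonical else 0.0,
        # Average depth counts only records that land on a canonical element.
        "avg_depth": round(len(in_scope) / len(covered), 2) if covered else 0.0,
        "off_map_records": len(record_keys) - len(in_scope),
    }
```

For example, four canonical Life Path keys and the records `["Life_Path_1", "Life_Path_1", "Life_Path_2", "Path_23"]` give 50.0% covered, an average depth of 1.5, and one off-map record.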