Home / Engine / Knowledge Corpus

Affirmology Corpus Status - 2026-06-13

Updated Jun 13, 2026 · Affirmology_CorpusStatus_2026-06-13_v1.md

Summary. Snapshot after the overnight chained run (overnightcorpus.sh, run 03:36).

Affirmology Corpus Status - 2026-06-13

Snapshot after the overnight chained run (overnight_corpus.sh, run 03:36).

Headline

Total structured records: 10,658 (up from ~3,760 the night before). The scrape-plus-structure chain ran unattended end to end and tripled the record count.

Scorecard

tradition docs words structured docs structured %
western_astrology 550 3,610,433 34 6%
human_design 1,487 1,571,994 706 47%
transits 382 1,082,484 369 97%
vedic_astrology 527 878,743 42 8%
gene_keys 413 397,906 186 45%
numerology 507 194,975 7 1%
total 3,866 7,736,545 1,344 35%

What went well

The real bottleneck (corrected from run-table data)

The per-run costs tell the true story, which is different from a first glance at the scorecard:

Total spend across all runs is about $19, so roughly $9-10 of credit remains (confirm in the Console).

So the highest-value, in-budget move is finishing the human design backlog, not pouring money into Western.

The subtler Western point

Western's big archive.org books are structured only to a depth of 8 chunks each (~88K chars), so roughly the first 30% of each book is mined and the rest is untouched. This is a chunk-depth setting (--max-chunks-per-doc), not a budget wall. Mining the books deeper is a quality refinement to do later, with more credit.

Estimated quality

The lagging dimension is structuring_progress (35% of docs structured), dragged mostly by the human design backlog. Tradition balance, voice diversity, and coverage all improved. Overall corpus quality is roughly mid-7 out of 10, up from about 6.5.

Next steps

  1. Finish the human design + gene keys backlog (in budget, ~$5-9): bash caffeinate -i env PYTHONPATH=src python3 -m affirmology.corpus.run \ --data-dir /Volumes/Affirmology/corpus \ --traditions gene_keys,human_design --mode structure-only --backend anthropic \ --max-cost-usd 9
  2. Later, with more credit: mine the Western/vedic/numerology books deeper by raising --max-chunks-per-doc (e.g. to 20).
  3. Build the nightly launchd watchdog so this runs and reports automatically.