Updated Jun 13, 2026 · Affirmology_CorpusStatus_2026-06-13_v1.md
Snapshot after the overnight chained run (overnight_corpus.sh, run 03:36).
Total structured records: 10,658 (up from ~3,760 the night before). The scrape-plus-structure chain ran unattended end to end and nearly tripled the record count (about 2.8x).
| tradition | docs | words | structured docs | structured % |
|---|---|---|---|---|
| western_astrology | 550 | 3,610,433 | 34 | 6% |
| human_design | 1,487 | 1,571,994 | 706 | 47% |
| transits | 382 | 1,082,484 | 369 | 97% |
| vedic_astrology | 527 | 878,743 | 42 | 8% |
| gene_keys | 413 | 397,906 | 186 | 45% |
| numerology | 507 | 194,975 | 7 | 1% |
| total | 3,866 | 7,736,545 | 1,344 | 35% |
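The structured % column and the size of each tradition's backlog can be recomputed from the docs and structured-docs columns. The following is a minimal sketch; the `CORPUS` dictionary and the helper names are illustrative and not part of the affirmology codebase.

```python
# (docs, structured docs) per tradition, copied from the table above.
CORPUS = {
    "western_astrology": (550, 34),
    "human_design": (1487, 706),
    "transits": (382, 369),
    "vedic_astrology": (527, 42),
    "gene_keys": (413, 186),
    "numerology": (507, 7),
}


def structured_pct(docs, structured):
    """Share of docs that are structured, as a whole-number percent."""
    return round(100 * structured / docs)


def backlog(docs, structured):
    """Number of docs still waiting to be structured."""
    return docs - structured


# Rank traditions by unstructured doc count, largest backlog first.
rows = sorted(CORPUS.items(), key=lambda kv: backlog(*kv[1]), reverse=True)
for name, (docs, structured) in rows:
    pct = structured_pct(docs, structured)
    print(f"{name:18} {pct:3d}%  backlog {backlog(docs, structured)}")

total_docs = sum(d for d, _ in CORPUS.values())
total_structured = sum(s for _, s in CORPUS.values())
print(f"total {structured_pct(total_docs, total_structured)}%")
```

Sorting by backlog rather than by percentage shows why human_design matters most for the overall figure: it is 47% structured, yet it still holds the largest number of unstructured docs.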
The per-run costs tell the true story, which is different from a first glance at the scorecard:
Total spend across all runs is about $19, so roughly $9-10 of credit remains (confirm in the Console).
So the highest-value, in-budget move is finishing the human design backlog, not pouring money into Western.
Western's big archive.org books are structured only to a depth of 8 chunks each (~88K chars), so roughly the first 30% of each book is mined and the rest is untouched. This is a chunk-depth setting (--max-chunks-per-doc), not a budget wall. Mining the books deeper is a quality refinement to do later, with more credit.
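The chunk-depth figures above imply a rough book size and a rough coverage for any other depth. This is a back-of-the-envelope sketch, assuming equal-sized chunks and that the 30% figure holds for a typical book; the variable and function names are illustrative, and real books will vary.

```python
import math

CHARS_MINED = 88_000    # ~88K chars per book at the current depth (from the text)
CHUNKS_MINED = 8        # current chunk depth per book
FRACTION_MINED = 0.30   # "roughly the first 30%" (from the text)

# Assumption: every chunk is about the same size.
chars_per_chunk = CHARS_MINED / CHUNKS_MINED

# Assumption: a typical book is 88K / 0.30 chars long.
est_book_chars = CHARS_MINED / FRACTION_MINED

# Chunks needed to reach the end of such a book.
chunks_for_full_book = math.ceil(est_book_chars / chars_per_chunk)


def coverage_at(max_chunks):
    """Estimated fraction of a typical book mined at a given chunk depth."""
    return min(1.0, max_chunks * chars_per_chunk / est_book_chars)


print(f"chars per chunk: {chars_per_chunk:,.0f}")
print(f"estimated book size: {est_book_chars:,.0f} chars")
print(f"chunks for a full book: {chunks_for_full_book}")
print(f"coverage at 20 chunks: {coverage_at(20):.0%}")
```

Under these assumptions a chunk is about 11K chars, a typical book needs about 27 chunks for full coverage, and a depth of 20 would reach roughly three quarters of each book at 2.5 times the current number of chunks per book.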
The lagging dimension is structuring_progress (35% of docs structured), dragged down mostly by the human design backlog. Tradition balance, voice diversity, and coverage all improved. Overall corpus quality is roughly mid-7 out of 10, up from about 6.5.
```bash
caffeinate -i env PYTHONPATH=src python3 -m affirmology.corpus.run \
  --data-dir /Volumes/Affirmology/corpus \
  --traditions gene_keys,human_design --mode structure-only --backend anthropic \
  --max-cost-usd 9
```

To mine the Western books deeper later, raise --max-chunks-per-doc (e.g. to 20).