Home / Engine / Knowledge Corpus
Updated Jun 18, 2026 · Affirmology_Nemotron_CorpusUpgrade_Plan_v1.md
Verified 2026-06-15: NVIDIA Nemotron 3 Ultra is real (released June 4 2026, Computex), live on OpenRouter at nvidia/nemotron-3-ultra-550b-a55b (~$0.50 in / $2.50 out per million tokens) with a rate-limited :free variant. The briefing's facts are accurate.
Corpus is at ~10,658 records, quality mid-7/10. The block to 10/10 is bulk structuring, the high-volume low-judgment work that was burning Claude credits. Nemotron does that for a fraction of the cost.
--max-chunks-per-doc (e.g. 8 to 20) to mine the big books deeper.Structure ~75 human-design docs with Nemotron, have the existing Claude/Haiku quality observer score them against Haiku-structured records. Proceed only if quality holds. Preserve the A/B tier wall (Tier C never used in generation).
Add an OpenRouter backend to the corpus structurer (OpenAI-compatible client, base URL https://openrouter.ai/api/v1, key from OPENROUTER_API_KEY, model nvidia/nemotron-3-ultra-550b-a55b), selectable with --backend openrouter. Keep --backend anthropic working.
caffeinate -i env PYTHONPATH=src OPENROUTER_API_KEY=... python3 -m affirmology.corpus.run \
--data-dir /Volumes/Affirmology/corpus \
--traditions numerology,human_design,gene_keys,vedic_astrology \
--mode structure-only --backend openrouter \
--model nvidia/nemotron-3-ultra-550b-a55b \
--max-cost-usd 10
Then a Claude/Haiku quality + morning report pass. Fold this into the nightly launchd watchdog (already a pending todo) so it runs and reports automatically.
Create an OpenRouter account, make an API key, add a few dollars of credit (or start on the :free variant to test). That is the only account step; the rest is Claude Code on the laptop.
Read Affirmology_Nemotron_CorpusUpgrade_Plan_v1.md. Add an OpenRouter backend to the corpus structurer (OpenAI-compatible, base https://openrouter.ai/api/v1, key OPENROUTER_API_KEY, model nvidia/nemotron-3-ultra-550b-a55b) selectable via --backend openrouter, keeping --backend anthropic intact. Then run the QUALITY GATE: structure ~75 human_design docs with Nemotron and have our quality observer score them against the Haiku-structured records; show me the comparison. Do not run the full overnight pass until I see the gate result. Preserve the A/B tier wall.
Sources: model verified via OpenRouter and Artificial Analysis (see chat).
OpenRouter/Nemotron backend WIRED into the corpus structurer: --backend openrouter (OpenAI-compatible, base https://openrouter.ai/api/v1, key OPENROUTER_API_KEY, model via --model, default nvidia/nemotron-3-ultra-550b-a55b), with --backend anthropic/gemini intact. Code in src/affirmology/corpus/structurer.py (structure_document_openrouter, check_openrouter_ready, dispatch) + run.py (--backend openrouter, --model, preflight). Connectivity confirmed (paid model live via DeepInfra; key has credit).
GATE RUN: structured 75 human_design docs (tier B only, A/B wall enforced, 0 errors, $0.49, ~34 min) with Nemotron, then a Claude/Haiku judge scored 25 records per backend against the source text.
- records/doc: Haiku 6.65 vs Nemotron 5.85
- mean quality: Haiku 6.76/10 vs Nemotron 4.88/10 (a real ~1.9-pt gap, not noise)
- By eye, Nemotron records are thinner/fragmentary ("and caring (27 - 50)") and duplicate content across records; Haiku writes fuller self-contained interpretations. Side-by-side: Affirmology_Nemotron_GateComparison_v1.md (8 docs).
Re-ran the same 75-doc A/B gate against 3 stronger OpenRouter models with a TUNED extraction prompt (not Haiku's verbatim: demands complete self-contained interpretations, forbids fragments/duplication/verbatim quotes, JSON-only). Read-only on the corpus (in-memory candidates), so the corpus was never mutated. Haiku judge, n=25 each, judged vs source. Results:
- Qwen3 235B (qwen/qwen3-235b-a22b-2507, $0.09/$0.10): quality 6.28/10 vs Haiku 7.24, $0.016/75 docs. Closest, but emits near-verbatim block quotes (e.g. "Book of Lines" lift) = voice + Tier-C risk.
- DeepSeek V3.2 (deepseek/deepseek-v3.2, $0.23/$0.34): 6.44/10 vs Haiku 7.16, $0.044.
- DeepSeek V4 Pro (deepseek/deepseek-v4-pro, $0.43/$0.87): 5.32/10 vs Haiku 7.16, $0.183. Most expensive AND worst.
Candidates produce MORE records/doc but LOWER quality (verbose, less distilled). 6-50x cheaper than Haiku, but the corpus grounds every audio so the ~0.7-1.9 pt drop fails the bar. DECISION: keep Haiku for structuring; revisit with a stricter anti-verbatim prompt or a future model. Side-by-sides: Affirmology_GateCompare_{qwen3-235b,deepseek-v3.2,deepseek-v4-pro}_v1.md. The --backend openrouter wiring stays for future re-tests.
Projected Haiku cost to structure the full no-records backlog (707 A/B docs with text, >=100 words) was ~$3.97. Ran it with a $6 hard cap. STOPPED at 375/707 because it produced 0 records for ~$2.88. Inspection of the no-record docs across traditions showed why: they are not pending content, they are residue the corpus build already correctly skipped: - numerology "backlog" = BINARY GARBAGE (failed PDF extractions, mojibake, bogus word_counts). - human_design = an I Ching book, coaches' marketing stats pages, 85k words of raw archive.org HTML. - western = a parapsychology research paper (83k words), Ptolemy front-matter, a Theosophical library catalog. The 12,905 structured records ARE the corpus; the leftover docs genuinely yield nothing, so re-structuring them only burns input tokens (especially the huge off-topic docs). The old "numerology 1% structured / ~780 HD unstructured" framing counts these junk/failed-extraction docs as if they were pending real content - they are not. DECISION: do NOT re-run structuring on the no-records backlog. To GROW the corpus, scrape NEW high-quality sources (then structure those), or re-mine the big real books deeper with --redo-min-words; do not reprocess the residue. The aborted run added 0 records and never reached its re-upload step, so local + R2 corpus.db are both unchanged (12,905, 9.4MB).
VERDICT: HOLD - did NOT run the overnight pass. Nemotron's structuring quality is below Haiku's for this work, so the credit-saving swap is not worth the quality drop right now. The 444 gate/pilot nemotron-tagged records were DELETED from the corpus (restored to 12,905). The backend wiring stays in place for future re-evaluation (e.g., a better prompt for Nemotron, or a different model). To retry the gate later: cd affirmology-agent && PYTHONPATH=src python3 /tmp/affm_build/nemotron_gate.py --limit 75 (note: the gate script lives in the build tmp; move it into scripts/ if it should persist).