Home / Engine / Knowledge Corpus

Nemotron corpus upgrade - plan v1

Updated Jun 18, 2026 · Affirmology_Nemotron_CorpusUpgrade_Plan_v1.md

Summary. Verified 2026-06-15: NVIDIA Nemotron 3 Ultra is real (released June 4 2026, Computex), live on OpenRouter at nvidia/nemotron-3-ultra-550b-a55b (~$0.50 in / $2.50 out per million tokens) with a rate-limited :free variant. The briefing's facts are accurate.

Nemotron corpus upgrade - plan v1

Verified 2026-06-15: NVIDIA Nemotron 3 Ultra is real (released June 4 2026, Computex), live on OpenRouter at nvidia/nemotron-3-ultra-550b-a55b (~$0.50 in / $2.50 out per million tokens) with a rate-limited :free variant. The briefing's facts are accurate.

Why use it here

Corpus is at ~10,658 records, quality mid-7/10. The block to 10/10 is bulk structuring, the high-volume low-judgment work that was burning Claude credits. Nemotron does that for a fraction of the cost.

Routing (hybrid)

Nemotron (workhorse): structuring raw docs into records; dedup, tagging, first-pass QA; deep-chunk Western re-mine.
Claude / Haiku (keep): the quality observer that scores records, the morning report, and anything touching brand voice or customer-facing copy.
Nemotron is a SEPARATE API the corpus scripts call. Not the brain of Claude Code.

The backlog to clear (highest value first)

Numerology: 1% structured (7 of 507). Biggest easy win.
Human design: ~780 docs still unstructured (the YouTube haul, cut off by the old $4 cap).
Gene keys + vedic: finish remaining.
Western: raise --max-chunks-per-doc (e.g. 8 to 20) to mine the big books deeper.

Quality gate (do this BEFORE the overnight run)

Structure ~75 human-design docs with Nemotron, have the existing Claude/Haiku quality observer score them against Haiku-structured records. Proceed only if quality holds. Preserve the A/B tier wall (Tier C never used in generation).

Wiring (Claude Code)

Add an OpenRouter backend to the corpus structurer (OpenAI-compatible client, base URL https://openrouter.ai/api/v1, key from OPENROUTER_API_KEY, model nvidia/nemotron-3-ultra-550b-a55b), selectable with --backend openrouter. Keep --backend anthropic working.

Overnight run (after the gate passes)

caffeinate -i env PYTHONPATH=src OPENROUTER_API_KEY=... python3 -m affirmology.corpus.run \
  --data-dir /Volumes/Affirmology/corpus \
  --traditions numerology,human_design,gene_keys,vedic_astrology \
  --mode structure-only --backend openrouter \
  --model nvidia/nemotron-3-ultra-550b-a55b \
  --max-cost-usd 10

Then a Claude/Haiku quality + morning report pass. Fold this into the nightly launchd watchdog (already a pending todo) so it runs and reports automatically.

What Jeff does

Create an OpenRouter account, make an API key, add a few dollars of credit (or start on the :free variant to test). That is the only account step; the rest is Claude Code on the laptop.

Claude Code kickoff (paste this)

Read Affirmology_Nemotron_CorpusUpgrade_Plan_v1.md. Add an OpenRouter backend to the corpus structurer (OpenAI-compatible, base https://openrouter.ai/api/v1, key OPENROUTER_API_KEY, model nvidia/nemotron-3-ultra-550b-a55b) selectable via --backend openrouter, keeping --backend anthropic intact. Then run the QUALITY GATE: structure ~75 human_design docs with Nemotron and have our quality observer score them against the Haiku-structured records; show me the comparison. Do not run the full overnight pass until I see the gate result. Preserve the A/B tier wall.

Sources: model verified via OpenRouter and Artificial Analysis (see chat).

GATE RESULT - 2026-06-17 (Claude Code): VERDICT = HOLD

OpenRouter/Nemotron backend WIRED into the corpus structurer: --backend openrouter (OpenAI-compatible, base https://openrouter.ai/api/v1, key OPENROUTER_API_KEY, model via --model, default nvidia/nemotron-3-ultra-550b-a55b), with --backend anthropic/gemini intact. Code in src/affirmology/corpus/structurer.py (structure_document_openrouter, check_openrouter_ready, dispatch) + run.py (--backend openrouter, --model, preflight). Connectivity confirmed (paid model live via DeepInfra; key has credit).

GATE RUN: structured 75 human_design docs (tier B only, A/B wall enforced, 0 errors, $0.49, ~34 min) with Nemotron, then a Claude/Haiku judge scored 25 records per backend against the source text. - records/doc: Haiku 6.65 vs Nemotron 5.85 - mean quality: Haiku 6.76/10 vs Nemotron 4.88/10 (a real ~1.9-pt gap, not noise) - By eye, Nemotron records are thinner/fragmentary ("and caring (27 - 50)") and duplicate content across records; Haiku writes fuller self-contained interpretations. Side-by-side: Affirmology_Nemotron_GateComparison_v1.md (8 docs).

Multi-model bake-off 2026-06-17 (tuned prompt, read-only): ALL BELOW HAIKU

Re-ran the same 75-doc A/B gate against 3 stronger OpenRouter models with a TUNED extraction prompt (not Haiku's verbatim: demands complete self-contained interpretations, forbids fragments/duplication/verbatim quotes, JSON-only). Read-only on the corpus (in-memory candidates), so the corpus was never mutated. Haiku judge, n=25 each, judged vs source. Results: - Qwen3 235B (qwen/qwen3-235b-a22b-2507, $0.09/$0.10): quality 6.28/10 vs Haiku 7.24, $0.016/75 docs. Closest, but emits near-verbatim block quotes (e.g. "Book of Lines" lift) = voice + Tier-C risk. - DeepSeek V3.2 (deepseek/deepseek-v3.2, $0.23/$0.34): 6.44/10 vs Haiku 7.16, $0.044. - DeepSeek V4 Pro (deepseek/deepseek-v4-pro, $0.43/$0.87): 5.32/10 vs Haiku 7.16, $0.183. Most expensive AND worst. Candidates produce MORE records/doc but LOWER quality (verbose, less distilled). 6-50x cheaper than Haiku, but the corpus grounds every audio so the ~0.7-1.9 pt drop fails the bar. DECISION: keep Haiku for structuring; revisit with a stricter anti-verbatim prompt or a future model. Side-by-sides: Affirmology_GateCompare_{qwen3-235b,deepseek-v3.2,deepseek-v4-pro}_v1.md. The --backend openrouter wiring stays for future re-tests.

"Remaining backlog" is a MYTH - it is non-structurable residue (2026-06-17)

Projected Haiku cost to structure the full no-records backlog (707 A/B docs with text, >=100 words) was ~$3.97. Ran it with a $6 hard cap. STOPPED at 375/707 because it produced 0 records for ~$2.88. Inspection of the no-record docs across traditions showed why: they are not pending content, they are residue the corpus build already correctly skipped: - numerology "backlog" = BINARY GARBAGE (failed PDF extractions, mojibake, bogus word_counts). - human_design = an I Ching book, coaches' marketing stats pages, 85k words of raw archive.org HTML. - western = a parapsychology research paper (83k words), Ptolemy front-matter, a Theosophical library catalog. The 12,905 structured records ARE the corpus; the leftover docs genuinely yield nothing, so re-structuring them only burns input tokens (especially the huge off-topic docs). The old "numerology 1% structured / ~780 HD unstructured" framing counts these junk/failed-extraction docs as if they were pending real content - they are not. DECISION: do NOT re-run structuring on the no-records backlog. To GROW the corpus, scrape NEW high-quality sources (then structure those), or re-mine the big real books deeper with --redo-min-words; do not reprocess the residue. The aborted run added 0 records and never reached its re-upload step, so local + R2 corpus.db are both unchanged (12,905, 9.4MB).

VERDICT: HOLD - did NOT run the overnight pass. Nemotron's structuring quality is below Haiku's for this work, so the credit-saving swap is not worth the quality drop right now. The 444 gate/pilot nemotron-tagged records were DELETED from the corpus (restored to 12,905). The backend wiring stays in place for future re-evaluation (e.g., a better prompt for Nemotron, or a different model). To retry the gate later: cd affirmology-agent && PYTHONPATH=src python3 /tmp/affm_build/nemotron_gate.py --limit 75 (note: the gate script lives in the build tmp; move it into scripts/ if it should persist).

Nemotron corpus upgrade - plan v1

Nemotron corpus upgrade - plan v1

Why use it here

Routing (hybrid)

The backlog to clear (highest value first)

Quality gate (do this BEFORE the overnight run)

Wiring (Claude Code)

Overnight run (after the gate passes)

What Jeff does

Claude Code kickoff (paste this)

GATE RESULT - 2026-06-17 (Claude Code): VERDICT = HOLD

Multi-model bake-off 2026-06-17 (tuned prompt, read-only): ALL BELOW HAIKU

"Remaining backlog" is a MYTH - it is non-structurable residue (2026-06-17)

Related documents