The chorus-* Commands — AI-Assisted Workflow Reference
The nine chorus-* commands form a complete pipeline for turning a normative
corpus (PDF, plain text, Word, Excel) into a running Perl inference engine that
validates real projects.
They are AI agent commands — not Perl modules or shell scripts. Each is a skill loaded by an AI agent (Claude, Copilot, ECA…) and executed interactively in the development environment.
The AI agent is not an execution dependency. The Perl pipeline generated by the chain runs entirely on its own, on any machine with Perl installed, without an AI agent and without a network connection.
The AI agent is a project dependency. To adapt a sandbox to a new project — aligning
engineer documents with KB slots and producing a valid project JSON — you need
chorus-create-project or chorus-import-project, both AI agent skills. The LLM
reads the KB and bridges the terminology gap that no static script can cover
generically. An AI agent is also needed when the normative corpus changes.
The complete pipeline at a glance
┌───────────────────────────────────┐
│  Normative corpus (PDF, text…)    │
└─────────────────┬─────────────────┘
                  │
        chorus-pdf   (if PDF)
        chorus-word  (if .docx)
        chorus-excel (if .xlsx / .csv)
                  │
                  ▼
┌───────────────────────────────────┐
│  corpus/<NNN>-<slug>-text.txt     │
│  corpus/<NNN>-<slug>-vision.md    │
└─────────────────┬─────────────────┘
                  │
             chorus-feed
                  │
                  ▼
┌───────────────────────────────────┐
│  agent/agents/<slug>.org (KB)     │
│  rules/<slug>/R<NN>-xxx.yml       │
│  lib/…/Agent/<Slug>/Helpers.pm    │
└─────────────────┬─────────────────┘
                  │  ← domain expert reviews, corrects
                  │
             chorus-check
                  │
                  ▼
┌───────────────────────────────────┐
│  Feed.pm · Agent/*.pm             │
│  Expert.pm · run.pl               │
└─────────────────┬─────────────────┘
                  │
       perl run.pl project.json
                  │
                  ▼
    ✅ COMPLIANT / ❌ NON_COMPLIANT
  with reason, per element, per agent
                  │
          chorus-strengthen
                  │
                  ▼
┌───────────────────────────────────┐
│  gap report + enrichment roadmap  │
└─────────────────┬─────────────────┘
                  │
 chorus-feed --enrich (targeted fixes)
                  │  ↺ reinforcement loop
                  ▼
        chorus-check --all ✅
The project file can be written by hand, generated from the KB with
chorus-create-project, or aligned from engineer documents with
chorus-import-project. Once a project file exists, chorus-strengthen
can identify gaps in the YAML rules and recommend enrichment corpora.
chorus-quickstart — Pipeline overview
chorus-quickstart
Single responsibility: display the complete pipeline from a raw corpus to a compliance report, with the two available paths and their decision fork.
This command does not execute anything — it is a guided reference showing:
- Path A (real project): chorus-feed → chorus-import-project → chorus-check
- Path B (synthetic coverage): chorus-feed → chorus-create-project → chorus-check
- When to use chorus-import-project vs chorus-create-project
- The reinforcement loop via chorus-strengthen
- The sandbox directory layout after a full run
- A quick command cheat-sheet for both paths
Start here if you are new to Chorus or unsure which path to follow.
chorus-pdf — Extract a PDF corpus
chorus-pdf <sandbox-name> <file.pdf> [--out <slug>] [--hybrid] [--auto] [--images] [--batch]
Single responsibility: produce an enriched text file from a PDF.
Standard PDF-to-text tools silently drop normative tables rendered as images,
multi-column layouts, and figure annotations. chorus-pdf recovers them.
Extraction modes
| Mode | Flag | Engine | API key | Output |
|---|---|---|---|---|
| Hybrid (default) | --hybrid (optional; selected automatically when a valid key is found) | pdfminer text on ALL pages + Claude vision on cropped figures | ✅ ANTHROPIC_API_KEY | <slug>-vision.md |
| Text (fallback) | (none; used when no valid API key is found) | pdfminer.six only | ❌ not required | <slug>-text.txt |
| Auto | --auto | pdfminer (text pages) + LLM vision (figure pages) | ✅ | <slug>-vision.md |
| Images | --images | pdftoppm 150 DPI + LLM vision on all pages | ✅ | <slug>-vision.md |
Choosing a mode:
No flag provided
  → Phase 0.0 auto-detects ANTHROPIC_API_KEY
  → if the key is valid: --hybrid is activated automatically   ← DEFAULT
  → if the key is absent or invalid: text mode (fallback)
API key available, mixed document (text + embedded figures)
  → no flag needed (hybrid is activated automatically)
API key available, text-dominant document (few or no embedded figures)
  → --auto   ← faster, fewer API calls
API key available, mostly diagrams or scanned PDF
  → --images
No API key available
  → no flag (text mode is the forced fallback)
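Read as code, the decision tree above is a small pure function. The following is a hypothetical Python sketch, not chorus-pdf's implementation. `choose_pdf_mode` and its `api_key_valid` input are illustrative names. The sketch also assumes that passing --auto or --images without a valid key falls back to text mode, like the no-flag case.

```python
from typing import Optional

# Hypothetical sketch of the documented chorus-pdf mode choice (not the skill's code).
VISION_FLAGS = {"--hybrid": "hybrid", "--auto": "auto", "--images": "images"}


def choose_pdf_mode(api_key_valid: bool, flag: Optional[str] = None) -> str:
    """Return 'hybrid', 'auto', 'images' or 'text'."""
    if flag is not None and flag not in VISION_FLAGS:
        raise ValueError(f"unknown mode flag: {flag}")
    # Assumption: without a valid ANTHROPIC_API_KEY every vision mode
    # falls back to pdfminer-only text extraction.
    if not api_key_valid:
        return "text"
    # No flag: hybrid is activated automatically (the default).
    if flag is None:
        return "hybrid"
    return VISION_FLAGS[flag]


if __name__ == "__main__":
    print(choose_pdf_mode(True))               # hybrid
    print(choose_pdf_mode(True, "--auto"))     # auto
    print(choose_pdf_mode(False, "--images"))  # text
```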
--auto classifies each page first (pdfminer on text-only pages, vision on
pages with figures), minimising API calls to pages that actually need them.
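Here is a hypothetical sketch of the --auto routing and the number of vision calls it leads to. It assumes each page has already been flagged as containing figures or not. One way to do that is to look for LTFigure or LTImage objects in pdfminer.six's extract_pages output; that step is omitted here. `PageInfo`, `route_pages` and `vision_calls` are illustrative names. Hybrid mode is left out because it sends cropped figures rather than whole pages.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PageInfo:
    number: int
    has_figures: bool  # assumed output of a prior layout pass (not shown)


def route_pages(pages: List[PageInfo]) -> Dict[int, str]:
    """--auto routing: pdfminer for text-only pages, vision for figure pages."""
    return {p.number: ("vision" if p.has_figures else "pdfminer") for p in pages}


def vision_calls(pages: List[PageInfo], mode: str) -> int:
    """Rough count of page-level vision calls for one mode (illustrative)."""
    if mode == "images":  # every page is rasterised and sent to vision
        return len(pages)
    if mode == "auto":    # only pages classified as containing figures
        return sum(p.has_figures for p in pages)
    if mode == "text":    # no LLM involved
        return 0
    raise ValueError(f"mode not covered by this sketch: {mode}")
```

On a 10-page document with figures on two pages, --auto would make 2 vision calls where --images makes 10.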
Output
corpus/<NNN>-<slug>-text.txt or corpus/<NNN>-<slug>-vision.md
(numbered in sequence with existing corpus files)
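The numbering rule can be sketched as "take the highest existing NNN prefix in corpus/ and add one". This is a hypothetical helper that assumes three-digit, zero-padded prefixes like the 001- in the examples. The skills may handle gaps or collisions differently.

```python
import re
from pathlib import Path
from typing import Iterable

# Assumed convention: corpus files start with a zero-padded 3-digit prefix.
NNN_PREFIX = re.compile(r"^(\d{3})-")


def next_corpus_name(existing: Iterable[str], slug: str, suffix: str) -> str:
    """Hypothetical: build '<NNN>-<slug>-<suffix>' after the highest existing NNN."""
    numbers = [int(m.group(1)) for name in existing if (m := NNN_PREFIX.match(name))]
    nnn = max(numbers, default=0) + 1
    return f"{nnn:03d}-{slug}-{suffix}"


def next_corpus_path(corpus_dir: Path, slug: str, suffix: str) -> Path:
    """Same rule applied to the files currently present in corpus_dir."""
    names = [p.name for p in corpus_dir.iterdir()] if corpus_dir.is_dir() else []
    return corpus_dir / next_corpus_name(names, slug, suffix)
```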
Prerequisites
pip install pdfminer.six pypdf
sudo apt install poppler-utils # for --auto and --images
export ANTHROPIC_API_KEY="sk-ant-..." # for hybrid (default), --auto and --images
Next step
chorus-feed <sandbox-name> corpus/<NNN>-<slug>-text.txt
(or: corpus/<NNN>-<slug>-vision.md)
chorus-word — Extract a Word document
chorus-word <sandbox-name> <file.docx> [--out <slug>] [--batch]
Single responsibility: produce an enriched text file from a Word document (.docx).
Standard Word-to-text converters silently drop embedded images, merged cells, and
the actual reading order. chorus-word preserves them.
Extraction modes
| Mode | Engine | API key | Images | Tables | Output |
|---|---|---|---|---|---|
| Hybrid (default) | python-docx text + Claude vision on images | ✅ ANTHROPIC_API_KEY | ✅ described | ✅ Markdown pipe | <slug>-vision.md |
| Text (fallback) | python-docx only | ❌ not required | [IMAGE — not extracted] placeholder | ✅ Markdown pipe | <slug>-text.txt |
Mode is auto-detected: hybrid if the API key is present and valid, text otherwise.
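A minimal renderer for the "Markdown pipe" table output could look like the sketch below. It is hypothetical. It takes rows already extracted as lists of cell strings (for instance with python-docx), treats the first row as the header, escapes literal pipes and pads short rows. It does not reproduce how chorus-word handles merged cells.

```python
from typing import List, Sequence


def _cell(text: str) -> str:
    # Escape literal pipes and flatten line breaks so each row stays on one line.
    return text.replace("|", "\\|").replace("\n", " ").strip()


def to_markdown_pipe(rows: Sequence[Sequence[str]]) -> str:
    """Hypothetical: render rows (first row = header) as a Markdown pipe table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of columns.
    grid: List[List[str]] = [
        [_cell(c) for c in row] + [""] * (width - len(row)) for row in rows
    ]
    lines = ["| " + " | ".join(grid[0]) + " |", "|" + "---|" * width]
    lines += ["| " + " | ".join(row) + " |" for row in grid[1:]]
    return "\n".join(lines)
```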
Prerequisites
pip install python-docx
export ANTHROPIC_API_KEY="sk-ant-..." # for hybrid mode
Next step
chorus-feed <sandbox-name> corpus/<NNN>-<slug>-vision.md
(or: corpus/<NNN>-<slug>-text.txt)
chorus-excel — Extract an Excel spreadsheet or CSV
chorus-excel <sandbox-name> <file.xlsx|file.csv> [--out <slug>] [--sheet <name>] [--batch]
Single responsibility: produce an enriched text file from an Excel spreadsheet (.xlsx)
or CSV file. Naive conversions flatten merged cells, ignore embedded images, and do not
describe charts. chorus-excel recovers them.
Extraction modes
| Mode | Format | Engine | API key | Images / Charts | Output |
|---|---|---|---|---|---|
| Hybrid (default) | .xlsx | openpyxl + Claude vision | ✅ ANTHROPIC_API_KEY | ✅ described | <slug>-vision.md |
| Text (fallback) | .xlsx | openpyxl only | ❌ not required | [IMAGE/CHART — not extracted] | <slug>-text.txt |
| CSV | .csv | csv.reader | ❌ | N/A | <slug>-text.txt |
Mode is auto-detected from the file extension and API key availability.
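The sketch below shows that detection, plus the CSV branch using the standard library's csv.reader named in the table. It is hypothetical and the function names are illustrative. Rejecting other extensions is an assumption based on the command signature (<file.xlsx|file.csv>).

```python
import csv
import io
from pathlib import Path
from typing import List


def choose_excel_mode(path: str, api_key_valid: bool) -> str:
    """Hypothetical: return 'csv', 'hybrid' or 'text' from extension + key."""
    ext = Path(path).suffix.lower()
    if ext == ".csv":
        return "csv"  # csv.reader; no API key, no images or charts
    if ext == ".xlsx":
        return "hybrid" if api_key_valid else "text"
    raise ValueError(f"expected .xlsx or .csv, got {ext!r}")


def read_csv_rows(text: str) -> List[List[str]]:
    """CSV branch: csv.reader keeps quoted commas inside a single field."""
    return [row for row in csv.reader(io.StringIO(text)) if row]
```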
Prerequisites
pip install openpyxl
sudo apt install libreoffice # for charts in hybrid mode
export ANTHROPIC_API_KEY="sk-ant-..." # for hybrid mode
Next step
chorus-feed <sandbox-name> corpus/<NNN>-<slug>-vision.md
(or: corpus/<NNN>-<slug>-text.txt)
chorus-feed — Build the knowledge base
chorus-feed <sandbox-name> <corpus> [--enrich]
Single responsibility: extract knowledge from a corpus and write it into structured KB files. Does not generate any Perl infrastructure.
<corpus> must be a plain-text (.txt) or Markdown (.md) file — never a PDF.
If a PDF is provided, chorus-feed stops and suggests running chorus-pdf first.
Two modes
Mode A — Initialization (default, no flag)
Used for a new sandbox or a fresh start. Creates the full sandbox structure:
<sandbox-name>/
  corpus/001-<slug>.txt            ← the corpus
  agent/agents/<slug>.org          ← KB per agent (ontology, slots, rules, helpers)
  agent/agents/index.org           ← pipeline index
  rules/<slug>/R<NN>-xxx.yml       ← YAML inference rules
  lib/…/Agent/<Slug>/Helpers.pm    ← normative tables (extracted from corpus)
  README.org
What the AI agent produces per agent:
- Slot ontology: the Frame types and slot dictionary for the domain
- YAML rules: one file per rule, named R<NN>-<slug>.yml (loaded alphabetically)
- Helpers.pm: normative lookup tables and calculations, annotated with their corpus source (# §4.2 EC5 — Bending resistance by timber class)
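Because the files are loaded in alphabetical order, the two-digit NN is what keeps rule order stable: without zero padding, R10 sorts before R2. The Python illustration below of plain lexicographic ordering is hypothetical. The actual loader is part of the generated Perl code, and this sketch only shows the ordering effect.

```python
from typing import List


def load_order(filenames: List[str]) -> List[str]:
    """Plain lexicographic order over the rule files of one agent."""
    return sorted(f for f in filenames if f.endswith(".yml"))


padded = ["R10-fire.yml", "R02-moisture.yml", "R01-class.yml"]
unpadded = ["R10-fire.yml", "R2-moisture.yml", "R1-class.yml"]

print(load_order(padded))    # ['R01-class.yml', 'R02-moisture.yml', 'R10-fire.yml']
print(load_order(unpadded))  # ['R1-class.yml', 'R10-fire.yml', 'R2-moisture.yml']
```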
Mode B — Incremental enrichment (--enrich required)
Used when the sandbox already has a KB and new normative material has arrived. The AI agent reads the existing KB, classifies each new rule as refinement, extension, or new domain, and applies targeted changes.
chorus-feed <sandbox-name> new-addendum.txt --enrich
What chorus-feed does NOT do
It never generates Feed.pm, Agent/*.pm, Expert.pm, or run.pl.
Those are the responsibility of chorus-check.
Key design decisions embedded in the KB
- Targeting strategy: how each agent's _SCOPE finds its Frames (fmatch + presence slot for large volumes; discriminating slot + filter for small ones)
- Idempotence: every YAML rule that writes a slot carries EXCEPTION: defined $var->{slot} to prevent re-firing
- _MAX_CYCLES sizing: documented per agent, calibrated to N_frames × N_rules × N_agents × 10
- Normative traceability: every threshold in Helpers.pm is annotated with its corpus reference
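As a worked example of the _MAX_CYCLES rule, take illustrative counts not drawn from any sandbox: 120 Frames, 8 rules and 3 agents give 120 × 8 × 3 × 10 = 28,800. The sketch below computes that bound. It also uses a toy fixpoint loop, which is not the Chorus engine, to show why the idempotence guard matters: without it, a rule keeps firing until the cycle cap is hit. Function names are hypothetical.

```python
from typing import Dict, List


def max_cycles(n_frames: int, n_rules: int, n_agents: int, factor: int = 10) -> int:
    """_MAX_CYCLES calibration: N_frames × N_rules × N_agents × 10."""
    if min(n_frames, n_rules, n_agents) < 1:
        raise ValueError("counts must be positive")
    return n_frames * n_rules * n_agents * factor


def cycles_used(frames: List[Dict[str, str]], guarded: bool, cap: int) -> int:
    """Toy fixpoint loop (not the Chorus engine): cycles until no rule fires."""
    cycles = 0
    while cycles < cap:
        cycles += 1
        fired = False
        for frame in frames:
            # Analogue of `EXCEPTION: defined $var->{slot}`: skip if already written.
            if guarded and "status" in frame:
                continue
            frame["status"] = "checked"
            fired = True
        if not fired:
            break
    return cycles


print(max_cycles(120, 8, 3))                          # 28800
print(cycles_used([{}, {}], guarded=True, cap=50))    # 2
print(cycles_used([{}, {}], guarded=False, cap=50))   # 50
```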
Next step
chorus-check <sandbox-name> project.json
Or, to review what was generated before running:
# Open the KB in your editor
agent/agents/<slug>.org
chorus-check — Generate infrastructure and run
chorus-check <sandbox-name> <project-file.json>
chorus-check <sandbox-name> --all
Single responsibility: read the KB, generate the Perl infrastructure, run the pipeline against the project file, and produce a conformity report.
--all runs every projet-*.json file found in the sandbox in one pass
and produces a synthesis table (see below). The fast path applies: the
infrastructure is checked once and reused for every project file.
Smart regeneration
chorus-check keeps a hash of the KB files (agent/.kb-hash). On each call:
- KB unchanged → skips all generation and runs perl run.pl directly (fast path)
- KB changed (after a chorus-feed --enrich) → regenerates the infrastructure, then runs
- No infrastructure yet → generates from scratch
This means running chorus-check twice on the same sandbox with different
project files costs almost nothing on the second call.
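Here is a minimal sketch of the smart-regeneration check, under stated assumptions. The hashed KB is taken to be agent/agents/*.org plus rules/**/*.yml, the hash is SHA-256, and agent/.kb-hash holds a single hex digest. chorus-check's actual file set and hash algorithm are not documented here, and `plan` and `mark_generated` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Iterable

KB_GLOBS = ("agent/agents/*.org", "rules/**/*.yml")  # assumed KB file set


def kb_digest(sandbox: Path, globs: Iterable[str] = KB_GLOBS) -> str:
    """SHA-256 over the KB files, in a stable (sorted) order."""
    h = hashlib.sha256()
    files = sorted({p for g in globs for p in sandbox.glob(g) if p.is_file()})
    for p in files:
        # Hash the relative path too, so renamed files also count as changes.
        h.update(p.relative_to(sandbox).as_posix().encode())
        h.update(b"\0")
        h.update(p.read_bytes())
    return h.hexdigest()


def plan(sandbox: Path, infra_exists: bool) -> str:
    """Return 'generate', 'fast-path' or 'regenerate'."""
    if not infra_exists:
        return "generate"
    stamp = sandbox / "agent" / ".kb-hash"
    if stamp.is_file() and stamp.read_text().strip() == kb_digest(sandbox):
        return "fast-path"  # KB unchanged: run perl run.pl directly
    return "regenerate"     # e.g. after chorus-feed --enrich


def mark_generated(sandbox: Path) -> None:
    """Record the digest once generation has succeeded."""
    stamp = sandbox / "agent" / ".kb-hash"
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(kb_digest(sandbox) + "\n")
```

Writing the digest only after generation succeeds avoids a stale fast path if generation fails halfway.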
What gets generated
| File | Role |
|---|---|
| lib/<NS>/Feed.pm | Loads the project JSON, creates Frames, sets targeting slots |
| lib/<NS>/Agent/<Slug>.pm | Shell for each agent: imports Helpers, loads YAML rules |
| lib/<NS>/Expert.pm | Wires all agents, sets _MAX_CYCLES, registers with Expert |
| run.pl | Entry point: perl run.pl project.json |
The generated code is pure Perl — no AI agent dependency, no LLM, no network. It runs on any machine with Perl and the CPAN modules installed.
Output
A structured conformity report, per element and per agent:
✅ ELEMENT steel-beam-01 — COMPLIANT
[qualification] material class: S355 ✓
[domain] span/depth ratio: 18.2 ≤ 20 ✓
[fire] REI 60 achieved ✓
❌ ELEMENT timber-post-03 — NON_COMPLIANT
[qualification] moisture content: 22% > 18% max (EC5 §3.3)
[domain] vapour barrier: MISSING
Next step
# Re-run with a different project (no regeneration):
perl run.pl other-project.json
# Run all projet-*.json files at once:
chorus-check <sandbox-name> --all
# Update the corpus and regenerate:
chorus-feed <sandbox-name> new-addendum.txt --enrich
chorus-check <sandbox-name> project.json
--all synthesis table
When --all is used, chorus-check outputs a synthesis table instead of
individual verbatim reports:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
chorus-check --all <sandbox-name>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Project file     │ Status    │ OK │ KO │ Unproc │ Disc
projet-rules-iso │ SOLVED ✅ │ N  │ N  │ 0      │ 0
projet-edges     │ SOLVED ✅ │ N  │ N  │ 0      │ 0
projet-cross     │ SOLVED ✅ │ N  │ N  │ 0      │ 0
projet-scale     │ SOLVED ✅ │ N  │ N  │ 0      │ 0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Overall: CONVERGED ✅ Discordances: 0 / N_total
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If discordances are found → run chorus-strengthen <sandbox-name> to identify
the gaps and get an enrichment roadmap.
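The Overall line can be read as an aggregation over the per-project rows. This hypothetical sketch assumes N_total is the sum of OK, KO and unprocessed elements across projects. It treats the run as converged only when there are no discordances and no unprocessed elements. The "NOT CONVERGED" label is invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectResult:
    name: str
    ok: int
    ko: int
    unprocessed: int
    discordances: int

    @property
    def total(self) -> int:
        return self.ok + self.ko + self.unprocessed


def overall(results: List[ProjectResult]) -> str:
    """Hypothetical summary line for a set of per-project results."""
    disc = sum(r.discordances for r in results)
    unproc = sum(r.unprocessed for r in results)
    n_total = sum(r.total for r in results)
    # Assumed criterion: converged = no discordance and nothing left unprocessed.
    status = "CONVERGED ✅" if disc == 0 and unproc == 0 else "NOT CONVERGED ❌"
    return f"Overall: {status}  Discordances: {disc} / {n_total}"
```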
chorus-create-project — Generate a project JSON from the KB
chorus-create-project <sandbox-name> <output-file.json>
chorus-create-project <sandbox-name> --batch
Single responsibility: read the sandbox KB and generate a valid project JSON file populated with both conforming and non-conforming elements that explore the variety of the domain.
--batch generates the full four-file coverage suite at once (see below)
instead of a single project file.
This is useful for:
- Testing the pipeline end-to-end before a real project is available
- Demonstrating the range of checks the engine performs
- Bootstrapping a project template for an engineer to fill in
What the AI agent reads
- agent/agents/index.org: Frame types, pipeline, namespace
- agent/agents/<slug>.org: mandatory slots, thresholds, valid value domains
- Any existing project-*.json in the sandbox: reference format
⚠️ chorus-create-project never reads Helpers.pm, Feed.pm, or any
generated Perl file. The org KB files are always the canonical source.
Output (single mode)
A JSON file with:
- A representative set of project elements (one per Frame type, with variations)
- Explicit conforming cases (all thresholds met)
- Explicit non-conforming cases (one rule violation per failing element)
- Comments indicating which rule each failing case is designed to trigger
Coverage suite (--batch mode)
--batch produces four project files targeting different testing angles:
| File | Goal |
|---|---|
| projet-rules-iso.json | Test each rule in isolation (1 OK + 1 KO per rule) |
| projet-edges.json | Stress boundary values (value = threshold and threshold ± ε) |
| projet-cross.json | Expose inter-rule interactions (elements triggering multiple rules) |
| projet-scale.json | Volume stress test for _MAX_CYCLES calibration (≥ 100 elements) |
IDs are stable across regenerations (I-, E-, X-, S- prefixes) to allow
diff-style comparison across chorus-check --all runs.
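For a single numeric threshold, the edge-file probes and stable IDs might be generated as below. This is a hypothetical sketch: ε, the ID layout after the prefix and the helper names are assumptions. Only the I-/E-/X-/S- prefixes come from the text above.

```python
from typing import Dict, List

# Prefixes per coverage file, as listed above; the rest of the ID is assumed.
SUITE_PREFIX = {"rules-iso": "I", "edges": "E", "cross": "X", "scale": "S"}


def stable_id(suite: str, rule: str, index: int) -> str:
    """Deterministic ID: same inputs, same ID on every regeneration."""
    return f"{SUITE_PREFIX[suite]}-{rule}-{index:02d}"


def edge_probes(threshold: float, eps: float) -> List[float]:
    """Threshold - eps, threshold, threshold + eps."""
    return [threshold - eps, threshold, threshold + eps]


def edge_elements(rule: str, slot: str, threshold: float,
                  eps: float) -> List[Dict[str, object]]:
    """One element per probe value, each carrying a stable E- ID."""
    return [
        {"id": stable_id("edges", rule, i), slot: value}
        for i, value in enumerate(edge_probes(threshold, eps), start=1)
    ]
```

Whether the probe exactly at the threshold should be compliant depends on whether the rule uses < or ≤. That ambiguity is what projet-edges.json is meant to surface.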
Next step
# Single mode:
chorus-check <sandbox-name> <output-file.json>
# Batch mode — run the full suite:
chorus-check <sandbox-name> --all
# If the suite reveals gaps:
chorus-strengthen <sandbox-name>
chorus-strengthen — Identify rule gaps and recommend enrichment
chorus-strengthen <sandbox-name>
Single responsibility: run the full project suite, classify every
discordance and unprocessed element into a gap type, produce a structured
gap report, and recommend the enrichment corpus to pass to
chorus-feed --enrich.
chorus-strengthen never modifies any KB, YAML, or Perl file — it only reads
and reports.
Prerequisites
- chorus-check has been run at least once (infrastructure present)
- At least one projet-*.json file exists in $SANDBOX/ (ideally the four-file suite from chorus-create-project --batch)
Gap classification
Every discordant or unprocessed element is classified into one of three types:
| Gap type | Pattern | Root cause |
|---|---|---|
| Rule too strict | Expected COMPLIANT → got NON_COMPLIANT | Threshold wrong, CONDITION too narrow, or edge case not covered |
| Rule too permissive | Expected NON_COMPLIANT → got COMPLIANT | Missing rule, threshold too high, or CONDITION excludes this type |
| Feed gap | Element is (unprocessed) | Targeting slot not set by Feed for this element type |
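The classification table maps onto a small function. This hypothetical sketch assumes statuses are the COMPLIANT / NON_COMPLIANT strings of the chorus-check report. It also assumes an unprocessed element has no status (None).

```python
from typing import Optional

COMPLIANT, NON_COMPLIANT = "COMPLIANT", "NON_COMPLIANT"


def classify_gap(expected: str, got: Optional[str]) -> Optional[str]:
    """Hypothetical: gap type for one element, or None when there is no gap."""
    if got is None:
        return "feed gap"             # element never reached by any agent
    if expected == got:
        return None                   # concordant: nothing to report
    if expected == COMPLIANT and got == NON_COMPLIANT:
        return "rule too strict"
    if expected == NON_COMPLIANT and got == COMPLIANT:
        return "rule too permissive"
    raise ValueError(f"unexpected status pair: {expected!r} -> {got!r}")
```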
Output
A structured gap report per element (id, type, expected, got, rule fired, hypothesis, corpus reference, suggested fix) followed by an enrichment roadmap:
- Bucket A: corpus clarification needed (normative source ambiguous)
- Bucket B: direct YAML adjustment (no chorus-feed needed)
- Bucket C: missing coverage → draft corpus-correctif.txt for chorus-feed <sandbox-name> corpus-correctif.txt --enrich
Reinforcement loop
chorus-create-project <sb> --batch               ← build the coverage suite (once)
        ↓
chorus-strengthen <sb>                           ← identify gaps
        ↓
[edit YAML directly]                             ← bucket B fixes
chorus-feed <sb> corpus-correctif.txt --enrich   ← bucket C new rules
        ↓
chorus-check <sb> --all                          ← verify
        ↓
chorus-strengthen <sb>                           ← check convergence
        ↓
✅ CONVERGED: all projects pass, 0 discordances
chorus-import-project — Align engineer documents with the KB
chorus-import-project <sandbox-name> <source…> [--out <file.json>] [--batch]
Single responsibility: read a project document produced by an engineer (PDF, Word, Excel, plain text, table pasted inline) and align its terminology with the sandbox KB slots and types, producing a valid project JSON file.
This bridges the gap between how engineers describe a project (free terminology, domain-specific jargon, informal tables) and the exact slot names and value domains the Chorus pipeline expects.
Three invocation modes
| Syntax | Mode | Output |
|---|---|---|
| chorus-import-project sb file.pdf | Single | 1 JSON |
| chorus-import-project sb f1.pdf f2.xlsx f3.docx | Merge | 1 merged JSON (same project, complementary files) |
| chorus-import-project sb ./dossier/ or --batch | Batch | 1 JSON per file + summary report |
Mode is detected automatically from the number and type of source arguments.
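One plausible reading of that detection rule is sketched below. The function name and the precedence order are assumptions. --batch or a directory argument selects batch mode, several files select merge mode, and a single file selects single mode.

```python
from pathlib import Path
from typing import Sequence


def detect_import_mode(sources: Sequence[str], batch_flag: bool = False) -> str:
    """Hypothetical: return 'single', 'merge' or 'batch'."""
    if not sources:
        raise ValueError("at least one source is required")
    if batch_flag or any(Path(s).is_dir() for s in sources):
        return "batch"   # one JSON per file + summary report
    if len(sources) > 1:
        return "merge"   # complementary files of one project -> one merged JSON
    return "single"      # one source -> one JSON
```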
What the AI agent reads
- agent/agents/index.org: Frame types, pipeline, namespace
- agent/agents/<slug>.org: slot names, value domains, mandatory/optional
- agent/thesaurus.org (if present): validated project terminology from previous imports (highest priority)
- Previous agent/import-report-*.org: past alignment decisions (secondary; skipped if covered by the thesaurus)
What the AI agent produces
- project-import-<NNN>.json: the aligned project JSON
- agent/import-report-<NNN>.org: alignment report (term mappings, gaps, ambiguities)
- agent/thesaurus.org: updated incrementally after each alignment decision; created on first import if absent
Gaps (values absent from the source document) are reported but never invented.
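The read priority (thesaurus first, then earlier import reports) and the rule that gaps are never invented can be sketched as a lookup. It either returns a KB slot or leaves the slot missing. Everything here is hypothetical: plain dictionaries stand in for agent/thesaurus.org and agent/import-report-*.org, and the slot names are illustrative. Terms found in neither source are where the AI agent would propose a mapping.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in: normalised engineer term -> KB slot name.
Mapping = Dict[str, str]


def align_term(term: str, thesaurus: Mapping,
               past_reports: List[Mapping]) -> Tuple[Optional[str], str]:
    """Return (slot, source) for one engineer term."""
    key = term.strip().lower()
    if key in thesaurus:                   # highest priority
        return thesaurus[key], "thesaurus"
    for report in reversed(past_reports):  # assumed: newest report wins
        if key in report:
            return report[key], "import-report"
    return None, "unresolved"              # left to the AI agent / reviewer


def build_element(raw: Mapping, thesaurus: Mapping, past_reports: List[Mapping],
                  mandatory: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Copy only values the source actually provides; report missing slots."""
    element: Dict[str, str] = {}
    for term, value in raw.items():
        slot, _source = align_term(term, thesaurus, past_reports)
        if slot is not None:
            element[slot] = value
    # Gaps are reported, never filled with a guessed value.
    gaps = [slot for slot in mandatory if slot not in element]
    return element, gaps
```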
Next step
# Review the import report before running:
agent/import-report-<NNN>.org
# Then validate:
chorus-check <sandbox-name> project-import-<NNN>.json
Complete workflow — end to end
Starting from a PDF corpus
# 1. Extract the corpus (--auto recommended for technical standards)
chorus-pdf my-sandbox corpus/standard.pdf --auto
# → corpus/001-standard-vision.md
# 2. Build the knowledge base
chorus-feed my-sandbox corpus/001-standard-vision.md
# → agent/agents/*.org, rules/**/*.yml, lib/.../Helpers.pm
# ← domain expert reviews and corrects agent/agents/*.org
# 3. Generate infrastructure and run
chorus-check my-sandbox project.json
# → Feed.pm, Agent/*.pm, Expert.pm, run.pl
# → conformity report
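Producing and validating a project file
# Generate a coverage suite from the KB, or align an engineer document with it
chorus-create-project my-sandbox --batch # generate from KB
chorus-import-project my-sandbox engineer-notes.pdf # align from document
# Validate
chorus-check my-sandbox --all # the generated projet-*.json suite
chorus-check my-sandbox project-import-<NNN>.json # the imported project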
Validating and strengthening the rule base
# Generate the coverage suite
chorus-create-project my-sandbox --batch
# → projet-rules-iso.json, projet-edges.json, projet-cross.json, projet-scale.json
# Run all projects in one pass
chorus-check my-sandbox --all
# → synthesis table with OK / KO / unprocessed / discordance counts per project
# If discordances found → identify gaps and get enrichment roadmap
chorus-strengthen my-sandbox
# → gap report + corpus-correctif.txt recommendation
# Apply fixes and re-run
chorus-feed my-sandbox corpus-correctif.txt --enrich
chorus-check my-sandbox --all
# → all projects CONVERGED ✅
Updating when the standard changes
chorus-feed my-sandbox new-addendum.txt --enrich
chorus-check my-sandbox project.json # KB hash changed → infrastructure regenerated, then run
What runs without an AI agent
Once chorus-check has generated the infrastructure, execution is fully
autonomous — no AI agent, no LLM, no network:
# On any machine with Perl and the required CPAN modules:
perl run.pl project.json
# Re-run with a different project (no regeneration):
perl run.pl other-project.json
Adapting to a new project in practice requires an AI agent. A project JSON can be written by
hand in principle, but chorus-create-project and chorus-import-project are
the practical path: they read the KB and handle the gap between engineer
terminology and the exact slot names and value domains the pipeline expects.
An AI agent is also needed when the normative corpus changes (chorus-feed --enrich
followed by chorus-check).
Technical prerequisites
Perl (runtime)
cpanm Chorus::Engine # inference engine
cpanm YAML # YAML rule loading
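Python (corpus extraction: chorus-pdf, chorus-word, chorus-excel)
pip install pdfminer.six pypdf # chorus-pdf: text and page classification
pip install python-docx # chorus-word
pip install openpyxl # chorus-excel
sudo apt install poppler-utils # chorus-pdf: pdftoppm (--auto and --images modes)
sudo apt install libreoffice # chorus-excel: charts in hybrid mode
export ANTHROPIC_API_KEY="sk-ant-..." # LLM vision (hybrid default, --auto, --images)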
Explore the sandbox without an AI agent
The sandboxes/demo_en sandbox contains the complete output of the chain —
corpus, org KB, YAML rules, Perl infrastructure. Running
perl sandboxes/demo_en/run.pl sandboxes/demo_en/project-01.json shows the result live
using the pre-built project JSON included in the sandbox. To adapt to a new
project, an AI agent is required.
Quick reference
| Command | Input | Output | Prerequisites |
|---|---|---|---|
| chorus-pdf | PDF file | corpus/<NNN>-<slug>-text.txt or -vision.md | pdfminer.six; API key for --hybrid/--auto/--images |
| chorus-word | .docx file | corpus/<NNN>-<slug>-vision.md or -text.txt | python-docx; API key for hybrid mode |
| chorus-excel | .xlsx or .csv file | corpus/<NNN>-<slug>-vision.md or -text.txt | openpyxl; API key for hybrid mode |
| chorus-feed | .txt or .md corpus | agent/agents/*.org, YAML rules, Helpers.pm | — |
| chorus-check | project JSON (or --all) | Feed.pm, Agent/*.pm, Expert.pm, run.pl + report | chorus-feed run first |
| chorus-create-project | (KB only) | project JSON or 4-file coverage suite (--batch) | chorus-feed run first |
| chorus-import-project | engineer document | aligned project JSON + import report | chorus-feed run first |
| chorus-strengthen | (project suite) | gap report + enrichment roadmap | chorus-check run first |
Further reading
- 01-intro.md: Chorus concepts, Frame model, inference engine, YAML DSL
- 02-ai-agent.md: LLM vs Chorus positioning, why the chain works
- 03-applications.md: domain-by-domain analysis, onboarding times
- agent/skills/chorus-pdf.md: full skill reference for chorus-pdf
- agent/skills/chorus-word.md: full skill reference for chorus-word
- agent/skills/chorus-excel.md: full skill reference for chorus-excel
- agent/skills/chorus-feed.md: full skill reference for chorus-feed
- agent/skills/chorus-check.md: full skill reference for chorus-check
- agent/skills/chorus-create-project.md: full skill reference for chorus-create-project
- agent/skills/chorus-import-project.md: full skill reference for chorus-import-project
- agent/skills/chorus-strengthen.md: full skill reference for chorus-strengthen