The chorus-* Commands — AI-Assisted Workflow Reference

The nine chorus-* commands form a complete pipeline for turning a normative corpus (PDF, plain text, Word, Excel) into a running Perl inference engine that validates real projects.

They are AI agent commands — not Perl modules or shell scripts. Each is a skill loaded by an AI agent (Claude, Copilot, ECA…) and executed interactively in the development environment.

The AI agent is not an execution dependency. The Perl pipeline generated by the chain runs entirely on its own, on any machine with Perl installed, without an AI agent and without a network connection.

The AI agent is a project dependency. To adapt a sandbox to a new project — aligning engineer documents with KB slots and producing a valid project JSON — you need chorus-create-project or chorus-import-project, both AI agent skills. The LLM reads the KB and bridges the terminology gap that no static script can cover generically. An AI agent is also needed when the normative corpus changes.

The complete pipeline at a glance

                      ┌─────────────────────────────────┐
                      │  Normative corpus (PDF, text…)  │
                      └──────────────┬──────────────────┘
                                     │
                          chorus-pdf    (if PDF)
                          chorus-word   (if .docx)
                          chorus-excel  (if .xlsx / .csv)
                                     │
                                     ▼
                      ┌─────────────────────────────────┐
                      │  corpus/<NNN>-<slug>-text.txt   │
                      │  corpus/<NNN>-<slug>-vision.md  │
                      └──────────────┬──────────────────┘
                                     │
                          chorus-feed
                                     │
                                     ▼
                      ┌─────────────────────────────────┐
                      │  agent/agents/<slug>.org  (KB)  │
                      │  rules/<slug>/R<NN>-xxx.yml     │
                      │  lib/…/Agent/<Slug>/Helpers.pm  │
                      └──────────────┬──────────────────┘
                                     │  ← domain expert reviews, corrects
                                     │
                          chorus-check
                                     │
                                     ▼
                      ┌─────────────────────────────────┐
                      │  Feed.pm · Agent/*.pm           │
                      │  Expert.pm · run.pl             │
                      └──────────────┬──────────────────┘
                                     │
                         perl run.pl project.json
                                     │
                                     ▼
                      ✅ COMPLIANT / ❌ NON_COMPLIANT
                         with reason, per element, per agent
                                     │
                          chorus-strengthen
                                     │
                                     ▼
                      ┌─────────────────────────────────┐
                      │  gap report + enrichment roadmap│
                      └──────────────┬──────────────────┘
                                     │
                   chorus-feed --enrich  (targeted fixes)
                                     └──────────────────┐
                                                        │ reinforcement loop
                                               chorus-check --all ✅

The project file can be written by hand, generated from the KB with chorus-create-project, or aligned from engineer documents with chorus-import-project. Once a project file exists, chorus-strengthen can identify gaps in the YAML rules and recommend enrichment corpora.

chorus-quickstart — Pipeline overview

chorus-quickstart

Single responsibility: display the complete pipeline from a raw corpus to a compliance report, with the two available paths and their decision fork.

This command does not execute anything. It is a guided reference that shows the pipeline, the two available paths, and the point where they fork.

Start here if you are new to Chorus or unsure which path to follow.

chorus-pdf — Extract a PDF corpus

chorus-pdf <sandbox-name> <file.pdf> [--out <slug>] [--hybrid] [--auto] [--images] [--batch]

Single responsibility: produce an enriched text file from a PDF. Standard PDF-to-text tools silently drop normative tables rendered as images, multi-column layouts, and figure annotations. chorus-pdf recovers them.

Extraction modes

| Mode | Flag | Engine | API key | Output |
|---|---|---|---|---|
| Hybrid (default) | (none; auto-detected) | pdfminer text on ALL pages + Claude vision on cropped figures | ✅ ANTHROPIC_API_KEY | <slug>-vision.md |
| Text (fallback) | (none; no API key) | pdfminer.six only | ❌ not required | <slug>-text.txt |
| Auto | --auto | pdfminer (text pages) + LLM vision (figure pages) | ✅ | <slug>-vision.md |
| Images | --images | pdftoppm 150 DPI + LLM vision on all pages | ✅ | <slug>-vision.md |

Choosing a mode:

No flag provided
  → Phase 0.0 auto-detects ANTHROPIC_API_KEY
  → if key valid   : --hybrid activated automatically  ← DEFAULT
  → if key absent or invalid : text mode (fallback)

API key available, mixed document (text + embedded figures)
  → (default — hybrid activated automatically)

API key available, text-dominant document (few or no embedded figures)
  → --auto  ← faster, fewer API calls

API key available, mostly diagrams or scanned PDF
  → --images

No API key available
  → (default text mode — forced fallback)

--auto classifies each page first (pdfminer on text-only pages, vision on pages with figures), minimising API calls to pages that actually need them.
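The decision list above can be sketched as a small function. This is only an illustration of the selection logic as described here, not the skill's actual implementation: the function name, the mode labels, and the choice to fall back to text mode whenever the key is unusable (even when a flag is given) are assumptions.

```python
from typing import Optional, Tuple


def choose_pdf_mode(flag: Optional[str], api_key_valid: bool) -> Tuple[str, str]:
    """Return (mode, output suffix) for a chorus-pdf run.

    `flag` is None, "--hybrid", "--auto" or "--images".
    Hypothetical helper: names and return values are illustrative.
    """
    if not api_key_valid:
        # Assumption: without a usable key every path falls back to text mode.
        return ("text", "-text.txt")
    if flag in (None, "--hybrid"):
        # Default when the key is valid: hybrid is activated automatically.
        return ("hybrid", "-vision.md")
    if flag == "--auto":
        return ("auto", "-vision.md")
    if flag == "--images":
        return ("images", "-vision.md")
    raise ValueError(f"unknown flag: {flag}")
```

For example, `choose_pdf_mode(None, True)` selects hybrid mode, while the same call with `api_key_valid=False` selects the text fallback.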

Output

corpus/<NNN>-<slug>-text.txt or corpus/<NNN>-<slug>-vision.md (numbered in sequence with existing corpus files)
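One plausible reading of "numbered in sequence" is sketched below. It assumes that <NNN> is a three-digit, zero-padded counter and that the next file takes the highest existing number plus one. Both points are assumptions, and `next_corpus_name` is a hypothetical helper, not part of chorus-pdf.

```python
import re
from typing import Iterable

# Assumption: corpus files start with a three-digit sequence number.
_NUMBERED = re.compile(r"^(\d{3})-")


def next_corpus_name(existing: Iterable[str], slug: str, vision: bool) -> str:
    """Build the next corpus file name from the names already in corpus/."""
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := _NUMBERED.match(name))
    ]
    next_number = max(numbers, default=0) + 1
    suffix = "vision.md" if vision else "text.txt"
    return f"corpus/{next_number:03d}-{slug}-{suffix}"
```

With an empty corpus directory the first file would be `corpus/001-<slug>-vision.md` (or `-text.txt` in text mode).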

Prerequisites

pip install pdfminer.six pypdf
sudo apt install poppler-utils          # for --auto and --images
export ANTHROPIC_API_KEY="sk-ant-..."   # for hybrid (default), --auto and --images

Next step

chorus-feed <sandbox-name> corpus/<NNN>-<slug>-text.txt
            (or: corpus/<NNN>-<slug>-vision.md)

chorus-word — Extract a Word document

chorus-word <sandbox-name> <file.docx> [--out <slug>] [--batch]

Single responsibility: produce an enriched text file from a Word document (.docx). Standard Word-to-text converters silently drop embedded images, merged cells, and the actual reading order. chorus-word preserves them.

Extraction modes

| Mode | Engine | API key | Images | Tables | Output |
|---|---|---|---|---|---|
| Hybrid (default) | python-docx text + Claude vision on images | ✅ ANTHROPIC_API_KEY | ✅ described | ✅ Markdown pipe | <slug>-vision.md |
| Text (fallback) | python-docx only | ❌ not required | [IMAGE — not extracted] placeholder | ✅ Markdown pipe | <slug>-text.txt |

Mode is auto-detected: hybrid if the API key is present and valid, text otherwise.

Prerequisites

pip install python-docx
export ANTHROPIC_API_KEY="sk-ant-..."   # for hybrid mode

Next step

chorus-feed <sandbox-name> corpus/<NNN>-<slug>-vision.md
            (or: corpus/<NNN>-<slug>-text.txt)

chorus-excel — Extract an Excel spreadsheet or CSV

chorus-excel <sandbox-name> <file.xlsx|file.csv> [--out <slug>] [--sheet <name>] [--batch]

Single responsibility: produce an enriched text file from an Excel spreadsheet (.xlsx) or CSV file. Naive conversions flatten merged cells, ignore embedded images, and do not describe charts. chorus-excel recovers them.

Extraction modes

| Mode | Format | Engine | API key | Images / Charts | Output |
|---|---|---|---|---|---|
| Hybrid (default) | .xlsx | openpyxl + Claude vision | ✅ ANTHROPIC_API_KEY | ✅ described | <slug>-vision.md |
| Text (fallback) | .xlsx | openpyxl only | ❌ not required | [IMAGE/CHART — not extracted] | <slug>-text.txt |
| CSV | .csv | csv.reader | ❌ | N/A | <slug>-text.txt |

Mode is auto-detected from the file extension and API key availability.
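For the CSV path, a minimal sketch of what a csv.reader-based conversion might look like is given below. The Markdown pipe layout is an assumption borrowed from the chorus-word table format; this document does not specify the exact text layout chorus-excel writes, and `csv_to_pipe_table` is a hypothetical name.

```python
import csv
import io


def csv_to_pipe_table(text: str) -> str:
    """Render CSV text as a Markdown pipe table (first row = header)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Escape literal pipes and pad short rows to the table width.
        cells = [cell.strip().replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(rows[0]), "|" + "---|" * width]
    lines += [fmt(row) for row in rows[1:]]
    return "\n".join(lines)
```

Padding short rows keeps every line at the same column count, so a ragged CSV still renders as a well-formed table.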

Prerequisites

pip install openpyxl
sudo apt install libreoffice   # for charts in hybrid mode
export ANTHROPIC_API_KEY="sk-ant-..."   # for hybrid mode

Next step

chorus-feed <sandbox-name> corpus/<NNN>-<slug>-vision.md
            (or: corpus/<NNN>-<slug>-text.txt)

chorus-feed — Build the knowledge base

chorus-feed <sandbox-name> <corpus> [--enrich]

Single responsibility: extract knowledge from a corpus and write it into structured KB files. Does not generate any Perl infrastructure.

<corpus> must be a plain-text (.txt) or Markdown (.md) file — never a PDF. If a PDF is provided, chorus-feed stops and suggests running chorus-pdf first.
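The input rule above amounts to a simple extension check. The sketch below is illustrative only: the function name and the returned messages are hypothetical, and the handling of extensions other than .txt, .md and .pdf is an assumption.

```python
from pathlib import Path

# Extensions the text above says chorus-feed accepts.
ACCEPTED_SUFFIXES = {".txt", ".md"}


def check_corpus_argument(path: str) -> str:
    """Classify a <corpus> argument before any knowledge extraction starts."""
    suffix = Path(path).suffix.lower()
    if suffix in ACCEPTED_SUFFIXES:
        return "ok"
    if suffix == ".pdf":
        # Stop and point the user at the extraction step.
        return "stop: run chorus-pdf first"
    # Assumption: anything else is rejected as well.
    return "stop: unsupported corpus format"
```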

Two modes

Mode A — Initialization (default, no flag)

Used for a new sandbox or a fresh start. Creates the full sandbox structure:

<sandbox-name>/
  corpus/001-<slug>.txt        ← the corpus
  agent/agents/<slug>.org      ← KB per agent (ontology, slots, rules, helpers)
  agent/agents/index.org       ← pipeline index
  rules/<slug>/R<NN>-xxx.yml   ← YAML inference rules
  lib/…/Agent/<Slug>/Helpers.pm ← normative tables (extracted from corpus)
  README.org

What the AI agent produces per agent:

Mode B — Incremental enrichment (--enrich required)

Used when the sandbox already has a KB and new normative material has arrived. The AI agent reads the existing KB, classifies each new rule as refinement, extension, or new domain, and applies targeted changes.

chorus-feed <sandbox-name> new-addendum.txt --enrich

What chorus-feed does NOT do

It never generates Feed.pm, Agent/*.pm, Expert.pm, or run.pl. Those are the responsibility of chorus-check.

Key design decisions embedded in the KB

Next step

chorus-check <sandbox-name> project.json

Or, to review what was generated before running:

# Open the KB in your editor
agent/agents/<slug>.org

chorus-check — Generate infrastructure and run

chorus-check <sandbox-name> <project-file.json> [--all]

Single responsibility: read the KB, generate the Perl infrastructure, run the pipeline against the project file, and produce a conformity report.

--all runs every projet-*.json file found in the sandbox in one pass and produces a synthesis table (see below). The fast path applies: the infrastructure is checked once and reused for every project file.

Smart regeneration

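chorus-check keeps a hash of the KB files (agent/.kb-hash). On each call it compares the current KB against that hash: if nothing has changed, the existing infrastructure is reused (the fast path); if the KB has changed, the infrastructure is regenerated.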

This means running chorus-check twice on the same sandbox with different project files costs almost nothing on the second call.
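A hash-based check of this kind can be sketched as follows. The choice of SHA-256, the set of files hashed (only agent/agents/*.org here) and both function names are assumptions for illustration; the document only states that a hash of the KB files is kept in agent/.kb-hash.

```python
import hashlib
from pathlib import Path


def kb_digest(sandbox: Path) -> str:
    """Hash the KB org files in a stable (sorted) order."""
    digest = hashlib.sha256()
    for path in sorted((sandbox / "agent" / "agents").glob("*.org")):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_regeneration(sandbox: Path) -> bool:
    """Return True when the KB changed since the stored hash, and update it."""
    hash_file = sandbox / "agent" / ".kb-hash"
    current = kb_digest(sandbox)
    if hash_file.exists() and hash_file.read_text().strip() == current:
        return False  # fast path: reuse the generated infrastructure
    hash_file.write_text(current + "\n")
    return True
```

Sorting the paths and mixing the file name into the digest keeps the result independent of directory listing order and sensitive to renames.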

What gets generated

| File | Role |
|---|---|
| lib/<NS>/Feed.pm | Loads the project JSON, creates Frames, sets targeting slots |
| lib/<NS>/Agent/<Slug>.pm | Shell for each agent: imports Helpers, loads YAML rules |
| lib/<NS>/Expert.pm | Wires all agents, sets _MAX_CYCLES, registers with Expert |
| run.pl | Entry point: perl run.pl project.json |

The generated code is pure Perl — no AI agent dependency, no LLM, no network. It runs on any machine with Perl and the CPAN modules installed.

Output

A structured conformity report, per element and per agent:

✅ ELEMENT steel-beam-01 — COMPLIANT
   [qualification] material class: S355 ✓
   [domain]        span/depth ratio: 18.2 ≤ 20 ✓
   [fire]          REI 60 achieved ✓

❌ ELEMENT timber-post-03 — NON_COMPLIANT
   [qualification] moisture content: 22% > 18% max (EC5 §3.3)
   [domain]        vapour barrier: MISSING

Next step

# Re-run with a different project (no regeneration):
perl run.pl other-project.json

# Run all projet-*.json files at once:
chorus-check <sandbox-name> --all

# Update the corpus and regenerate:
chorus-feed <sandbox-name> new-addendum.txt --enrich
chorus-check <sandbox-name> project.json

--all synthesis table

When --all is used, chorus-check outputs a synthesis table instead of individual verbatim reports:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  chorus-check --all  <sandbox-name>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Project file       │ Status    │ OK │ KO │ Unproc │ Disc
  projet-rules-iso   │ SOLVED ✅ │  N │  N │   0    │  0
  projet-edges       │ SOLVED ✅ │  N │  N │   0    │  0
  projet-cross       │ SOLVED ✅ │  N │  N │   0    │  0
  projet-scale       │ SOLVED ✅ │  N │  N │   0    │  0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Overall: CONVERGED ✅   Discordances: 0 / N_total
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

If discordances are found → run chorus-strengthen <sandbox-name> to identify the gaps and get an enrichment roadmap.
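The bottom line of the synthesis table can be thought of as a simple aggregation over the per-project rows. In the sketch below, the `ProjectResult` type, the "NOT CONVERGED" label, the reading of N_total as the total number of elements, and the requirement that nothing be unprocessed are all assumptions, not documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProjectResult:
    """One row of the synthesis table (hypothetical structure)."""
    name: str
    ok: int
    ko: int
    unprocessed: int
    discordances: int


def overall_status(results: List[ProjectResult]) -> Tuple[str, int, int]:
    """Return (verdict, total discordances, total elements)."""
    total = sum(r.ok + r.ko + r.unprocessed for r in results)
    discordances = sum(r.discordances for r in results)
    clean = discordances == 0 and all(r.unprocessed == 0 for r in results)
    return ("CONVERGED" if clean else "NOT CONVERGED", discordances, total)
```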

chorus-create-project — Generate a project JSON from the KB

chorus-create-project <sandbox-name> <output-file.json> [--batch]

Single responsibility: read the sandbox KB and generate a valid project JSON file populated with both conforming and non-conforming elements that explore the variety of the domain.

--batch generates the full four-file coverage suite at once (see below) instead of a single project file.

This is useful for:

What the AI agent reads

  1. agent/agents/index.org — Frame types, pipeline, namespace
  2. agent/agents/<slug>.org — mandatory slots, thresholds, valid value domains
  3. Any existing project-*.json in the sandbox — reference format

⚠️ chorus-create-project never reads Helpers.pm, Feed.pm, or any generated Perl file. The org KB files are always the canonical source.

Output (single mode)

A JSON file with:

Coverage suite (--batch mode)

--batch produces four project files targeting different testing angles:

| File | Goal |
|---|---|
| projet-rules-iso.json | Test each rule in isolation (1 OK + 1 KO per rule) |
| projet-edges.json | Stress boundary values (value = threshold and threshold ± ε) |
| projet-cross.json | Expose inter-rule interactions (elements triggering multiple rules) |
| projet-scale.json | Volume stress test for _MAX_CYCLES calibration (≥ 100 elements) |

IDs are stable across regenerations (I-, E-, X-, S- prefixes) to allow diff-style comparison across chorus-check --all runs.
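To make the boundary-value idea concrete, the sketch below builds the three edge cases for one numeric threshold and gives them position-based IDs, so the same inputs always produce the same IDs. The element shape, the slot name used in the example and the `E-001` numbering format are hypothetical; only the E- prefix and the threshold ± ε pattern come from the text above.

```python
from typing import Dict, List


def edge_elements(slot: str, threshold: float, epsilon: float) -> List[Dict[str, object]]:
    """Build elements just below, exactly at, and just above a threshold."""
    values = [threshold - epsilon, threshold, threshold + epsilon]
    return [
        # Deterministic IDs: regenerating the suite yields the same E-00n names.
        {"id": f"E-{index:03d}", slot: value}
        for index, value in enumerate(values, start=1)
    ]
```

For an 18% moisture limit with ε = 0.5, this yields elements at 17.5, 18.0 and 18.5, which is the kind of triple a boundary test needs to pin down whether the rule uses a strict or non-strict comparison.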

Next step

# Single mode:
chorus-check <sandbox-name> <output-file.json>

# Batch mode — run the full suite:
chorus-check <sandbox-name> --all

# If the suite reveals gaps:
chorus-strengthen <sandbox-name>

chorus-strengthen — Identify rule gaps and recommend enrichment

chorus-strengthen <sandbox-name>

Single responsibility: run the full project suite, classify every discordance and unprocessed element into a gap type, produce a structured gap report, and recommend the enrichment corpus to pass to chorus-feed --enrich.

chorus-strengthen never modifies any KB, YAML, or Perl file — it only reads and reports.

Prerequisites

Gap classification

Every discordant or unprocessed element is classified into one of three types:

| Gap type | Pattern | Root cause |
|---|---|---|
| Rule too strict | Expected COMPLIANT → got NON_COMPLIANT | Threshold wrong, CONDITION too narrow, or edge case not covered |
| Rule too permissive | Expected NON_COMPLIANT → got COMPLIANT | Missing rule, threshold too high, or CONDITION excludes this type |
| Feed gap | Element is (unprocessed) | Targeting slot not set by Feed for this element type |
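The three gap types map directly onto an expected/actual comparison. The sketch below assumes the status strings used in the conformity report (COMPLIANT and NON_COMPLIANT) and represents an unprocessed element as `None`; the function name and that representation are illustrative choices.

```python
from typing import Optional


def classify_gap(expected: str, got: Optional[str]) -> Optional[str]:
    """Return the gap type for one element, or None when there is no gap."""
    if got is None:
        # The element was never processed: Feed did not target it.
        return "feed gap"
    if expected == got:
        return None
    if expected == "COMPLIANT" and got == "NON_COMPLIANT":
        return "rule too strict"
    if expected == "NON_COMPLIANT" and got == "COMPLIANT":
        return "rule too permissive"
    raise ValueError(f"unexpected status pair: {expected!r}, {got!r}")
```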

Output

A structured gap report per element (id, type, expected, got, rule fired, hypothesis, corpus reference, suggested fix) followed by an enrichment roadmap:

Reinforcement loop

chorus-create-project <sb> --batch     ← build the coverage suite (once)
        ↓
chorus-strengthen <sb>                 ← identify gaps
        ↓
[edit YAML directly]                   ← bucket B fixes
chorus-feed <sb> corpus-fix.txt --enrich  ← bucket C new rules
        ↓
chorus-check <sb> --all                ← verify
        ↓
chorus-strengthen <sb>                 ← check convergence
        ↓
✅ CONVERGED — all projects pass, 0 discordances

chorus-import-project — Align engineer documents with the KB

chorus-import-project <sandbox-name> <source…> [--out <file.json>] [--batch]

Single responsibility: read a project document produced by an engineer (PDF, Word, Excel, plain text, table pasted inline) and align its terminology with the sandbox KB slots and types, producing a valid project JSON file.

This bridges the gap between how engineers describe a project (free terminology, domain-specific jargon, informal tables) and the exact slot names and value domains the Chorus pipeline expects.

Three invocation modes

| Syntax | Mode | Output |
|---|---|---|
| chorus-import-project sb file.pdf | Single | 1 JSON |
| chorus-import-project sb f1.pdf f2.xlsx f3.docx | Merge | 1 merged JSON (same project, complementary files) |
| chorus-import-project sb ./dossier/ or --batch | Batch | 1 JSON per file + summary report |

Mode is detected automatically from the number and type of source arguments.
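The detection rule can be sketched as below. It assumes that a directory argument or the --batch flag takes precedence over the file count, which matches the table but is not spelled out in it; `detect_import_mode` and its `is_dir` parameter are hypothetical.

```python
import os
from typing import Callable, Sequence


def detect_import_mode(
    sources: Sequence[str],
    batch: bool = False,
    is_dir: Callable[[str], bool] = os.path.isdir,
) -> str:
    """Return "single", "merge" or "batch" for the given source arguments."""
    if not sources:
        raise ValueError("at least one source is required")
    # Assumption: --batch or any directory argument selects batch mode.
    if batch or any(is_dir(source) for source in sources):
        return "batch"
    return "single" if len(sources) == 1 else "merge"
```

Passing `is_dir` as a parameter keeps the function testable without touching the file system.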

What the AI agent reads

  1. agent/agents/index.org — Frame types, pipeline, namespace
  2. agent/agents/<slug>.org — slot names, value domains, mandatory/optional
  3. agent/thesaurus.org (if present) — validated project terminology from previous imports (highest priority)
  4. Previous agent/import-report-*.org — past alignment decisions (secondary — skipped if covered by thesaurus)

What the AI agent produces

Gaps (values absent from the source document) are reported but never invented.

Next step

# Review the import report before running:
agent/import-report-<NNN>.org

# Then validate:
chorus-check <sandbox-name> project-import-<NNN>.json

Complete workflow — end to end

Starting from a PDF corpus

# 1. Extract the corpus (--auto recommended for technical standards)
chorus-pdf my-sandbox corpus/standard.pdf --auto
#   → corpus/001-standard-vision.md

# 2. Build the knowledge base
chorus-feed my-sandbox corpus/001-standard-vision.md
#   → agent/agents/*.org, rules/**/*.yml, lib/.../Helpers.pm
#   ← domain expert reviews and corrects agent/agents/*.org

# 3. Generate infrastructure and run
chorus-check my-sandbox project.json
#   → Feed.pm, Agent/*.pm, Expert.pm, run.pl
#   → conformity report

Starting from an engineer document

# Generate or import a project file
chorus-create-project my-sandbox --batch             # generate from KB
chorus-import-project my-sandbox engineer-notes.pdf  # align from document

# Validate
chorus-check my-sandbox --all

Validating and strengthening the rule base

# Generate the coverage suite
chorus-create-project my-sandbox --batch
#   → projet-rules-iso.json, projet-edges.json, projet-cross.json, projet-scale.json

# Run all projects in one pass
chorus-check my-sandbox --all
#   → synthesis table with COMPLIANT / NON_COMPLIANT / unprocessed / discordances

# If discordances found → identify gaps and get enrichment roadmap
chorus-strengthen my-sandbox
#   → gap report + corpus-correctif.txt recommendation

# Apply fixes and re-run
chorus-feed my-sandbox corpus-correctif.txt --enrich
chorus-check my-sandbox --all
#   → all projects CONVERGED ✅

Updating when the standard changes

chorus-feed my-sandbox new-addendum.txt --enrich
chorus-check my-sandbox project.json     # regenerates only what changed

What runs without an AI agent

Once chorus-check has generated the infrastructure, execution is fully autonomous — no AI agent, no LLM, no network:

# On any machine with Perl and the required CPAN modules:
perl run.pl project.json

# Re-run with a different project (no regeneration):
perl run.pl other-project.json

Adapting to a new project requires an AI agent. A project JSON can be written by hand in principle, but chorus-create-project and chorus-import-project are the practical path: they read the KB and handle the gap between engineer terminology and the exact slot names and value domains the pipeline expects. An AI agent is also needed when the normative corpus changes (chorus-feed --enrich followed by chorus-check).

Technical prerequisites

Perl (runtime)

cpanm Chorus::Engine    # inference engine
cpanm YAML              # YAML rule loading

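Python (corpus extraction: chorus-pdf, chorus-word, chorus-excel)

pip install pdfminer.six pypdf   # chorus-pdf: text and page classification
pip install python-docx          # chorus-word
pip install openpyxl             # chorus-excel
sudo apt install poppler-utils   # pdftoppm (--auto and --images modes)
sudo apt install libreoffice     # chorus-excel: charts in hybrid mode
export ANTHROPIC_API_KEY="sk-ant-..."   # LLM vision (hybrid, --auto and --images)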

Explore the sandbox without an AI agent

The sandboxes/demo_en sandbox contains the complete output of the chain — corpus, org KB, YAML rules, Perl infrastructure. Running perl sandboxes/demo_en/run.pl sandboxes/demo_en/project-01.json shows the result live using the pre-built project JSON included in the sandbox. To adapt to a new project, an AI agent is required.

Quick reference

| Command | Input | Output | Prerequisites |
|---|---|---|---|
| chorus-pdf | PDF file | corpus/<NNN>-<slug>-text.txt or -vision.md | pdfminer.six; API key for --hybrid/--auto/--images |
| chorus-word | .docx file | corpus/<NNN>-<slug>-vision.md or -text.txt | python-docx; API key for hybrid mode |
| chorus-excel | .xlsx or .csv file | corpus/<NNN>-<slug>-vision.md or -text.txt | openpyxl; API key for hybrid mode |
| chorus-feed | .txt or .md corpus | agent/agents/*.org, YAML rules, Helpers.pm | none |
| chorus-check | project JSON (or --all) | Feed.pm, Agent/*.pm, Expert.pm, run.pl + report | chorus-feed run first |
| chorus-create-project | (KB only) | project JSON or 4-file coverage suite (--batch) | chorus-feed run first |
| chorus-import-project | engineer document | aligned project JSON + import report | chorus-feed run first |
| chorus-strengthen | (project suite) | gap report + enrichment roadmap | chorus-check run first |

Further reading