Agent Platform: MVP Plan

Internal planning doc. Aronlight Quote-Gen as first agent. Drafted 2026-04-21.

A thin vertical slice that closes Aronlight, plus a data model designed so client #2 (BCP or Sophie) drops in as a second tenant without rewriting the core.

Three risks to stress-test first

Risk 1

Re-inventing LangSmith + n8n + a vector DB

Generic "agent platform" is where teams 10x your size have burned $100M. Scope must be ruthlessly narrow to your own agents, or you build scaffolding nobody uses.

Risk 2

Abstraction on sample size 1

Quote-Gen alone is not enough to design a platform. Map at least two client agents side-by-side (Quote-Gen + RMA, or + Sophie screening) before choosing what to share.

Risk 3

Python for talent is a hypothesis

Your stack is TS/Next/Supabase. Splitting into a Python backend and a TS frontend doubles the devex burden (two toolchains, two deploy paths) unless you hire now. Claude Code writes both equally fast. Use Python only where ML/data people will actually plug in.

The 4 primitives (the spine)

Every client agent you've described (Quote-Gen, RMA, Sophie screening, BCP assessment) reduces to the same four blocks. Build the MVP as these four, not as a generic flow editor.

Source → Extractor → Grounder → Reviewer

1. Source

Where raw data comes in. MVP: drop a folder of files (emails, PDFs, XLSX).

Later: Gmail, Outlook, Odoo ERP, SharePoint connectors

2. Extractor

Claude parses raw into a structured table with per-field confidence 0 to 100 and a reason.

Output: field | value | confidence | why | source file
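
A minimal sketch of that output row as a Python type, assuming the Extractor service is written in Python. The class name `ExtractedField`, the helper `needs_review_first`, and the threshold of 80 are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    """One Extractor output row: field | value | confidence | why | source file."""

    field: str
    value: str | None  # None when the model could not find a value
    confidence: int    # 0 to 100
    why: str           # the model's stated reason for the value and score
    source_file: str

    def __post_init__(self) -> None:
        # Reject out-of-range scores early so bad rows never reach the Reviewer.
        if not 0 <= self.confidence <= 100:
            raise ValueError(f"confidence must be 0..100, got {self.confidence}")


def needs_review_first(rows: list[ExtractedField], threshold: int = 80) -> list[ExtractedField]:
    """Return rows below the (hypothetical) threshold, least confident first."""
    flagged = [r for r in rows if r.confidence < threshold]
    return sorted(flagged, key=lambda r: r.confidence)
```

Validating the range at construction time keeps a malformed model response from silently skewing the run-level averages later.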

3. Grounder

Client-specific knowledge plugged into the prompt. MVP: uploaded Excel + markdown.

Later: live SQL, Odoo queries, vector search over internal docs

4. Reviewer

Human UI to approve, edit, or reject each row. Run-level observability (% approved, avg confidence, time to decision).

Every agent = a config over these four. YAML or a form, no DAG editor yet.
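
One way to read "a config over these four", sketched in Python. This assumes the YAML parses into a plain dict with one key per primitive. The key names (`type`, `fields`, `files`, `actions`), the file name `catalog.xlsx`, and the loader itself are hypothetical, not a settled format.

```python
from __future__ import annotations

from dataclasses import dataclass

PRIMITIVES = ("source", "extractor", "grounder", "reviewer")


@dataclass
class AgentConfig:
    name: str
    client: str
    source: dict
    extractor: dict
    grounder: dict
    reviewer: dict


def load_agent_config(raw: dict) -> AgentConfig:
    """Build an AgentConfig from a parsed YAML dict, failing on missing primitives."""
    missing = [p for p in PRIMITIVES if p not in raw]
    if missing:
        raise ValueError(f"config is missing primitives: {missing}")
    return AgentConfig(
        name=raw["name"],
        client=raw["client"],
        **{p: raw[p] for p in PRIMITIVES},
    )


# What a parsed Quote-Gen YAML might look like (all values illustrative).
quote_gen = load_agent_config({
    "name": "Quote-Gen",
    "client": "Aronlight",
    "source": {"type": "folder_upload"},
    "extractor": {"fields": ["raw_input", "description", "mpn", "unit_price", "qty"]},
    "grounder": {"files": ["catalog.xlsx"]},
    "reviewer": {"actions": ["approve", "edit", "reject"]},
})
```

If agent #2 cannot be expressed as a dict of this shape without new top-level keys, that is an early signal the four primitives do not hold.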

Quick Win vs Moonshot vs Safe Bet

Quick Win

Aronlight-only demo

Single page. Folder in, table with confidence, Manel approves, CSV out. Closes the Aronlight call in 1 week. No platform, no second tenant.

Safe Bet

4-week MVP on the 4 primitives

Single-tenant to start (Aronlight), data model + APIs ready so client #2 drops in as a second tenant in week 5. No visual builder yet, YAML per agent. Closes Aronlight AND becomes your reference architecture.

Moonshot

Self-serve studio

Any client logs in, uploads data, wires agents visually, observes runs. Monetized per transaction. 3 to 6 months. Don't start here.

Stack decision

Layer | Choice | Rationale
Frontend | Next.js + Tailwind + Shadcn | Your existing muscle. Ships fast. Linear aesthetic by default.
Agent runtime + integrations | FastAPI (Python) | Where ML/data hires will plug in later. Narrow surface, well-defined API contracts.
Auth, DB, storage | Supabase | Same as Sophie. Row-level security for tenant isolation from day 1.
LLM | Claude (never pinned) | Global rule. Always latest model family.
Config per agent | YAML checked into repo | Readable by humans, diffable in git, no visual editor yet.
Deploy | Vercel (FE) + Fly.io or Railway (API) | Vercel for the Next.js app, Python API on a managed host until volume justifies infra.

Data model (Supabase)

Table | Purpose | Key fields
agents | One row per configured agent per client | id, client_id, name, config_yaml, created_at
runs | A batch execution of an agent | id, agent_id, status, started_at, finished_at, stats_json
sources | Input files uploaded for a run | id, run_id, type, file_path, mime_type
extractions | Per-field outputs from the Extractor | id, run_id, source_id, field, value, confidence, reason
grounding_docs | Knowledge base files per agent | id, agent_id, type, file_path, parsed_content
reviews | Human approvals, edits, rejects | id, extraction_id, status, edited_value, reviewer_id, reviewed_at
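
To sanity-check how these tables relate, here is a sketch using Python's built-in `sqlite3` as a stand-in for Supabase Postgres. Column types, the `CHECK` on confidence, the sample rows, and the helper `extractions_for_client` are assumptions. Only the table and column names come from the model above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE agents (
    id INTEGER PRIMARY KEY, client_id TEXT NOT NULL, name TEXT NOT NULL,
    config_yaml TEXT, created_at TEXT
);
CREATE TABLE runs (
    id INTEGER PRIMARY KEY, agent_id INTEGER NOT NULL REFERENCES agents(id),
    status TEXT, started_at TEXT, finished_at TEXT, stats_json TEXT
);
CREATE TABLE sources (
    id INTEGER PRIMARY KEY, run_id INTEGER NOT NULL REFERENCES runs(id),
    type TEXT, file_path TEXT, mime_type TEXT
);
CREATE TABLE extractions (
    id INTEGER PRIMARY KEY, run_id INTEGER NOT NULL REFERENCES runs(id),
    source_id INTEGER REFERENCES sources(id), field TEXT, value TEXT,
    confidence INTEGER CHECK (confidence BETWEEN 0 AND 100), reason TEXT
);
CREATE TABLE grounding_docs (
    id INTEGER PRIMARY KEY, agent_id INTEGER NOT NULL REFERENCES agents(id),
    type TEXT, file_path TEXT, parsed_content TEXT
);
CREATE TABLE reviews (
    id INTEGER PRIMARY KEY,
    extraction_id INTEGER NOT NULL REFERENCES extractions(id),
    status TEXT, edited_value TEXT, reviewer_id TEXT, reviewed_at TEXT
);
"""


def extractions_for_client(conn: sqlite3.Connection, client_id: str) -> list:
    """Tenant-scoped read: extractions reach client_id only via runs -> agents."""
    query = """
        SELECT e.field, e.value, e.confidence
        FROM extractions e
        JOIN runs r ON r.id = e.run_id
        JOIN agents a ON a.id = r.agent_id
        WHERE a.client_id = ?
        ORDER BY e.id
    """
    return conn.execute(query, (client_id,)).fetchall()


# Illustrative data: two tenants, one run and one extraction for the first.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO agents (id, client_id, name) VALUES (1, 'aronlight', 'Quote-Gen')")
conn.execute("INSERT INTO agents (id, client_id, name) VALUES (2, 'sophie', 'CV Screening')")
conn.execute("INSERT INTO runs (id, agent_id, status) VALUES (1, 1, 'finished')")
conn.execute("INSERT INTO sources (id, run_id, type, file_path) VALUES (1, 1, 'email', 'rfq.eml')")
conn.execute(
    "INSERT INTO extractions (run_id, source_id, field, value, confidence, reason) "
    "VALUES (1, 1, 'mpn', 'PVN-6050', 98, 'exact catalog match')"
)
```

Note that only `agents` carries `client_id`, so every tenant-scoped read goes through two joins. Row-level security policies would likely need the same joins, or a denormalized `client_id` on the child tables. That is worth deciding before client #2 arrives.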

4-week MVP roadmap

Week 1

Foundation + Source

  • Repo scaffolding, auth, org isolation
  • Create Agent page (name, description, YAML)
  • Folder upload to Supabase Storage
  • FastAPI /runs/start endpoint stub
  • Shell UI: agents list, run detail
Claude Code: all of this.
Human: nothing yet.

Week 2

Extractor + Confidence

  • Extractor service: per file Claude call with schema
  • Per-field confidence + reason
  • Table view: sort/filter by confidence
  • Re-run from UI
Claude Code: service + UI.
Human: prompt tuning on 10 to 20 real Aronlight emails.

Week 3

Grounder + Reviewer

  • Upload Excel/MD as grounding, parse to context
  • Extractor prompt reads grounding
  • Review UI: approve, edit, reject
  • Export: CSV or webhook to Odoo
Claude Code: parsers, UI, export.
Human: review the flow with Manel.
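
A sketch of how review decisions could resolve into the CSV export, using only the standard library. The status strings (`approved`, `edited`, `rejected`), the rule that unreviewed rows are not exported, and both function names are assumptions to settle with Manel, not decided behavior.

```python
from __future__ import annotations

import csv
import io


def final_value(extraction: dict, review: dict | None) -> str | None:
    """Resolve the exported value for one row, or None if it should be dropped."""
    if review is None or review["status"] == "rejected":
        return None  # unreviewed and rejected rows are not exported
    if review["status"] == "edited":
        return review["edited_value"]
    return extraction["value"]  # approved as-is


def export_csv(extractions: list[dict], reviews_by_extraction_id: dict) -> str:
    """Write approved and edited rows to CSV text, one line per field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["field", "value", "confidence"])
    for ext in extractions:
        value = final_value(ext, reviews_by_extraction_id.get(ext["id"]))
        if value is not None:
            writer.writerow([ext["field"], value, ext["confidence"]])
    return buf.getvalue()
```

The same `final_value` rule could feed the Odoo webhook payload, so CSV and webhook never disagree on what was approved.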

Week 4

Observability + 2nd agent

  • Run dashboard: % approved, avg confidence, time per field
  • Scaffold agent #2 (recommend: Sophie CV screening)
  • Test if primitives hold
  • If yes: ship to Manel + Ronald as design partners
Claude Code: dashboard + scaffold.
Human: decide agent #2.
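
The dashboard numbers could be computed along these lines. This is a sketch under assumptions: each row is an extraction joined with its review, `extracted_at` is a hypothetical timestamp (the data model above has no per-extraction time, so `runs.finished_at` might stand in), and "% approved" is taken over reviewed rows only.

```python
from __future__ import annotations

from datetime import datetime
from statistics import mean


def run_stats(rows: list[dict]) -> dict:
    """Compute % approved, average confidence, and average time to decision."""
    reviewed = [r for r in rows if r.get("reviewed_at")]
    approved = [r for r in reviewed if r["status"] == "approved"]
    seconds = [
        (r["reviewed_at"] - r["extracted_at"]).total_seconds() for r in reviewed
    ]
    return {
        "pct_approved": round(100 * len(approved) / len(reviewed), 1) if reviewed else None,
        "avg_confidence": round(mean(r["confidence"] for r in rows), 1) if rows else None,
        "avg_seconds_to_decision": round(mean(seconds), 1) if seconds else None,
    }


# Illustrative rows: three reviewed, one still pending.
t0 = datetime(2026, 4, 21, 9, 0, 0)
sample_rows = [
    {"confidence": 98, "status": "approved", "extracted_at": t0,
     "reviewed_at": datetime(2026, 4, 21, 9, 0, 30)},
    {"confidence": 96, "status": "approved", "extracted_at": t0,
     "reviewed_at": datetime(2026, 4, 21, 9, 1, 0)},
    {"confidence": 72, "status": "edited", "extracted_at": t0,
     "reviewed_at": datetime(2026, 4, 21, 9, 1, 30)},
    {"confidence": 34, "status": None, "extracted_at": t0, "reviewed_at": None},
]
stats = run_stats(sample_rows)
```

The result is a plain dict, so it could be serialized straight into `runs.stats_json`.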

Visual direction (from your reference: Faction)

The reference platform is Faction. Screenshots show: (1) email with inline entity highlight + dark tooltip callout, (2) fully parsed quote with ALL-CAPS micro-labels and resolved line items, (3) product data table with ---- placeholders and an "Enrich Data with Faction" action. This IS the 4-primitive flow made visible. We copy the aesthetic.

Design language to adopt

  • Warm off-white background, not cool gray (#F7F5F2 ish)
  • Thin typography, very heavy whitespace
  • No shadows, minimal borders, flat surfaces
  • ALL CAPS micro-labels with tight tracking for every field
  • Universal ---- token for missing fields
  • Dark rounded tooltips with white text for actions and entity callouts
  • Blueprint grid framing at page edges
  • Monochrome with at most one subtle accent

Primitive mapping

Faction screen | Our primitive
Email w/ entity highlight + callout | Source (raw in, entities tagged)
Parsed quote w/ line items + pricing | Extractor (structured output)
Product table w/ ---- + Enrich action | Grounder (knowledge fills gaps)
(not shown) approve/edit row | Reviewer (human in the loop)

Faction aesthetic demo (live)

This strip below is rendered in the same aesthetic we'd ship in Week 2 (Extractor output table). All-caps labels, thin type, off-white surface, monospace values, ---- for missing.

Customer Name: Meridian Build Supply Co.
POC: Alex Smith
Address: 1820 Westfield Road, Suite 4B, Brookhaven, MA 02417
Email: alex@meridianbuild.co
Date: April 21st 2026

Raw Input | Description | MPN | Unit Price | Qty | Ext Price | Confidence
1/2" PVC 6" Threaded Nipples | 1/2" PVC Nipple, 6" length, threaded for pipe extensions. | PVN-6050 | 1.89 | 40 | $75.60 | 98
1/2" PVC Ball Valves | 1/2" PVC Ball Valve, full port for main water shutoff. | SPE-3708 | 12.47 | 28 | $349.16 | 96
2-1/2 to 1/2" NPT Taps | 2-1/2" to 1/2" NPT reducing tap for adapting large to small lines. | TAP-35832 | 28.15 | 3 | $84.45 | 72
8" PVC 40 SW Caps | 8" PVC Schedule 40 socket cap for terminating main drain lines. | ---- | ---- | 25 | ---- | 34
4" PVC 40 SW Caps | 4" PVC Schedule 40 socket cap for capping branch drains. | CAP-40038 | 5.85 | 15 | $87.75 | 91
Estimated Shipping: 3 to 5 business days
Avg Confidence: 78    Total: $596.96
Enrich Data with Grounder →
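
The footer values in this strip follow from the rows: extended price is unit price times quantity, rows with a missing price are left out of the total, and average confidence covers all five rows. A small sketch of that arithmetic follows. The split between MPN and unit price is read off the demo values, and the function name `ext_price` is hypothetical.

```python
from decimal import Decimal

# (mpn, unit_price, qty, confidence); None marks a ---- placeholder.
ROWS = [
    ("PVN-6050", Decimal("1.89"), 40, 98),
    ("SPE-3708", Decimal("12.47"), 28, 96),
    ("TAP-35832", Decimal("28.15"), 3, 72),
    (None, None, 25, 34),
    ("CAP-40038", Decimal("5.85"), 15, 91),
]


def ext_price(unit_price, qty):
    """Extended price, or None when the unit price is still missing."""
    return None if unit_price is None else unit_price * qty


priced = [ext_price(price, qty) for _, price, qty, _ in ROWS]
total = sum(x for x in priced if x is not None)               # Decimal("596.96")
avg_confidence = round(sum(c for *_, c in ROWS) / len(ROWS))  # 78
```

Using `Decimal` rather than floats keeps the total exact to the cent, which matters once these numbers land in a real quote.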

Week 1 wireframe: Agents list + Create Agent

+---------------------------------------------------------------+
|  BLUEPRINT / FACTION-STYLE OFF-WHITE PAGE WITH GRID EDGES     |
|                                                               |
|   . . .                                                       |
|                                                               |
|   AGENTS                                          [+ New]     |
|   ------------------------------------------------------      |
|   NAME                    CLIENT         LAST RUN    RUNS     |
|   Quote-Gen               Aronlight      2h ago      17       |
|   RMA                     Aronlight      ----        0        |
|   CV Screening            Sophie         ----        0        |
|                                                               |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|   . . .                                                       |
|                                                               |
|   NEW AGENT                                                   |
|                                                               |
|   NAME          [ Quote-Gen                           ]       |
|   CLIENT        [ Aronlight                  v        ]       |
|   DESCRIPTION   [ Parse inbound RFQ emails into a     ]       |
|                 [ structured quote with confidence    ]       |
|                                                               |
|   SOURCE        ( ) Folder upload    ( ) Gmail (soon)         |
|                                                               |
|   OUTPUT SCHEMA (YAML)                                        |
|   +-------------------------------------------------------+   |
|   | fields:                                               |   |
|   |   - name: raw_input                                   |   |
|   |     type: string                                      |   |
|   |   - name: description                                 |   |
|   |     type: string                                      |   |
|   |   - name: mpn                                         |   |
|   |     type: string                                      |   |
|   |   - name: unit_price                                  |   |
|   |     type: number                                      |   |
|   |   - name: qty                                         |   |
|   |     type: integer                                     |   |
|   +-------------------------------------------------------+   |
|                                                               |
|   GROUNDING    [ + Upload Excel or Markdown ]                 |
|                                                               |
|                                    [ Cancel ]  [ Create ]     |
|                                                               |
+---------------------------------------------------------------+
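
The output schema in the wireframe can double as a validator for what the Extractor returns. A minimal sketch, assuming the YAML has already been parsed into a dict and that a missing value (shown as ---- in the UI) is allowed. `TYPE_CHECKS` and `validate_row` are hypothetical names.

```python
from __future__ import annotations

# Maps the schema's type names to Python checks. bool is excluded on purpose,
# since isinstance(True, int) is True in Python.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}

# The wireframe's schema, as it would look after YAML parsing.
SCHEMA = {
    "fields": [
        {"name": "raw_input", "type": "string"},
        {"name": "description", "type": "string"},
        {"name": "mpn", "type": "string"},
        {"name": "unit_price", "type": "number"},
        {"name": "qty", "type": "integer"},
    ]
}


def validate_row(row: dict, schema: dict = SCHEMA) -> list[str]:
    """Return a list of type errors for one extracted row (empty if valid)."""
    errors = []
    for spec in schema["fields"]:
        name, type_name = spec["name"], spec["type"]
        value = row.get(name)
        if value is None:
            continue  # assumption: missing values are allowed and shown as ----
        if not TYPE_CHECKS[type_name](value):
            errors.append(f"{name}: expected {type_name}, got {type(value).__name__}")
    return errors
```

For example, `validate_row({"qty": "25"})` reports that `qty` should be an integer, which is the kind of model slip worth catching before the row reaches the review table.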
  

Reference products to benchmark

Humanloop

Prompt ops + evaluation. Useful for how they expose confidence and run logs.

LangSmith

Agent observability standard. Look at their run detail view and trace tree.

Retool AI

Visual builder. Don't copy the DAG, copy how they surface errors to non-technical users.

Handle.ai

a16z seeded vertical agent for insurance. Closest comp to what you're doing for manufacturing. Study their "approve/edit" loop.

Open questions

1. Reference platform. LOCKED: Faction. Aesthetic captured in the Visual Direction section above.
2. Who builds. Claude Code solo? You + Claude Code? Hire a Python dev now or in week 4? This shapes scope.
3. Agent #2 for week 4. Sophie screening (easy, no ERP), BCP assessment (paid context), or Aronlight RMA (same client, harder)?
4. Aronlight Loom blocker. Manel's walkthroughs are still pending. Build on synthetic data now and swap in real later, or wait?

Proposed next step

Start Week 1

Reference is locked (Faction). Next decisions: (a) confirm the 4-primitives scope so I can write tasks/todo.md for Week 1; (b) decide whether the Python API lives in this repo or a sibling repo (e.g. agent-platform/); (c) pick agent #2 for Week 4: Sophie CV screening, BCP assessment, or Aronlight RMA.

Recommendation: new sibling repo agent-platform/ since this will outlive Aronlight; lock 4 primitives as scope; agent #2 = Sophie CV screening (no ERP dependency, you have volume).