Agent Platform: MVP Plan

Internal planning doc. Aronlight Quote-Gen as first agent. Drafted 2026-04-21.

A thin vertical slice that closes Aronlight, plus a data model designed so client #2 (BCP or Sophie) drops in as a second tenant without rewriting the core.

Three risks to stress-test first

Risk 1

Re-inventing LangSmith + n8n + a vector DB

A generic "agent platform" is where teams 10x your size have burned $100M. Scope must stay ruthlessly narrow to your own agents, or you end up building scaffolding nobody uses.

Risk 2

Abstraction on sample size 1

Quote-Gen alone is not enough to design a platform. Map at least two client agents side-by-side (Quote-Gen + RMA, or + Sophie screening) before choosing what to share.

Risk 3

Python for talent is a hypothesis

Your stack is TS/Next/Supabase. Splitting into a Python backend and a TS frontend doubles the devex overhead unless you hire now. Claude Code writes both equally fast, so use Python only where ML/data people will actually plug in.

The 4 primitives (the spine)

Every client agent you've described (Quote-Gen, RMA, Sophie screening, BCP assessment) reduces to the same four blocks. Build the MVP as these four, not as a generic flow editor.

Source → Extractor → Grounder → Reviewer

Source

Where raw data comes in. MVP: drop a folder of files (emails, PDFs, XLSX).

Later: Gmail, Outlook, Odoo ERP, SharePoint connectors

Extractor

Claude parses raw input into a structured table, with a per-field confidence score (0 to 100) and a reason.

Output: field | value | confidence | why | source file
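A minimal sketch of one Extractor output row, assuming a plain Python dataclass on the API side. The class name `ExtractedField` and the attribute `source_file` are illustrative, not an agreed contract; only the five columns and the 0 to 100 range come from the plan above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractedField:
    """One row of Extractor output: field | value | confidence | why | source file."""

    field: str             # schema field name, e.g. "mpn"
    value: Optional[str]   # None when nothing could be extracted (shown as ---- in the UI)
    confidence: int        # 0 to 100, as stated in the plan
    why: str               # short reason behind the value and its confidence
    source_file: str       # file the value was read from

    def __post_init__(self) -> None:
        # Reject scores outside the 0 to 100 range early, before they reach the Reviewer.
        if not 0 <= self.confidence <= 100:
            raise ValueError(f"confidence must be 0..100, got {self.confidence}")


row = ExtractedField(
    field="mpn",
    value="PVN-6050",
    confidence=98,
    why="Exact match on description and size",
    source_file="rfq_email.eml",  # hypothetical file name
)
```

Keeping the range check in the row type means a malformed model response fails loudly at parse time instead of showing up as a strange number in the review table.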

Grounder

Client-specific knowledge plugged into the prompt. MVP: uploaded Excel + markdown.

Later: live SQL, Odoo queries, vector search over internal docs
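One way to read "plugged into the prompt" for the MVP: concatenate the parsed grounding files ahead of the raw input. This is a sketch under that assumption; the function name `build_extractor_prompt`, the section headings, and the sample grounding content are all hypothetical.

```python
def build_extractor_prompt(raw_text: str, fields: list[str], grounding: dict[str, str]) -> str:
    """Assemble one prompt: instructions, grounding docs, fields to extract, raw input."""
    sections = ["You extract structured fields from raw client documents."]
    # Each grounding doc (parsed Excel or markdown) becomes its own labeled section.
    for name, content in grounding.items():
        sections.append(f"## Grounding: {name}\n{content}")
    sections.append("## Fields to extract\n" + "\n".join(f"- {f}" for f in fields))
    sections.append("## Raw input\n" + raw_text)
    return "\n\n".join(sections)


prompt = build_extractor_prompt(
    raw_text="Need 40 of the 1/2 inch PVC 6 inch threaded nipples",
    fields=["raw_input", "mpn", "qty"],
    grounding={"price_list": "PVN-6050 | 1/2\" PVC Nipple 6\" | 1.89"},  # made-up sample
)
```

This only works while the grounding files fit comfortably in the context window, which is part of why live SQL and vector search sit in the "Later" line.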

Reviewer

Human UI to approve, edit, or reject each row. Run-level observability (% approved, avg confidence, time to decision).

Every agent = a config over these four. YAML or a form, no DAG editor yet.
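To make "a config over these four" concrete, here is a sketch of what a parsed agent config could look like once the YAML is loaded, plus a check that no primitive is missing. The key names and the grounding file names are assumptions for illustration, not a settled schema.

```python
# Hypothetical shape of an agent config after the YAML has been parsed into a dict.
AGENT_CONFIG = {
    "name": "quote-gen",
    "client": "aronlight",
    "source": {"type": "folder_upload"},
    "extractor": {"fields": ["raw_input", "description", "mpn", "unit_price", "qty"]},
    "grounder": {"docs": ["price_list.xlsx", "catalog_notes.md"]},  # made-up file names
    "reviewer": {"actions": ["approve", "edit", "reject"]},
}

PRIMITIVES = ("source", "extractor", "grounder", "reviewer")


def missing_primitives(config: dict) -> list[str]:
    """Return the primitives a config fails to define, in pipeline order."""
    return [p for p in PRIMITIVES if p not in config]


assert missing_primitives(AGENT_CONFIG) == []
```

If agent #2 needs a key that does not fit under one of these four, that is an early signal the primitives do not hold.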

Quick Win vs Moonshot vs Safe Bet

Quick Win

Aronlight-only demo

Single page. Folder in, table with confidence, Manel approves, CSV out. Closes the Aronlight call in 1 week. No platform, no second tenant.

Safe Bet

4-week MVP on the 4 primitives

Single-tenant to start (Aronlight), with the data model + APIs ready so client #2 drops in as a second tenant in week 5. No visual builder yet, YAML per agent. Closes Aronlight AND becomes your reference architecture.

Moonshot

Self-serve studio

Any client logs in, uploads data, wires agents visually, observes runs. Monetized per transaction. 3 to 6 months. Don't start here.

Stack decision

Frontend

Next.js + Tailwind + Shadcn

Your existing muscle. Ships fast. Linear aesthetic by default.

Agent runtime + integrations

FastAPI (Python)

Where ML/data hires will plug in later. Narrow surface, well-defined API contracts.

Auth, DB, storage

Supabase

Same as Sophie. Row-level security for tenant isolation from day 1.

LLM

Claude (never pinned)

Global rule. Always latest model family.

Config per agent

YAML checked into repo

Readable by humans, diffable in git, no visual editor yet.

Deploy

Vercel (FE) + Fly.io or Railway (API)

Vercel for the Next.js app, Python API on a managed host until volume justifies infra.

Data model (Supabase)

Table	Purpose	Key fields
`agents`	One row per configured agent per client	id, client_id, name, config_yaml, created_at
`runs`	A batch execution of an agent	id, agent_id, status, started_at, finished_at, stats_json
`sources`	Input files uploaded for a run	id, run_id, type, file_path, mime_type
`extractions`	Per-field outputs from the Extractor	id, run_id, source_id, field, value, confidence, reason
`grounding_docs`	Knowledge base files per agent	id, agent_id, type, file_path, parsed_content
`reviews`	Human approvals, edits, rejects	id, extraction_id, status, edited_value, reviewer_id, reviewed_at
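The table above can be sanity-checked locally before touching Supabase. The sketch below builds the same six tables in an in-memory SQLite database. The column types, the foreign keys, and the CHECK on confidence are assumptions; Supabase is Postgres, so the real migration would use its own types (uuid, timestamptz, jsonb) and add RLS policies keyed on client_id.

```python
import sqlite3

# Column names follow the data model table; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE agents (
    id INTEGER PRIMARY KEY, client_id TEXT NOT NULL, name TEXT NOT NULL,
    config_yaml TEXT NOT NULL, created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE runs (
    id INTEGER PRIMARY KEY, agent_id INTEGER NOT NULL REFERENCES agents(id),
    status TEXT NOT NULL, started_at TEXT, finished_at TEXT, stats_json TEXT
);
CREATE TABLE sources (
    id INTEGER PRIMARY KEY, run_id INTEGER NOT NULL REFERENCES runs(id),
    type TEXT, file_path TEXT, mime_type TEXT
);
CREATE TABLE extractions (
    id INTEGER PRIMARY KEY, run_id INTEGER NOT NULL REFERENCES runs(id),
    source_id INTEGER NOT NULL REFERENCES sources(id), field TEXT NOT NULL,
    value TEXT, confidence INTEGER CHECK (confidence BETWEEN 0 AND 100), reason TEXT
);
CREATE TABLE grounding_docs (
    id INTEGER PRIMARY KEY, agent_id INTEGER NOT NULL REFERENCES agents(id),
    type TEXT, file_path TEXT, parsed_content TEXT
);
CREATE TABLE reviews (
    id INTEGER PRIMARY KEY, extraction_id INTEGER NOT NULL REFERENCES extractions(id),
    status TEXT NOT NULL, edited_value TEXT, reviewer_id TEXT, reviewed_at TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all six tables with foreign-key enforcement switched on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Every table reaches client_id through agents, so tenant isolation in this shape depends on policies that join back to `agents`. If that join proves awkward under RLS, denormalizing client_id onto `runs` is an option worth weighing.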

4-week MVP roadmap

Week 1

Foundation + Source

Repo scaffolding, auth, org isolation
Create Agent page (name, description, YAML)
Folder upload to Supabase Storage
FastAPI /runs/start endpoint stub
Shell UI: agents list, run detail

Claude Code: all of this.
Human: nothing yet.
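A rough sketch of what the /runs/start stub could do in Week 1, written as a plain function so it runs without any framework installed. In the real service this body would sit behind a FastAPI POST route. The `Run` shape, the "queued" status, and the in-memory `RUNS` store are placeholders, not decisions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Run:
    agent_id: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "queued"  # assumed initial status; the plan does not name the states
    started_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


# Stand-in for the `runs` table until the Supabase client is wired in.
RUNS: dict[str, Run] = {}


def start_run(agent_id: str) -> dict:
    """Body of a hypothetical POST /runs/start handler: record the run, return its id."""
    if not agent_id:
        raise ValueError("agent_id is required")
    run = Run(agent_id=agent_id)
    RUNS[run.id] = run
    return {"run_id": run.id, "status": run.status, "started_at": run.started_at}


response = start_run("quote-gen")
```

Returning the run id immediately and doing the extraction work later keeps the endpoint contract stable when Week 2 adds the actual Claude calls.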

Week 2

Extractor + Confidence

Extractor service: per-file Claude call with schema
Per-field confidence + reason
Table view: sort/filter by confidence
Re-run from UI

Claude Code: service + UI.
Human: prompt tuning on 10 to 20 real Aronlight emails.
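The sort/filter behaviour of the table view can be prototyped in a few lines. This sketch assumes rows are dicts with a `confidence` key and uses a review threshold of 80, which is an arbitrary starting point to tune against the real Aronlight emails.

```python
def review_queue(rows: list[dict], threshold: int = 80) -> list[dict]:
    """Rows below the confidence threshold, least confident first."""
    flagged = [r for r in rows if r["confidence"] < threshold]
    return sorted(flagged, key=lambda r: r["confidence"])


# Confidence values taken from the demo table in this doc.
rows = [
    {"raw_input": '1/2" PVC 6" Threaded Nipples', "confidence": 98},
    {"raw_input": '1/2" PVC Ball Valves', "confidence": 96},
    {"raw_input": '2-1/2 to 1/2" NPT Taps', "confidence": 72},
    {"raw_input": '8" PVC 40 SW Caps', "confidence": 34},
    {"raw_input": '4" PVC 40 SW Caps', "confidence": 91},
]

queue = review_queue(rows)
print([r["confidence"] for r in queue])  # [34, 72]
```

Surfacing the least confident rows first puts the reviewer's attention where the Extractor is most likely wrong.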

Week 3

Grounder + Reviewer

Upload Excel/MD as grounding, parse to context
Extractor prompt reads grounding
Review UI: approve, edit, reject
Export: CSV or webhook to Odoo

Claude Code: parsers, UI, export.
Human: review the flow with Manel.
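For the CSV export, one plausible rule is: skip rejected rows, and let an edited value override the extracted one. The sketch below implements that rule with the standard library. The status strings and the column set are assumptions.

```python
import csv
import io


def export_reviewed_csv(rows: list[dict]) -> str:
    """Write reviewed rows to CSV text; rejected rows are dropped, edits win over extractions."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["field", "value", "confidence"])
    for r in rows:
        if r["status"] == "rejected":
            continue
        # Use the reviewer's edit when there is one, otherwise the extracted value.
        value = r["edited_value"] if r["status"] == "edited" else r["value"]
        writer.writerow([r["field"], value, r["confidence"]])
    return buf.getvalue()


reviewed = [
    {"field": "mpn", "value": "PVN-6050", "edited_value": None, "confidence": 98, "status": "approved"},
    {"field": "qty", "value": "4O", "edited_value": "40", "confidence": 72, "status": "edited"},
    {"field": "unit_price", "value": "0.00", "edited_value": None, "confidence": 34, "status": "rejected"},
]

csv_text = export_reviewed_csv(reviewed)
```

The same filtered rows could feed the Odoo webhook, so it may be worth separating "select reviewed rows" from "serialize" once both exports exist.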

Week 4

Observability + 2nd agent

Run dashboard: % approved, avg confidence, time to decision per field
Scaffold agent #2 (recommend: Sophie CV screening)
Test if primitives hold
If yes: ship to Manel + Ronald as design partners

Claude Code: dashboard + scaffold.
Human: decide agent #2.
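The dashboard numbers are simple aggregates over reviewed rows, which is roughly what would land in `runs.stats_json`. A sketch, assuming each row carries a status, a confidence, and the seconds the reviewer took; the key names and the review timings are illustrative.

```python
def run_stats(rows: list[dict]) -> dict:
    """Aggregate run-level metrics: % approved, average confidence, average time to decision."""
    total = len(rows)
    if total == 0:
        return {"pct_approved": 0.0, "avg_confidence": 0.0, "avg_seconds_to_decision": 0.0}
    approved = sum(1 for r in rows if r["status"] == "approved")
    return {
        "pct_approved": round(100 * approved / total, 1),
        "avg_confidence": round(sum(r["confidence"] for r in rows) / total, 1),
        "avg_seconds_to_decision": round(sum(r["seconds_to_decision"] for r in rows) / total, 1),
    }


# Confidences match the demo table in this doc; statuses and timings are made up.
stats = run_stats([
    {"status": "approved", "confidence": 98, "seconds_to_decision": 4},
    {"status": "approved", "confidence": 96, "seconds_to_decision": 6},
    {"status": "edited", "confidence": 72, "seconds_to_decision": 20},
    {"status": "rejected", "confidence": 34, "seconds_to_decision": 45},
    {"status": "approved", "confidence": 91, "seconds_to_decision": 5},
])
print(stats)  # {'pct_approved': 60.0, 'avg_confidence': 78.2, 'avg_seconds_to_decision': 16.0}
```

Whether an edited row counts toward "% approved" is a product decision; this sketch counts only untouched approvals.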

Visual direction (from your reference: Faction)

The reference platform is Faction. Screenshots show: (1) email with inline entity highlight + dark tooltip callout, (2) fully parsed quote with ALL-CAPS micro-labels and resolved line items, (3) product data table with ---- placeholders and an "Enrich Data with Faction" action. This IS the 4-primitive flow made visible. We copy the aesthetic.

Design language to adopt

Warm off-white background, not cool gray (roughly #F7F5F2)
Thin typography, very heavy whitespace
No shadows, minimal borders, flat surfaces
ALL CAPS micro-labels with tight tracking for every field
Universal ---- token for missing fields
Dark rounded tooltips with white text for actions and entity callouts
Blueprint grid framing at page edges
Monochrome with at most one subtle accent

Primitive mapping

Faction screen	Our primitive
Email w/ entity highlight + callout	Source (raw in, entities tagged)
Parsed quote w/ line items + pricing	Extractor (structured output)
Product table w/ `----` + Enrich action	Grounder (knowledge fills gaps)
(not shown) approve/edit row	Reviewer (human in the loop)

Faction aesthetic demo (live)

This strip below is rendered in the same aesthetic we'd ship in Week 2 (Extractor output table). All-caps labels, thin type, off-white surface, monospace values, ---- for missing.

Customer Name

Meridian Build Supply Co.

POC

Alex Smith

Address

1820 Westfield Road, Suite 4B, Brookhaven, MA 02417

alex@meridianbuild.co

Date

April 21st 2026

Raw Input	Description	MPN	Unit Price	Qty	Ext Price	Confidence
1/2" PVC 6" Threaded Nipples	1/2" PVC Nipple, 6" length, threaded for pipe extensions.	PVN-6050	1.89	40	$75.60	98
1/2" PVC Ball Valves	1/2" PVC Ball Valve, full port for main water shutoff.	SPE-3708	12.47	28	$349.16	96
2-1/2 to 1/2" NPT Taps	2-1/2" to 1/2" NPT reducing tap for adapting large to small lines.	TAP-35832	28.15	3	$84.45	72
8" PVC 40 SW Caps	8" PVC Schedule 40 socket cap for terminating main drain lines.	----	----	25	----	34
4" PVC 40 SW Caps	4" PVC Schedule 40 socket cap for capping branch drains.	CAP-40038	5.85	15	$87.75	91

Estimated Shipping: 3 to 5 business days

Avg Confidence: 78 Total: $596.96
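The footer numbers follow directly from the rows: extended price is unit price times quantity, the total skips the unresolved line, and average confidence is taken over all five lines. A small sketch using `Decimal` to avoid float rounding on money; the tuple layout is just for illustration.

```python
from decimal import Decimal

# (mpn, unit_price, qty, confidence) taken from the demo table; None marks the ---- cells.
LINES = [
    ("PVN-6050", Decimal("1.89"), 40, 98),
    ("SPE-3708", Decimal("12.47"), 28, 96),
    ("TAP-35832", Decimal("28.15"), 3, 72),
    (None, None, 25, 34),
    ("CAP-40038", Decimal("5.85"), 15, 91),
]


def ext_price(unit_price, qty):
    """Extended price, or None when the unit price is still unresolved."""
    return None if unit_price is None else unit_price * qty


ext_prices = [ext_price(unit, qty) for _, unit, qty, _ in LINES]
total = sum((p for p in ext_prices if p is not None), Decimal("0"))
avg_confidence = round(sum(conf for *_, conf in LINES) / len(LINES))

print(f"Total: ${total}")                  # Total: $596.96
print(f"Avg Confidence: {avg_confidence}")  # Avg Confidence: 78
```

Note that the total silently excludes the 8" caps line. The shipped UI may want to flag a total as partial whenever any line still shows ----.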

Week 1 wireframe: Agents list + Create Agent

+---------------------------------------------------------------+
|  BLUEPRINT / FACTION-STYLE OFF-WHITE PAGE WITH GRID EDGES     |
|                                                               |
|   . . .                                                       |
|                                                               |
|   AGENTS                                          [+ New]     |
|   ------------------------------------------------------      |
|   NAME                    CLIENT         LAST RUN    RUNS     |
|   Quote-Gen               Aronlight      2h ago      17       |
|   RMA                     Aronlight      ----        0        |
|   CV Screening            Sophie         ----        0        |
|                                                               |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|   . . .                                                       |
|                                                               |
|   NEW AGENT                                                   |
|                                                               |
|   NAME          [ Quote-Gen                           ]       |
|   CLIENT        [ Aronlight                  v        ]       |
|   DESCRIPTION   [ Parse inbound RFQ emails into a     ]       |
|                 [ structured quote with confidence    ]       |
|                                                               |
|   SOURCE        ( ) Folder upload    ( ) Gmail (soon)         |
|                                                               |
|   OUTPUT SCHEMA (YAML)                                        |
|   +-------------------------------------------------------+   |
|   | fields:                                               |   |
|   |   - name: raw_input                                   |   |
|   |     type: string                                      |   |
|   |   - name: description                                 |   |
|   |     type: string                                      |   |
|   |   - name: mpn                                         |   |
|   |     type: string                                      |   |
|   |   - name: unit_price                                  |   |
|   |     type: number                                      |   |
|   |   - name: qty                                         |   |
|   |     type: integer                                     |   |
|   +-------------------------------------------------------+   |
|                                                               |
|   GROUNDING    [ + Upload Excel or Markdown ]                 |
|                                                               |
|                                    [ Cancel ]  [ Create ]     |
|                                                               |
+---------------------------------------------------------------+
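The output schema in the wireframe is enough to validate Extractor rows before they reach the review table. The sketch below assumes the YAML has already been parsed into a list of dicts and maps the three type names used there to Python types. Treating a missing value as valid (rendered as ----) is an assumption.

```python
# The five fields from the wireframe's OUTPUT SCHEMA, as they would look after YAML parsing.
SCHEMA_FIELDS = [
    {"name": "raw_input", "type": "string"},
    {"name": "description", "type": "string"},
    {"name": "mpn", "type": "string"},
    {"name": "unit_price", "type": "number"},
    {"name": "qty", "type": "integer"},
]

PY_TYPES = {"string": str, "number": (int, float), "integer": int}


def validate_row(row: dict) -> list[str]:
    """Return one error message per field whose value does not match the schema type."""
    errors = []
    for spec in SCHEMA_FIELDS:
        value = row.get(spec["name"])
        if value is None:
            continue  # missing values are allowed and shown as ---- in the UI
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PY_TYPES[spec["type"]]):
            errors.append(f"{spec['name']}: expected {spec['type']}, got {type(value).__name__}")
    return errors


ok = validate_row({"raw_input": '1/2" PVC Ball Valves', "mpn": "SPE-3708", "unit_price": 12.47, "qty": 28})
bad = validate_row({"raw_input": '8" PVC 40 SW Caps', "unit_price": "12.47", "qty": "25"})
```

Here `ok` is an empty list and `bad` reports both `unit_price` and `qty`, which is the kind of message the Week 2 table could show next to a low-confidence row.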

Reference products to benchmark

Humanloop

Prompt ops + evaluation. Useful for how they expose confidence and run logs.

LangSmith

Agent observability standard. Look at their run detail view and trace tree.

Retool AI

Visual builder. Don't copy the DAG, copy how they surface errors to non-technical users.

Handle.ai

a16z seeded vertical agent for insurance. Closest comp to what you're doing for manufacturing. Study their "approve/edit" loop.

Open questions

1. Reference platform. LOCKED: Faction. Aesthetic captured in the Visual Direction section above.

2. Who builds. Claude Code solo? You + Claude Code? Hire a Python dev now or in week 4? This shapes scope.

3. Agent #2 for week 4. Sophie screening (easy, no ERP), BCP assessment (paid context), or Aronlight RMA (same client, harder)?

4. Aronlight Loom blocker. Manel's walkthroughs are still pending. Build on synthetic data now and swap in real later, or wait?

Proposed next step

Start Week 1

Reference is locked (Faction). Next decisions: (a) confirm 4-primitives scope so I can write tasks/todo.md for Week 1. (b) Decide if the Python API lives in this repo or a sibling repo (e.g. agent-platform/). (c) Pick agent #2 for Week 4: Sophie CV screening, BCP assessment, or Aronlight RMA.

Recommendation: new sibling repo agent-platform/ since this will outlive Aronlight; lock 4 primitives as scope; agent #2 = Sophie CV screening (no ERP dependency, you have volume).

Aronlight Build, internal planning doc. Not for client distribution. Update via docs/mvp-plan.html.