# Sift — START Hack 2026
| START Hack 2026 — St. Gallen, Switzerland Case: Chain IQ Group AG — Autonomous Sourcing Agent Built in: 36 hours, team of 4 |
|---|
> “The superpower is not the model — it’s structured data grounding.” — Chain IQ CTO, after reviewing our approach
Sift is an audit-ready autonomous sourcing agent that transforms chaotic purchase requests into structured, compliant supplier comparisons — with a full audit trail at every step. It knows what it knows, says what it doesn’t, and escalates when it should.
Processing 304 real procurement requests across 6 languages, 19 countries, and 3 currencies — grounded in 145 coded policies, 40 suppliers, and 599 pricing tiers from Chain IQ’s actual data.
## The Challenge
Chain IQ Group AG is a Swiss procurement outsourcing company — 700 people, 60+ enterprise clients (UBS, AXA, KPMG, FedEx), 49 countries. They manage everything a company buys that isn’t its core product: IT, travel, consulting, facilities.
Every purchase request arrives differently — different languages, formats, levels of detail. A human sourcing specialist manually parses each one, checks compliance policies, finds suppliers, builds comparison tables, and routes approvals. Cost: $100–$217 per request. It doesn’t scale.
> “Build a prototype of an autonomous sourcing agent that transforms messy purchase requests into clear, structured supplier comparisons.” — Chain IQ, START Hack 2026 Challenge Brief
Sift replaces this manual process with a 7-step AI pipeline that produces audit-ready output — not just an answer, but the reasoning and evidence behind every decision.
| Manual Process | Sift |
|---|---|
| $100 – $217 per request | ~$0.50 per request (99.5% reduction) |
| Hours to days | Under 3 minutes (optimization path to under 30s) |
| Implicit knowledge, no audit trail | Full audit trail per step — what it knows, what it doesn’t, which policy applies |
| Scales linearly with headcount | Scales with compute — same pipeline, any volume |
## How It Works: 7-Stage Pipeline
Each stage is a separate, focused API call to Claude Sonnet 4.6. Instead of tool use or retrieval, we pre-inject exactly the filtered data each stage needs — eliminating retrieval uncertainty entirely.
```mermaid
flowchart TD
    INPUT["📝 Purchase Request\n(free text, any language)"]
    S1["1 · Extract Requirements\nCategory, specs, quantity, budget,\ndeadline, country, processing tier"]
    S2["2 · Detect Issues\nMissing info, contradictions,\nrestricted suppliers, ambiguous specs"]
    RC1{{"🔀 Reclassify?\nPython checks severity"}}
    SC{{"⚡ Critical blocker?"}}
    S3["3 · Apply Compliance Rules\n145 coded policies: approvals,\ncategory rules, geography, ESG"]
    S4["4 · Find Suppliers\n40 suppliers × 599 pricing tiers\n× 5 regions, pre-filtered"]
    RC2{{"🔀 Reclassify?\nPython checks costs"}}
    S5["5 · Rank & Compare\nWeighted scoring: price × quality\n× risk-ESG · override protection"]
    S6["6 · Explain Reasoning\nAudit-ready justification citing\npolicy IDs, confidence levels"]
    S7["7 · Escalation Check\n8 rules (ER-001–ER-008): auto-approve\n/ needs-review / requires-escalation"]
    OUT["📄 Output\nStructured JSON + audit trail\n+ internal report + buyer report"]
    INPUT --> S1 --> S2 --> RC1
    RC1 -->|"tier upgrade"| S2
    RC1 -->|"no change"| SC
    SC -->|"yes — skip to escalation"| S7
    SC -->|"no"| S3 --> S4 --> RC2
    RC2 -->|"tier upgrade"| S4
    RC2 -->|"no change"| S5 --> S6 --> S7 --> OUT
    style INPUT fill:#1a1a1a,stroke:#EC1E24,color:#fff
    style OUT fill:#1a1a1a,stroke:#EC1E24,color:#fff
    style S1 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S2 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S3 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S4 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S5 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S6 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S7 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style RC1 fill:#92400e,stroke:#f59e0b,color:#fef3c7
    style RC2 fill:#92400e,stroke:#f59e0b,color:#fef3c7
    style SC fill:#991b1b,stroke:#ef4444,color:#fecaca
```
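The pre-injection pattern described above can be sketched as follows — a minimal, hypothetical example (the function and field names are illustrative, not the actual `agent.py` API). Each stage's prompt is assembled from its instructions plus the exact slice of data it needs, so the model never has to retrieve anything:

```python
def build_stage_prompt(stage_name: str, instructions: str, injected: dict) -> str:
    """Pre-inject exactly the filtered data a stage needs into its prompt,
    instead of letting the model retrieve (and possibly miss) it.
    Illustrative sketch — not the real Sift prompt format."""
    blocks = "\n".join(f"<{key}>\n{value}\n</{key}>" for key, value in injected.items())
    return f"## Stage: {stage_name}\n{instructions}\n\n## Grounding data\n{blocks}"

# Example: Stage 4 only ever sees suppliers already filtered by category + region.
prompt = build_stage_prompt(
    "find_suppliers",
    "Rank only the suppliers listed below. Do not invent suppliers.",
    {"suppliers": '[{"id": "SUP-007", "region": "EU"}]'},
)
```

Because the grounding data is selected in Python before the call, each stage is testable in isolation: same input slice, same prompt, every time.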
## Three-Tier Autonomy
Not all requests are equal. Sift classifies each into a processing tier — and reclassifies deterministically when evidence warrants it (Python code, not the LLM):
```mermaid
flowchart LR
    REQ["Purchase\nRequest"] --> CLASSIFY{{"Classify by\nbudget + evidence"}}
    CLASSIFY -->|"< €25K\nno issues"| MKT
    CLASSIFY -->|"€25K – €500K\nor issues detected"| TECH
    CLASSIFY -->|"> €500K\nor data residency\nor multi-country"| STRAT
    subgraph MKT["🟢 Marketplace"]
        direction TB
        M1["Fully autonomous"]
        M2["Auto-approve"]
        M3["Office supplies, IT peripherals"]
    end
    subgraph TECH["🟡 Technical"]
        direction TB
        T1["Agent processes"]
        T2["Human approves"]
        T3["Specialized equipment, consulting"]
    end
    subgraph STRAT["🔴 Strategic"]
        direction TB
        ST1["Agent assists"]
        ST2["Human decides"]
        ST3["Enterprise software, multi-year"]
    end
    MKT -.->|"CRITICAL issues\nor cost > €25K"| TECH
    TECH -.->|"cost > €500K\nor compliance flag"| STRAT
    style MKT fill:#064e3b,stroke:#10b981,color:#d1fae5
    style TECH fill:#78350f,stroke:#f59e0b,color:#fef3c7
    style STRAT fill:#7f1d1d,stroke:#ef4444,color:#fecaca
    style CLASSIFY fill:#1e293b,stroke:#64748b,color:#e2e8f0
    style REQ fill:#1a1a1a,stroke:#EC1E24,color:#fff
```
Run the same request twice, get the same tier — classification is reproducible and auditable.
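The deterministic classifier can be sketched as a plain function over the thresholds in the diagram — a hedged sketch, not the actual Sift code (field names are illustrative):

```python
def classify_tier(budget_eur: float, *, data_residency: bool = False,
                  multi_country: bool = False, has_issues: bool = False) -> str:
    """Deterministic tier classification — plain Python, no LLM call,
    so the same request always lands in the same tier.
    Thresholds follow the three-tier diagram; names are illustrative."""
    # Strategic triggers dominate, regardless of budget.
    if budget_eur > 500_000 or data_residency or multi_country:
        return "STRATEGIC"
    # Mid-range budget, or any detected issue, needs a human approver.
    if budget_eur >= 25_000 or has_issues:
        return "TECHNICAL"
    # Small, clean requests run fully autonomously.
    return "MARKETPLACE"
```

Because this is ordinary code rather than a model call, the tier decision can be unit-tested, versioned, and cited verbatim in the audit trail.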
## Robustness: Knowing What It Doesn’t Know
> “A system that produces confident wrong answers will score lower than one that correctly identifies uncertainty and escalates.” — Chain IQ Challenge Brief
This principle shaped every design decision.
### Contradiction & Missing Data Handling
When a request says “budget: €50K” but asks for items totaling €200K, or specifies a restricted supplier, Sift doesn’t guess — it flags with structured severity:
```json
{
  "issue_id": "ISS-003",
  "type": "CONTRADICTION",
  "severity": "HIGH",
  "description": "Stated budget €50,000 insufficient for requested quantities at market rates",
  "resolution": "REQUIRES_CLARIFICATION",
  "suggested_action": "Request budget confirmation or scope reduction from buyer"
}
```
Every issue carries a severity (LOW / MEDIUM / HIGH / CRITICAL), a type, and a concrete resolution path. The clarification loop auto-generates a structured question for the client via the Smart Connect portal, then re-processes the request with the new context.
### 8 Escalation Rules
Deterministic routing based on evidence, not LLM judgment:
| Rule | Trigger | Action |
|---|---|---|
| ER-001 | Budget > €500K | Route to Head of Strategic Sourcing |
| ER-002 | Critical compliance violation | Immediate escalation, block processing |
| ER-003 | Restricted supplier requested | Flag + suggest alternative |
| ER-004 | Multi-country delivery | Require strategic review |
| ER-005 | Data residency constraints | Legal review required |
| ER-006 | Single-source justification needed | Manager approval + documentation |
| ER-007 | Budget deviation > 20% from historical | Anomaly flag + review |
| ER-008 | CRITICAL issues at Stage 2 | Short-circuit to escalation |
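Rules like these reduce to plain predicates over the evidence extracted in earlier stages. A hedged sketch covering a subset of the table (field names are illustrative assumptions, not Sift's actual schema):

```python
def evaluate_escalation(req: dict) -> list[str]:
    """Evaluate escalation rules as deterministic predicates over extracted
    evidence — no LLM judgment involved. Covers a subset of ER-001..ER-008;
    field names are illustrative."""
    rules = {
        "ER-001": req.get("budget_eur", 0) > 500_000,                 # budget threshold
        "ER-003": bool(req.get("restricted_supplier")),               # restricted supplier named
        "ER-004": len(req.get("delivery_countries", [])) > 1,         # multi-country delivery
        "ER-007": abs(req.get("budget_deviation_pct", 0)) > 20,       # historical anomaly
        "ER-008": any(i.get("severity") == "CRITICAL"                 # Stage 2 blocker
                      for i in req.get("issues", [])),
    }
    return [rule_id for rule_id, fired in rules.items() if fired]
```

The returned rule IDs go straight into the audit trail, so a reviewer can see exactly which trigger fired and why.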
### Confidence & Override Protection
Every recommendation includes a confidence level with explicit reasoning about what the agent knows vs. what it’s uncertain about. Low confidence triggers mandatory review.
If a buyer selects a supplier that isn’t ranked #1, they must document the deviation reason — creating accountability without blocking the process.
## Deployed & Running
Sift isn’t a notebook or a local demo — it’s deployed end-to-end on Railway, processing real procurement data.
| Detail | |
|---|---|
| Frontend | Live on Railway — auto-deploy from main |
| Backend | Live on Railway — persistent volume for workspace |
| CI/CD | Push to main → both services deploy automatically |
| Cost model | ~$0.50/request (7 Claude API calls × ~$0.07 each) |
| Data persistence | File-based workspace — each request gets a folder with structured JSON outputs |
| Scaling path | Parallel stage execution, response caching, batch processing |
The architecture separates AI reasoning from deterministic logic. Claude handles natural language understanding and reasoning; Python handles classification, policy validation, and threshold checks. The deterministic parts can be tested, versioned, and audited independently of the model.
## Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 14 + Tailwind CSS | Dashboard + Smart Connect portal |
| Backend | FastAPI (Python 3.12, uv) | SSE streaming, CRUD, report generation |
| AI | Claude Sonnet 4.6 (Anthropic API) | 7 staged API calls per request |
| Data | Chain IQ datasets | 304 requests, 40 suppliers, 599 pricing tiers, 145 policies |
| Deploy | Railway | Auto-deploy, persistent volume |
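The SSE streaming mentioned above is just line-oriented text over HTTP. A minimal sketch of how one pipeline progress event might be formatted (event and field names are assumptions, not Sift's actual wire format):

```python
import json

def sse_event(stage: str, status: str, payload: dict) -> str:
    """Format one Server-Sent Event for the pipeline progress stream.
    The dashboard's StepProgressBar would consume events like this;
    the event name and JSON fields here are illustrative."""
    data = json.dumps({"stage": stage, "status": status, **payload})
    # SSE framing: "event:" + "data:" lines, terminated by a blank line.
    return f"event: progress\ndata: {data}\n\n"
```

On the backend, FastAPI can yield such strings from a generator wrapped in a `text/event-stream` response, and the browser's `EventSource` API parses them natively.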
## Project Structure
```text
starthack-2026/
├── frontend/                      # Next.js 14 dashboard
│   ├── app/                       # Dashboard + Smart Connect submit portal
│   ├── components/
│   │   ├── StepProgressBar.tsx    # Real-time SSE pipeline progress
│   │   ├── ComparisonTable.tsx    # Supplier ranking with weighted scores
│   │   ├── AuditTrailPanel.tsx    # Full pipeline transparency view
│   │   ├── CompliancePanel.tsx    # Policy validation results
│   │   ├── ScoreRing.tsx          # Visual score indicators
│   │   └── ...                    # 18 specialized components
│   ├── lib/                       # API client, adapters, types
│   └── public/brand/              # Sift logo assets (SVG)
│
├── backend/
│   ├── app/
│   │   ├── main.py                # FastAPI — SSE, CRUD, reports
│   │   ├── agent.py               # 7-step staged pipeline (2,053 lines)
│   │   ├── data_loader.py         # Dataset loader + filters
│   │   ├── report.py              # Internal audit report (HTML)
│   │   ├── buyer_report.py        # Buyer-facing comparison (HTML)
│   │   └── trace_viewer.py        # Debug trace viewer
│   └── workspace/
│       ├── CLAUDE.md              # Agent system prompt
│       ├── data/                  # Chain IQ datasets
│       │   ├── requests.json      # 304 purchase requests
│       │   ├── suppliers.csv      # 40 approved suppliers
│       │   ├── pricing.csv        # 599 pricing tiers × 5 regions
│       │   ├── policies.json      # 145 procurement policies
│       │   └── categories.csv     # Procurement taxonomy
│       └── requests/REQ-*/        # Processed request outputs
│
├── prep/01-chain-iq/              # Research, pitch prep, mentor feedback
└── DEPLOYMENT.md                  # Railway deployment guide
```
## Getting Started
### Prerequisites
- Node.js 18+
- Python 3.12+
- uv (Python package manager)
- Anthropic API key
### Backend
```bash
cd backend
uv sync
echo "ANTHROPIC_API_KEY=your_key" > .env
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000
```
### Frontend
```bash
cd frontend
npm install
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local
npm run dev
```
Open http://localhost:3000.
## Evaluation Criteria Alignment
How Sift maps to the Chain IQ case scoring rubric:
| Criteria | Weight | How Sift Addresses It |
|---|---|---|
| Feasibility | 25% | Deployed end-to-end on Railway. Real data, real API calls, real outputs. File-based persistence with clear scaling path. Production-realistic cost model (~$0.50/req). |
| Robustness & Escalation | 25% | 8 deterministic escalation rules. Python-based tier reclassification. Short-circuit on critical blockers. Confidence scoring with explicit uncertainty. Override protection. Clarification loop. The system says “I don’t know” when it should. |
| Creativity | 20% | Three-tier autonomy model. Pre-injection architecture (no tool use). Deterministic+AI hybrid. Clarification loop with structured re-processing. |
| Reachability | 20% | 304 real requests, 6 languages, 19 countries. 145 coded policies. 40 suppliers with 599 pricing tiers. Addresses the tail spend problem ($5–20M annual savings). |
| Visual Design | 10% | Real-time SSE progress bar. Tabbed detail view (7 panels). Score rings. Two HTML report types: internal audit + buyer-facing comparison. |
## Team
| Name | GitHub | Role |
|---|---|---|
| Andre Pacheco | @A-PachecoT | Backend, AI Pipeline, Architecture |
| Alvaro Zuñiga | @alvarogiozu | Frontend, Dashboard UI |
| Melissa Noriega | @Melissa1221 | Frontend, UX, Customer Journey |
| Freddy | @Freddyx14 | Validation, QA, Pitch |
Built with focus and no sleep at START Hack 2026, St. Gallen