Sift — START Hack 2026

starthack-2026 (public) · v1 · by andre · Mar 23, 2026 · hackathon, starthack, ai
START Hack 2026, St. Gallen, Switzerland
Case: Chain IQ Group AG — Autonomous Sourcing Agent
Built in: 36 hours, team of 4

Next.js · FastAPI · Claude · Tailwind CSS · Railway · Python

“The superpower is not the model — it’s structured data grounding.” — Chain IQ CTO, after reviewing our approach

Live Dashboard · API

Sift is an audit-ready autonomous sourcing agent that transforms chaotic purchase requests into structured, compliant supplier comparisons — with a full audit trail at every step. It knows what it knows, says what it doesn’t, and escalates when it should.

Sift processes 304 real procurement requests across 6 languages, 19 countries, and 3 currencies, grounded in 145 coded policies, 40 suppliers, and 599 pricing tiers from Chain IQ’s actual data.


The Challenge

Chain IQ Group AG is a Swiss procurement outsourcing company — 700 people, 60+ enterprise clients (UBS, AXA, KPMG, FedEx), 49 countries. They manage everything a company buys that isn’t its core product: IT, travel, consulting, facilities.

Every purchase request arrives differently — different languages, formats, levels of detail. A human sourcing specialist manually parses each one, checks compliance policies, finds suppliers, builds comparison tables, and routes approvals. Cost: $100–$217 per request. It doesn’t scale.

“Build a prototype of an autonomous sourcing agent that transforms messy purchase requests into clear, structured supplier comparisons.” — Chain IQ, START Hack 2026 Challenge Brief

Sift replaces this manual process with a 7-step AI pipeline that produces audit-ready output — not just an answer, but the reasoning and evidence behind every decision.

| Manual Process | Sift |
| --- | --- |
| $100 – $217 per request | ~$0.50 per request (99.5% reduction) |
| Hours to days | Under 3 minutes (optimization path to under 30s) |
| Implicit knowledge, no audit trail | Full audit trail per step: what it knows, what it doesn’t, which policy applies |
| Scales linearly with headcount | Scales with compute: same pipeline, any volume |

How It Works: 7-Stage Pipeline

Each stage is a separate, focused API call to Claude Sonnet 4.6. Instead of relying on tool use or retrieval, we pre-inject exactly the filtered data each stage needs, so the model never has to fetch its own context and retrieval stops being a source of uncertainty.
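
As a rough illustration of pre-injection, the sketch below filters supplier rows in plain Python and embeds only the matching rows in a stage prompt. The field names (`category`, `regions`), the helpers `filter_suppliers` and `build_stage_prompt`, the sample rows, and the prompt wording are all assumptions made for illustration; they are not taken from Sift's code.

```python
import json

def filter_suppliers(suppliers, category, region):
    """Keep only suppliers that serve the requested category and region."""
    return [
        s for s in suppliers
        if s["category"] == category and region in s["regions"]
    ]

def build_stage_prompt(instruction, request_text, context_rows):
    """Embed the pre-filtered rows directly in the prompt (no tools, no retrieval)."""
    context = json.dumps(context_rows, indent=2, ensure_ascii=False)
    return (
        f"{instruction}\n\n"
        f"<request>\n{request_text}\n</request>\n\n"
        f"<data>\n{context}\n</data>"
    )

# Hypothetical sample rows, not real Chain IQ data
suppliers = [
    {"id": "SUP-001", "category": "IT", "regions": ["EU", "CH"]},
    {"id": "SUP-002", "category": "IT", "regions": ["US"]},
    {"id": "SUP-003", "category": "Facilities", "regions": ["EU"]},
]

matching = filter_suppliers(suppliers, category="IT", region="EU")
prompt = build_stage_prompt(
    "Find suitable suppliers using only the data below.",
    "Need 20 laptops delivered to Zurich",
    matching,
)
```

Because the filter runs before the model is called, the rows the model sees can be logged and inspected alongside its answer.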

flowchart TD
    INPUT["📝 Purchase Request\n(free text, any language)"]
    S1["1 · Extract Requirements\nCategory, specs, quantity, budget,\ndeadline, country, processing tier"]
    S2["2 · Detect Issues\nMissing info, contradictions,\nrestricted suppliers, ambiguous specs"]
    RC1{{"🔀 Reclassify?\nPython checks severity"}}
    SC{{"⚡ Critical blocker?"}}
    S3["3 · Apply Compliance Rules\n145 coded policies: approvals,\ncategory rules, geography, ESG"]
    S4["4 · Find Suppliers\n40 suppliers × 599 pricing tiers\n× 5 regions, pre-filtered"]
    RC2{{"🔀 Reclassify?\nPython checks costs"}}
    S5["5 · Rank & Compare\nWeighted scoring: price × quality\n× risk-ESG · override protection"]
    S6["6 · Explain Reasoning\nAudit-ready justification citing\npolicy IDs, confidence levels"]
    S7["7 · Escalation Check\n8 rules (ER-001–ER-008): auto-approve\n/ needs-review / requires-escalation"]
    OUT["📄 Output\nStructured JSON + audit trail\n+ internal report + buyer report"]

    INPUT --> S1 --> S2 --> RC1
    RC1 -->|"tier upgrade"| S2
    RC1 -->|"no change"| SC
    SC -->|"yes — skip to escalation"| S7
    SC -->|"no"| S3 --> S4 --> RC2
    RC2 -->|"tier upgrade"| S4
    RC2 -->|"no change"| S5 --> S6 --> S7 --> OUT

    style INPUT fill:#1a1a1a,stroke:#EC1E24,color:#fff
    style OUT fill:#1a1a1a,stroke:#EC1E24,color:#fff
    style S1 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S2 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S3 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S4 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S5 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S6 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S7 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style RC1 fill:#92400e,stroke:#f59e0b,color:#fef3c7
    style RC2 fill:#92400e,stroke:#f59e0b,color:#fef3c7
    style SC fill:#991b1b,stroke:#ef4444,color:#fecaca
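
The control flow in the diagram can be approximated in a few lines of Python. This is a simplified sketch: the stages are stub callables standing in for the Claude calls, the two reclassification loops are left out, and the names `run_pipeline`, `make_stage`, and `is_critical` are hypothetical.

```python
def run_pipeline(request_text, stages, is_critical):
    """Run stages 1-7 in order; jump from stage 2 to stage 7 on a critical blocker.

    `stages` maps a stage number to a callable that takes the current state
    dict and returns a dict of new keys to merge into it.
    """
    state = {"request": request_text}
    trail = []  # stage numbers in the order they actually ran

    def run(n):
        state.update(stages[n](state))
        trail.append(n)

    run(1)
    run(2)
    if is_critical(state):
        run(7)  # short-circuit straight to the escalation check
        return state, trail
    for n in (3, 4, 5, 6, 7):
        run(n)
    return state, trail

def make_stage(n):
    """Stub stage that only records that it ran."""
    return lambda state: {f"stage_{n}": "done"}

stages = {n: make_stage(n) for n in range(1, 8)}
# Stub issue detector: flags a CRITICAL issue when the text mentions "restricted"
stages[2] = lambda state: {
    "issues": [{"severity": "CRITICAL"}] if "restricted" in state["request"] else []
}

def is_critical(state):
    return any(issue["severity"] == "CRITICAL" for issue in state["issues"])

_, normal_trail = run_pipeline("20 laptops", stages, is_critical)
_, blocked_trail = run_pipeline("buy from restricted supplier", stages, is_critical)
```

In this sketch a normal request visits all seven stages, while a blocked one visits only stages 1, 2, and 7.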

Three-Tier Autonomy

Not all requests are equal. Sift classifies each into a processing tier — and reclassifies deterministically when evidence warrants it (Python code, not the LLM):

flowchart LR
    REQ["Purchase\nRequest"] --> CLASSIFY{{"Classify by\nbudget + evidence"}}

    CLASSIFY -->|"< €25K\nno issues"| MKT
    CLASSIFY -->|"€25K – €500K\nor issues detected"| TECH
    CLASSIFY -->|"> €500K\nor data residency\nor multi-country"| STRAT

    subgraph MKT["🟢 Marketplace"]
        direction TB
        M1["Fully autonomous"]
        M2["Auto-approve"]
        M3["Office supplies, IT peripherals"]
    end

    subgraph TECH["🟡 Technical"]
        direction TB
        T1["Agent processes"]
        T2["Human approves"]
        T3["Specialized equipment, consulting"]
    end

    subgraph STRAT["🔴 Strategic"]
        direction TB
        ST1["Agent assists"]
        ST2["Human decides"]
        ST3["Enterprise software, multi-year"]
    end

    MKT -.->|"CRITICAL issues\nor cost > €25K"| TECH
    TECH -.->|"cost > €500K\nor compliance flag"| STRAT

    style MKT fill:#064e3b,stroke:#10b981,color:#d1fae5
    style TECH fill:#78350f,stroke:#f59e0b,color:#fef3c7
    style STRAT fill:#7f1d1d,stroke:#ef4444,color:#fecaca
    style CLASSIFY fill:#1e293b,stroke:#64748b,color:#e2e8f0
    style REQ fill:#1a1a1a,stroke:#EC1E24,color:#fff

Run the same request twice, get the same tier — classification is reproducible and auditable.
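
A minimal sketch of what deterministic tier logic could look like, using the thresholds from the diagram. The function names, the argument names, and the handling of the exact €25K and €500K boundaries are assumptions; the actual rules live in Sift's backend and may differ.

```python
from enum import IntEnum

class Tier(IntEnum):
    MARKETPLACE = 1
    TECHNICAL = 2
    STRATEGIC = 3

def classify(budget_eur, has_issues=False, data_residency=False, multi_country=False):
    """Initial tier from budget and evidence (boundary handling is assumed)."""
    if budget_eur > 500_000 or data_residency or multi_country:
        return Tier.STRATEGIC
    if budget_eur >= 25_000 or has_issues:
        return Tier.TECHNICAL
    return Tier.MARKETPLACE

def reclassify(tier, estimated_cost_eur, critical_issues=False, compliance_flag=False):
    """Upgrade the tier when new evidence warrants it; never downgrade."""
    if tier == Tier.MARKETPLACE and (critical_issues or estimated_cost_eur > 25_000):
        tier = Tier.TECHNICAL
    if tier == Tier.TECHNICAL and (estimated_cost_eur > 500_000 or compliance_flag):
        tier = Tier.STRATEGIC
    return tier
```

Both functions are pure: the same inputs always give the same tier, which is what makes the classification reproducible and easy to unit test.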


Robustness: Knowing What It Doesn’t Know

“A system that produces confident wrong answers will score lower than one that correctly identifies uncertainty and escalates.” — Chain IQ Challenge Brief

This principle shaped every design decision.

Contradiction & Missing Data Handling

When a request says “budget: €50K” but asks for items totaling €200K, or specifies a restricted supplier, Sift doesn’t guess — it flags with structured severity:

{
  "issue_id": "ISS-003",
  "type": "CONTRADICTION",
  "severity": "HIGH",
  "description": "Stated budget €50,000 insufficient for requested quantities at market rates",
  "resolution": "REQUIRES_CLARIFICATION",
  "suggested_action": "Request budget confirmation or scope reduction from buyer"
}

Each issue carries a severity (LOW / MEDIUM / HIGH / CRITICAL), a type, and a concrete resolution path. The clarification loop auto-generates a structured question for the client via the Smart Connect portal, then re-processes the request with the new context.
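
To make the issue format concrete, here is a hedged sketch of a budget check that emits an issue in the same shape as the JSON above. In Sift, issue detection is a Claude stage; this deterministic helper (`check_budget`), its fixed HIGH severity, and the sample figures are assumptions used only to show the structure.

```python
def check_budget(issue_id, budget_eur, line_items):
    """Return a CONTRADICTION issue dict if the items exceed the budget, else None.

    `line_items` is a list of (quantity, unit_price_eur) tuples.
    """
    total = sum(qty * price for qty, price in line_items)
    if total <= budget_eur:
        return None
    return {
        "issue_id": issue_id,
        "type": "CONTRADICTION",
        "severity": "HIGH",  # assumed; a real system might scale this with the gap
        "description": (
            f"Stated budget €{budget_eur:,.0f} insufficient for "
            f"requested quantities (estimated total €{total:,.0f})"
        ),
        "resolution": "REQUIRES_CLARIFICATION",
        "suggested_action": "Request budget confirmation or scope reduction from buyer",
    }

# 100 units at €2,000 each against a €50,000 budget
issue = check_budget("ISS-003", 50_000, [(100, 2_000)])
within_budget = check_budget("ISS-004", 50_000, [(10, 1_000)])
```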

8 Escalation Rules

Deterministic routing based on evidence, not LLM judgment:

| Rule | Trigger | Action |
| --- | --- | --- |
| ER-001 | Budget > €500K | Route to Head of Strategic Sourcing |
| ER-002 | Critical compliance violation | Immediate escalation, block processing |
| ER-003 | Restricted supplier requested | Flag + suggest alternative |
| ER-004 | Multi-country delivery | Require strategic review |
| ER-005 | Data residency constraints | Legal review required |
| ER-006 | Single-source justification needed | Manager approval + documentation |
| ER-007 | Budget deviation > 20% from historical | Anomaly flag + review |
| ER-008 | CRITICAL issues at Stage 2 | Short-circuit to escalation |
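
One way such rules might be encoded is as a table of predicates over a context dict, evaluated in plain Python. The context keys (`budget_eur`, `delivery_countries`, and so on) and the helper names are hypothetical; only the rule IDs and triggers come from the table above.

```python
def _deviates_from_history(ctx, threshold=0.20):
    """True when the budget differs from the historical figure by more than 20%."""
    historical = ctx.get("historical_budget_eur")
    if not historical:
        return False
    return abs(ctx.get("budget_eur", 0) - historical) / historical > threshold

# (rule ID, predicate) pairs; each predicate reads only from the context dict
RULES = [
    ("ER-001", lambda c: c.get("budget_eur", 0) > 500_000),
    ("ER-002", lambda c: c.get("critical_compliance_violation", False)),
    ("ER-003", lambda c: c.get("restricted_supplier_requested", False)),
    ("ER-004", lambda c: len(c.get("delivery_countries", [])) > 1),
    ("ER-005", lambda c: c.get("data_residency", False)),
    ("ER-006", lambda c: c.get("single_source", False)),
    ("ER-007", _deviates_from_history),
    ("ER-008", lambda c: c.get("critical_issue_at_stage_2", False)),
]

def triggered_rules(ctx):
    """Return the IDs of every rule whose trigger matches, in table order."""
    return [rule_id for rule_id, predicate in RULES if predicate(ctx)]

fired = triggered_rules({
    "budget_eur": 600_000,
    "historical_budget_eur": 400_000,
    "delivery_countries": ["CH", "DE"],
})
```

Keeping the rules as data means the list of fired rule IDs can be written straight into the audit trail.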

Confidence & Override Protection

Every recommendation includes a confidence level with explicit reasoning about what the agent knows vs. what it’s uncertain about. Low confidence triggers mandatory review.

If a buyer selects a supplier that isn’t ranked #1, they must document the deviation reason — creating accountability without blocking the process.
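
The override rule can be sketched as a small check on top of a ranked list. The weights, the weighted-sum formula, the 0-1 normalized criteria, and the function names below are all assumptions for illustration; Sift's actual scoring may be computed differently.

```python
# Hypothetical weights; each criterion is assumed pre-normalized to 0-1 (higher is better)
WEIGHTS = {"price": 0.5, "quality": 0.3, "risk_esg": 0.2}

def rank_suppliers(suppliers):
    """Sort suppliers by weighted score, highest first."""
    def score(supplier):
        return sum(weight * supplier[key] for key, weight in WEIGHTS.items())
    return sorted(suppliers, key=score, reverse=True)

def validate_selection(ranked, selected_id, deviation_reason=None):
    """Allow any pick, but require a documented reason when it is not rank #1."""
    top_id = ranked[0]["id"]
    is_override = selected_id != top_id
    if is_override and not (deviation_reason or "").strip():
        raise ValueError(
            f"{selected_id} is not ranked #1 ({top_id}); a deviation reason is required"
        )
    return {
        "selected": selected_id,
        "is_override": is_override,
        "deviation_reason": deviation_reason,
    }

ranked = rank_suppliers([
    {"id": "A", "price": 0.9, "quality": 0.6, "risk_esg": 0.5},
    {"id": "B", "price": 0.6, "quality": 0.9, "risk_esg": 0.9},
])
top_pick = validate_selection(ranked, "B")
override = validate_selection(ranked, "A", deviation_reason="Existing framework contract")
```

The buyer is never blocked from choosing supplier A; the selection is simply rejected until a reason is recorded.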


Deployed & Running

Sift isn’t a notebook or a local demo — it’s deployed end-to-end on Railway, processing real procurement data.

| | Detail |
| --- | --- |
| Frontend | Live on Railway, auto-deploy from main |
| Backend | Live on Railway, persistent volume for workspace |
| CI/CD | Push to main → both services deploy automatically |
| Cost model | ~$0.50/request (7 Claude API calls × ~$0.07 each) |
| Data persistence | File-based workspace: each request gets a folder with structured JSON outputs |
| Scaling path | Parallel stage execution, response caching, batch processing |
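
The cost figures can be checked with a few lines of arithmetic. The per-call cost is the approximate value quoted above, and the manual cost range is the $100 to $217 per request cited earlier; the variable and function names are illustrative.

```python
CALLS_PER_REQUEST = 7
COST_PER_CALL_USD = 0.07            # approximate figure quoted in this README
MANUAL_COST_RANGE_USD = (100, 217)  # manual cost per request, low and high end

# 7 calls at roughly $0.07 each, which the README rounds to ~$0.50
cost_per_request = round(CALLS_PER_REQUEST * COST_PER_CALL_USD, 2)

def reduction_pct(manual_cost, automated_cost=0.50):
    """Percentage saved per request relative to the manual cost."""
    return round(100 * (1 - automated_cost / manual_cost), 2)

low, high = (reduction_pct(cost) for cost in MANUAL_COST_RANGE_USD)
print(cost_per_request, low, high)
```

The 99.5% reduction quoted earlier corresponds to the low end of the manual cost range; against the high end the saving is slightly larger.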

The architecture separates AI reasoning from deterministic logic. Claude handles natural language understanding and reasoning; Python handles classification, policy validation, and threshold checks. The deterministic parts can be tested, versioned, and audited independently of the model.
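
As an illustration of the file-based workspace, the sketch below writes each stage's output to a per-request folder and reads the folder back as an audit trail. The folder layout follows the `requests/REQ-*/` pattern from the project structure, but the file names, the request ID format, and the helper functions are assumptions.

```python
import json
import tempfile
from pathlib import Path

def save_stage_output(workspace, request_id, stage_name, output):
    """Write one stage's output to <workspace>/requests/<request_id>/<stage_name>.json."""
    folder = Path(workspace) / "requests" / request_id
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{stage_name}.json"
    path.write_text(json.dumps(output, indent=2, ensure_ascii=False), encoding="utf-8")
    return path

def load_audit_trail(workspace, request_id):
    """Read every stage file for a request, ordered by file name."""
    folder = Path(workspace) / "requests" / request_id
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(folder.glob("*.json"))
    }

# Demo in a temporary directory so nothing is left on disk
with tempfile.TemporaryDirectory() as workspace:
    save_stage_output(workspace, "REQ-0001", "01_extract", {"category": "IT", "quantity": 20})
    save_stage_output(workspace, "REQ-0001", "02_issues", {"issues": []})
    trail = load_audit_trail(workspace, "REQ-0001")
```

Since this layer involves no model calls, it can be tested and versioned like any other Python module.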


Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | Next.js 14 + Tailwind CSS | Dashboard + Smart Connect portal |
| Backend | FastAPI (Python 3.12, uv) | SSE streaming, CRUD, report generation |
| AI | Claude Sonnet 4.6 (Anthropic API) | 7 staged API calls per request |
| Data | Chain IQ datasets | 304 requests, 40 suppliers, 599 pricing tiers, 145 policies |
| Deploy | Railway | Auto-deploy, persistent volume |
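
For readers unfamiliar with SSE, the sketch below shows the wire format a progress stream might use: one `event:` line, one `data:` line, and a blank line per message. The event names, payload fields, and stage names are hypothetical and are not taken from Sift's backend.

```python
import json

def sse_event(event, data):
    """Format one Server-Sent Events message, terminated by a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_progress(stage_names):
    """Yield a `stage_complete` event per stage, then a final `done` event."""
    total = len(stage_names)
    for step, name in enumerate(stage_names, start=1):
        yield sse_event("stage_complete", {"step": step, "total": total, "name": name})
    yield sse_event("done", {"status": "complete"})

events = list(stream_progress(["extract", "detect_issues"]))
```

In a FastAPI app, a generator like this could be returned through `StreamingResponse` with `media_type="text/event-stream"`, and a browser could consume it with `EventSource`.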

Project Structure

starthack-2026/
├── frontend/                    # Next.js 14 dashboard
│   ├── app/                     # Dashboard + Smart Connect submit portal
│   ├── components/
│   │   ├── StepProgressBar.tsx  # Real-time SSE pipeline progress
│   │   ├── ComparisonTable.tsx  # Supplier ranking with weighted scores
│   │   ├── AuditTrailPanel.tsx  # Full pipeline transparency view
│   │   ├── CompliancePanel.tsx  # Policy validation results
│   │   ├── ScoreRing.tsx        # Visual score indicators
│   │   └── ...                  # 18 specialized components
│   ├── lib/                     # API client, adapters, types
│   └── public/brand/            # Sift logo assets (SVG)

├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI — SSE, CRUD, reports
│   │   ├── agent.py             # 7-step staged pipeline (2,053 lines)
│   │   ├── data_loader.py       # Dataset loader + filters
│   │   ├── report.py            # Internal audit report (HTML)
│   │   ├── buyer_report.py      # Buyer-facing comparison (HTML)
│   │   └── trace_viewer.py      # Debug trace viewer
│   └── workspace/
│       ├── CLAUDE.md            # Agent system prompt
│       ├── data/                # Chain IQ datasets
│       │   ├── requests.json    # 304 purchase requests
│       │   ├── suppliers.csv    # 40 approved suppliers
│       │   ├── pricing.csv      # 599 pricing tiers × 5 regions
│       │   ├── policies.json    # 145 procurement policies
│       │   └── categories.csv   # Procurement taxonomy
│       └── requests/REQ-*/      # Processed request outputs

├── prep/01-chain-iq/            # Research, pitch prep, mentor feedback
└── DEPLOYMENT.md                # Railway deployment guide

Getting Started

Prerequisites

Python 3.12 with uv, Node.js with npm, and an Anthropic API key.

Backend

cd backend
uv sync
echo "ANTHROPIC_API_KEY=your_key" > .env
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000

Frontend

cd frontend
npm install
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local
npm run dev

Open http://localhost:3000.


Evaluation Criteria Alignment

How Sift maps to the Chain IQ case scoring rubric:

| Criterion | Weight | How Sift Addresses It |
| --- | --- | --- |
| Feasibility | 25% | Deployed end-to-end on Railway. Real data, real API calls, real outputs. File-based persistence with clear scaling path. Production-realistic cost model (~$0.50/req). |
| Robustness & Escalation | 25% | 8 deterministic escalation rules. Python-based tier reclassification. Short-circuit on critical blockers. Confidence scoring with explicit uncertainty. Override protection. Clarification loop. The system says “I don’t know” when it should. |
| Creativity | 20% | Three-tier autonomy model. Pre-injection architecture (no tool use). Deterministic + AI hybrid. Clarification loop with structured re-processing. |
| Reachability | 20% | 304 real requests, 6 languages, 19 countries. 145 coded policies. 40 suppliers with 599 pricing tiers. Addresses the tail spend problem ($5–20M annual savings). |
| Visual Design | 10% | Real-time SSE progress bar. Tabbed detail view (7 panels). Score rings. Two HTML report types: internal audit + buyer-facing comparison. |

Team

| Name | GitHub | Role |
| --- | --- | --- |
| Andre Pacheco | @A-PachecoT | Backend, AI Pipeline, Architecture |
| Alvaro Zuñiga | @alvarogiozu | Frontend, Dashboard UI |
| Melissa Noriega | @Melissa1221 | Frontend, UX, Customer Journey |
| Freddy | @Freddyx14 | Validation, QA, Pitch |

Built with focus and no sleep at START Hack 2026, St. Gallen