# Sift — START Hack 2026
| START Hack 2026 — St. Gallen, Switzerland Case: Chain IQ Group AG — Autonomous Sourcing Agent Built in: 36 hours, team of 4 |
|---|
> “The superpower is not the model — it’s structured data grounding.” — Chain IQ CTO, after reviewing our approach
Sift is an audit-ready autonomous sourcing agent that transforms chaotic purchase requests into structured, compliant supplier comparisons — with a full audit trail at every step. It knows what it knows, says what it doesn’t, and escalates when it should.
Processing 304 real procurement requests across 6 languages, 19 countries, and 3 currencies — grounded in 145 coded policies, 40 suppliers, and 599 pricing tiers from Chain IQ’s actual data.
## The Challenge
Chain IQ Group AG is a Swiss procurement outsourcing company — 700 people, 60+ enterprise clients (UBS, AXA, KPMG, FedEx), 49 countries. They manage everything a company buys that isn’t its core product: IT, travel, consulting, facilities.
Every purchase request arrives differently — different languages, formats, levels of detail. A human sourcing specialist manually parses each one, checks compliance policies, finds suppliers, builds comparison tables, and routes approvals. Cost: $100–$217 per request. It doesn’t scale.
> “Build a prototype of an autonomous sourcing agent that transforms messy purchase requests into clear, structured supplier comparisons.” — Chain IQ, START Hack 2026 Challenge Brief
Sift replaces this manual process with a 7-step AI pipeline that produces audit-ready output — not just an answer, but the reasoning and evidence behind every decision.
| Manual Process | Sift |
|---|---|
| $100 – $217 per request | ~$0.50 per request (99.5% reduction) |
| Hours to days | Under 3 minutes (optimization path to under 30s) |
| Implicit knowledge, no audit trail | Full audit trail per step — what it knows, what it doesn’t, which policy applies |
| Scales linearly with headcount | Scales with compute — same pipeline, any volume |
## How It Works: 7-Stage Pipeline
Each stage is a separate, focused API call to Claude Sonnet 4.6. Instead of tool use or retrieval, we pre-inject exactly the filtered data each stage needs — eliminating retrieval uncertainty entirely.
```mermaid
flowchart TD
    INPUT["📝 Purchase Request\n(free text, any language)"]
    S1["1 · Extract Requirements\nCategory, specs, quantity, budget,\ndeadline, country, processing tier"]
    S2["2 · Detect Issues\nMissing info, contradictions,\nrestricted suppliers, ambiguous specs"]
    RC1{{"🔀 Reclassify?\nPython checks severity"}}
    SC{{"⚡ Critical blocker?"}}
    S3["3 · Apply Compliance Rules\n145 coded policies: approvals,\ncategory rules, geography, ESG"]
    S4["4 · Find Suppliers\n40 suppliers × 599 pricing tiers\n× 5 regions, pre-filtered"]
    RC2{{"🔀 Reclassify?\nPython checks costs"}}
    S5["5 · Rank & Compare\nWeighted scoring: price × quality\n× risk-ESG · override protection"]
    S6["6 · Explain Reasoning\nAudit-ready justification citing\npolicy IDs, confidence levels"]
    S7["7 · Escalation Check\n8 rules (ER-001–ER-008): auto-approve\n/ needs-review / requires-escalation"]
    OUT["📄 Output\nStructured JSON + audit trail\n+ internal report + buyer report"]
    INPUT --> S1 --> S2 --> RC1
    RC1 -->|"tier upgrade"| S2
    RC1 -->|"no change"| SC
    SC -->|"yes — skip to escalation"| S7
    SC -->|"no"| S3 --> S4 --> RC2
    RC2 -->|"tier upgrade"| S4
    RC2 -->|"no change"| S5 --> S6 --> S7 --> OUT
    style INPUT fill:#1a1a1a,stroke:#EC1E24,color:#fff
    style OUT fill:#1a1a1a,stroke:#EC1E24,color:#fff
    style S1 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S2 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S3 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S4 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S5 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S6 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style S7 fill:#1e293b,stroke:#334155,color:#e2e8f0
    style RC1 fill:#92400e,stroke:#f59e0b,color:#fef3c7
    style RC2 fill:#92400e,stroke:#f59e0b,color:#fef3c7
    style SC fill:#991b1b,stroke:#ef4444,color:#fecaca
```
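The pre-injection pattern described above can be sketched as follows — a minimal, hypothetical example (the function and field names are illustrative, not the actual `agent.py` API). Each stage's prompt is assembled from its instructions plus the exact slice of data it needs, so the model never has to retrieve anything:

```python
def build_stage_prompt(stage_name: str, instructions: str, injected: dict) -> str:
    """Pre-inject exactly the filtered data a stage needs into its prompt,
    instead of letting the model retrieve (and possibly miss) it.
    Illustrative sketch — not the real Sift prompt format."""
    blocks = "\n".join(f"<{key}>\n{value}\n</{key}>" for key, value in injected.items())
    return f"## Stage: {stage_name}\n{instructions}\n\n## Grounding data\n{blocks}"

# Example: Stage 4 only ever sees suppliers already filtered by category + region.
prompt = build_stage_prompt(
    "find_suppliers",
    "Rank only the suppliers listed below. Do not invent suppliers.",
    {"suppliers": '[{"id": "SUP-007", "region": "EU"}]'},
)
```

Because the grounding data is selected in Python before the call, each stage is testable in isolation: same input slice, same prompt, every time.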
## Three-Tier Autonomy
Not all requests are equal. Sift classifies each into a processing tier — and reclassifies deterministically when evidence warrants it (Python code, not the LLM):
```mermaid
flowchart LR
    REQ["Purchase\nRequest"] --> CLASSIFY{{"Classify by\nbudget + evidence"}}
    CLASSIFY -->|"< €25K\nno issues"| MKT
    CLASSIFY -->|"€25K – €500K\nor issues detected"| TECH
    CLASSIFY -->|"> €500K\nor data residency\nor multi-country"| STRAT
    subgraph MKT["🟢 Marketplace"]
        direction TB
        M1["Fully autonomous"]
        M2["Auto-approve"]
        M3["Office supplies, IT peripherals"]
    end
    subgraph TECH["🟡 Technical"]
        direction TB
        T1["Agent processes"]
        T2["Human approves"]
        T3["Specialized equipment, consulting"]
    end
    subgraph STRAT["🔴 Strategic"]
        direction TB
        ST1["Agent assists"]
        ST2["Human decides"]
        ST3["Enterprise software, multi-year"]
    end
    MKT -.->|"CRITICAL issues\nor cost > €25K"| TECH
    TECH -.->|"cost > €500K\nor compliance flag"| STRAT
    style MKT fill:#064e3b,stroke:#10b981,color:#d1fae5
    style TECH fill:#78350f,stroke:#f59e0b,color:#fef3c7
    style STRAT fill:#7f1d1d,stroke:#ef4444,color:#fecaca
    style CLASSIFY fill:#1e293b,stroke:#64748b,color:#e2e8f0
    style REQ fill:#1a1a1a,stroke:#EC1E24,color:#fff
```
Run the same request twice, get the same tier — classification is reproducible and auditable.
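The deterministic classifier can be sketched as a plain function over the thresholds in the diagram — a hedged sketch, not the actual Sift code (field names are illustrative):

```python
def classify_tier(budget_eur: float, *, data_residency: bool = False,
                  multi_country: bool = False, has_issues: bool = False) -> str:
    """Deterministic tier classification — plain Python, no LLM call,
    so the same request always lands in the same tier.
    Thresholds follow the three-tier diagram; names are illustrative."""
    # Strategic triggers dominate, regardless of budget.
    if budget_eur > 500_000 or data_residency or multi_country:
        return "STRATEGIC"
    # Mid-range budget, or any detected issue, needs a human approver.
    if budget_eur >= 25_000 or has_issues:
        return "TECHNICAL"
    # Small, clean requests run fully autonomously.
    return "MARKETPLACE"
```

Because this is ordinary code rather than a model call, the tier decision can be unit-tested, versioned, and cited verbatim in the audit trail.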
## Robustness: Knowing What It Doesn’t Know
> “A system that produces confident wrong answers will score lower than one that correctly identifies uncertainty and escalates.” — Chain IQ Challenge Brief
This principle shaped every design decision.
### Contradiction & Missing Data Handling
When a request says “budget: €50K” but asks for items totaling €200K, or specifies a restricted supplier, Sift doesn’t guess — it flags with structured severity:
```json
{
  "issue_id": "ISS-003",
  "type": "CONTRADICTION",
  "severity": "HIGH",
  "description": "Stated budget €50,000 insufficient for requested quantities at market rates",
  "resolution": "REQUIRES_CLARIFICATION",
  "suggested_action": "Request budget confirmation or scope reduction from buyer"
}
```
Every issue carries a severity (LOW / MEDIUM / HIGH / CRITICAL), a type, and a concrete resolution path. The clarification loop auto-generates a structured question for the client via the Smart Connect portal, then re-processes the request with the new context.
### 8 Escalation Rules
Deterministic routing based on evidence, not LLM judgment:
| Rule | Trigger | Action |
|---|---|---|
| ER-001 | Budget > €500K | Route to Head of Strategic Sourcing |
| ER-002 | Critical compliance violation | Immediate escalation, block processing |
| ER-003 | Restricted supplier requested | Flag + suggest alternative |
| ER-004 | Multi-country delivery | Require strategic review |
| ER-005 | Data residency constraints | Legal review required |
| ER-006 | Single-source justification needed | Manager approval + documentation |
| ER-007 | Budget deviation > 20% from historical | Anomaly flag + review |
| ER-008 | CRITICAL issues at Stage 2 | Short-circuit to escalation |
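Rules like these reduce to plain predicates over the evidence extracted in earlier stages. A hedged sketch covering a subset of the table (field names are illustrative assumptions, not Sift's actual schema):

```python
def evaluate_escalation(req: dict) -> list[str]:
    """Evaluate escalation rules as deterministic predicates over extracted
    evidence — no LLM judgment involved. Covers a subset of ER-001..ER-008;
    field names are illustrative."""
    rules = {
        "ER-001": req.get("budget_eur", 0) > 500_000,                 # budget threshold
        "ER-003": bool(req.get("restricted_supplier")),               # restricted supplier named
        "ER-004": len(req.get("delivery_countries", [])) > 1,         # multi-country delivery
        "ER-007": abs(req.get("budget_deviation_pct", 0)) > 20,       # historical anomaly
        "ER-008": any(i.get("severity") == "CRITICAL"                 # Stage 2 blocker
                      for i in req.get("issues", [])),
    }
    return [rule_id for rule_id, fired in rules.items() if fired]
```

The returned rule IDs go straight into the audit trail, so a reviewer can see exactly which trigger fired and why.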
### Confidence & Override Protection
Every recommendation includes a confidence level with explicit reasoning about what the agent knows vs. what it’s uncertain about. Low confidence triggers mandatory review.
If a buyer selects a supplier that isn’t ranked #1, they must document the deviation reason — creating accountability without blocking the process.
## Deployed & Running
Sift isn’t a notebook or a local demo — it’s deployed end-to-end on Railway, processing real procurement data.
| Detail | |
|---|---|
| Frontend | Live on Railway — auto-deploy from main |
| Backend | Live on Railway — persistent volume for workspace |
| CI/CD | Push to main → both services deploy automatically |
| Cost model | ~$0.50/request (7 Claude API calls × ~$0.07 each) |
| Data persistence | File-based workspace — each request gets a folder with structured JSON outputs |
| Scaling path | Parallel stage execution, response caching, batch processing |
The architecture separates AI reasoning from deterministic logic. Claude handles natural language understanding and reasoning; Python handles classification, policy validation, and threshold checks. The deterministic parts can be tested, versioned, and audited independently of the model.
## Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 14 + Tailwind CSS | Dashboard + Smart Connect portal |
| Backend | FastAPI (Python 3.12, uv) | SSE streaming, CRUD, report generation |
| AI | Claude Sonnet 4.6 (Anthropic API) | 7 staged API calls per request |
| Data | Chain IQ datasets | 304 requests, 40 suppliers, 599 pricing tiers, 145 policies |
| Deploy | Railway | Auto-deploy, persistent volume |
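The SSE streaming mentioned above is just line-oriented text over HTTP. A minimal sketch of how one pipeline progress event might be formatted (event and field names are assumptions, not Sift's actual wire format):

```python
import json

def sse_event(stage: str, status: str, payload: dict) -> str:
    """Format one Server-Sent Event for the pipeline progress stream.
    The dashboard's StepProgressBar would consume events like this;
    the event name and JSON fields here are illustrative."""
    data = json.dumps({"stage": stage, "status": status, **payload})
    # SSE framing: "event:" + "data:" lines, terminated by a blank line.
    return f"event: progress\ndata: {data}\n\n"
```

On the backend, FastAPI can yield such strings from a generator wrapped in a `text/event-stream` response, and the browser's `EventSource` API parses them natively.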
## Project Structure
```text
starthack-2026/
├── frontend/                      # Next.js 14 dashboard
│   ├── app/                       # Dashboard + Smart Connect submit portal
│   ├── components/
│   │   ├── StepProgressBar.tsx    # Real-time SSE pipeline progress
│   │   ├── ComparisonTable.tsx    # Supplier ranking with weighted scores
│   │   ├── AuditTrailPanel.tsx    # Full pipeline transparency view
│   │   ├── CompliancePanel.tsx    # Policy validation results
│   │   ├── ScoreRing.tsx          # Visual score indicators
│   │   └── ...                    # 18 specialized components
│   ├── lib/                       # API client, adapters, types
│   └── public/brand/              # Sift logo assets (SVG)
│
├── backend/
│   ├── app/
│   │   ├── main.py                # FastAPI — SSE, CRUD, reports
│   │   ├── agent.py               # 7-step staged pipeline (2,053 lines)
│   │   ├── data_loader.py         # Dataset loader + filters
│   │   ├── report.py              # Internal audit report (HTML)
│   │   ├── buyer_report.py        # Buyer-facing comparison (HTML)
│   │   └── trace_viewer.py        # Debug trace viewer
│   └── workspace/
│       ├── CLAUDE.md              # Agent system prompt
│       ├── data/                  # Chain IQ datasets
│       │   ├── requests.json      # 304 purchase requests
│       │   ├── suppliers.csv      # 40 approved suppliers
│       │   ├── pricing.csv        # 599 pricing tiers × 5 regions
│       │   ├── policies.json      # 145 procurement policies
│       │   └── categories.csv     # Procurement taxonomy
│       └── requests/REQ-*/        # Processed request outputs
│
├── prep/01-chain-iq/              # Research, pitch prep, mentor feedback
└── DEPLOYMENT.md                  # Railway deployment guide
```
## Getting Started
### Prerequisites
- Node.js 18+
- Python 3.12+
- uv (Python package manager)
- Anthropic API key
### Backend
```bash
cd backend
uv sync
echo "ANTHROPIC_API_KEY=your_key" > .env
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000
```
### Frontend
```bash
cd frontend
npm install
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local
npm run dev
```
Open http://localhost:3000.
## Evaluation Criteria Alignment
How Sift maps to the Chain IQ case scoring rubric:
| Criteria | Weight | How Sift Addresses It |
|---|---|---|
| Feasibility | 25% | Deployed end-to-end on Railway. Real data, real API calls, real outputs. File-based persistence with clear scaling path. Production-realistic cost model (~$0.50/req). |
| Robustness & Escalation | 25% | 8 deterministic escalation rules. Python-based tier reclassification. Short-circuit on critical blockers. Confidence scoring with explicit uncertainty. Override protection. Clarification loop. The system says “I don’t know” when it should. |
| Creativity | 20% | Three-tier autonomy model. Pre-injection architecture (no tool use). Deterministic+AI hybrid. Clarification loop with structured re-processing. |
| Reachability | 20% | 304 real requests, 6 languages, 19 countries. 145 coded policies. 40 suppliers with 599 pricing tiers. Addresses the tail spend problem ($5–20M annual savings). |
| Visual Design | 10% | Real-time SSE progress bar. Tabbed detail view (7 panels). Score rings. Two HTML report types: internal audit + buyer-facing comparison. |
## Team
| Name | GitHub | Role |
|---|---|---|
| Andre Pacheco | @A-PachecoT | Backend, AI Pipeline, Architecture |
| Alvaro Zuñiga | @alvarogiozu | Frontend, Dashboard UI |
| Melissa Noriega | @Melissa1221 | Frontend, UX, Customer Journey |
| Freddy | @Freddyx14 | Validation, QA, Pitch |
Built with focus and no sleep at START Hack 2026, St. Gallen