The Intelligent Triage Layer for AI.

Most apps send tasks to premium models that don't need them. Inferenz sits between your app and your model providers, triaging each query to cut your inference bill with a one-line change to your code.


Four steps.
One line of code changed.

Inferenz intercepts every prompt before it reaches the model. In milliseconds, it decides which model should handle it. Your users never notice. Your invoice does.

01 — Intercept
Your traffic hits us first
You change your base URL from api.openai.com to your Inferenz endpoint. Every API call flows through us before it reaches any provider.
02 — Classify
Two engines, one decision
The Complexity Engine scores every prompt in under 5ms — no API call, no cost. Ambiguous prompts trigger the Cascade Engine: try cheap first, escalate only if needed.
03 — Route
Right model, right cost
Clearly simple prompts go to GPT-4o mini ($0.15 per 1M input tokens). Clearly complex prompts stay on a premium model. Ambiguous prompts resolve on the cheap model 60–80% of the time. Repeated or similar prompts are served instantly from cache at zero cost.
04 — Account
Every saving is logged
Shadow accounting tracks what you paid and what you would have paid without Inferenz, request by request and engine by engine. Savings are verifiable, transparent, and the basis of every invoice.
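To make shadow accounting concrete, here is a minimal TypeScript sketch. It assumes every token is billed at a single per-1M price (real providers price input and output tokens separately), reuses the per-1M figures quoted on this page, and uses hypothetical model keys and field names rather than any real Inferenz API. An escalated cascade request pays for both the cheap attempt and the premium call, so its saving can be negative.

```typescript
// Illustrative per-1M-token prices (USD) from the figures quoted on this page.
// The keys are hypothetical labels, not official provider model IDs.
const PRICE_PER_1M: Record<string, number> = {
  "gpt-4o-mini": 0.15,
  "claude-sonnet-4.6": 3.0,
};

type Engine = "cache" | "complexity" | "cascade";

interface RoutedRequest {
  engine: Engine;        // which layer made the routing decision
  tokens: number;        // simplification: one token count at one price
  calls: string[];       // models actually called ([] for a cache hit)
  baselineModel: string; // model the app would have called without routing
}

// Cost of one call: tokens x price-per-1M / 1,000,000.
function callCost(model: string, tokens: number): number {
  const price = PRICE_PER_1M[model];
  if (price === undefined) throw new Error(`No price for ${model}`);
  return (tokens * price) / 1e6;
}

// Shadow accounting: what was paid vs. what would have been paid, per engine.
function shadowAccount(requests: RoutedRequest[]) {
  let paid = 0;
  let baseline = 0;
  const savedByEngine: Record<Engine, number> = { cache: 0, complexity: 0, cascade: 0 };
  for (const r of requests) {
    // Sum every call made: an escalated cascade pays for cheap + premium.
    const actual = r.calls.reduce((sum, m) => sum + callCost(m, r.tokens), 0);
    const counterfactual = callCost(r.baselineModel, r.tokens);
    paid += actual;
    baseline += counterfactual;
    savedByEngine[r.engine] += counterfactual - actual;
  }
  return { paid, baseline, saved: baseline - paid, savedByEngine };
}

const report = shadowAccount([
  { engine: "cache", tokens: 1000, calls: [], baselineModel: "claude-sonnet-4.6" },
  { engine: "complexity", tokens: 1000, calls: ["gpt-4o-mini"], baselineModel: "claude-sonnet-4.6" },
  {
    engine: "cascade",
    tokens: 1000,
    calls: ["gpt-4o-mini", "claude-sonnet-4.6"],
    baselineModel: "claude-sonnet-4.6",
  },
]);
```

With 1,000 tokens per request, the cache hit saves the full $0.003, the cheap route saves $0.00285, and the escalated cascade costs $0.00015 more than calling the premium model directly.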
Live routing simulation (Cache → Complexity → Cascade) across three models: GPT-4o mini ($0.15/1M), Gemini 2.0 Flash ($0.40/1M), and Claude Sonnet 4.6 ($3.00/1M).

Three layers.
Zero wasted spend.

Most routing tools make one guess per prompt. Inferenz runs three layers — cache, classify, cascade — so repeat prompts cost nothing, clear prompts route instantly, and ambiguous ones get a safety net.

Layer 00
Cache Layer
Pre-classification · <1ms · $0 cost
Checks every prompt against stored responses before the classifier runs. Exact matches return instantly. Near-duplicate prompts are caught by fuzzy matching that combines Jaccard and character n-gram similarity, with no embeddings and no API calls.
Hit rates of up to 80% on repetitive workloads
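The cache layer's near-duplicate check can be sketched as follows. This is one plausible reading of "Jaccard + character n-gram similarity": Jaccard similarity computed over sets of character trigrams. The `PromptCache` class, the trigram size, and the 0.8 threshold are hypothetical choices for illustration, and the linear scan would need an index at real scale.

```typescript
// Set of character n-grams (trigrams by default) of a lowercased, whitespace-normalised prompt.
function charNGrams(text: string, n = 3): Set<string> {
  const s = text.toLowerCase().replace(/\s+/g, " ").trim();
  const grams = new Set<string>();
  if (s.length < n) {
    if (s.length > 0) grams.add(s);
    return grams;
  }
  for (let i = 0; i + n <= s.length; i++) grams.add(s.slice(i, i + n));
  return grams;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let inter = 0;
  a.forEach((g) => {
    if (b.has(g)) inter++;
  });
  return inter / (a.size + b.size - inter);
}

// Hypothetical cache: exact-match map first, then the most similar stored prompt above a threshold.
class PromptCache {
  private exact = new Map<string, string>();
  private entries: { grams: Set<string>; response: string }[] = [];

  constructor(private threshold = 0.8) {}

  put(prompt: string, response: string): void {
    this.exact.set(prompt, response);
    this.entries.push({ grams: charNGrams(prompt), response });
  }

  get(prompt: string): string | undefined {
    const hit = this.exact.get(prompt);
    if (hit !== undefined) return hit;
    const grams = charNGrams(prompt);
    let best: string | undefined;
    let bestScore = this.threshold;
    for (const e of this.entries) {
      const score = jaccard(grams, e.grams);
      if (score >= bestScore) {
        best = e.response;
        bestScore = score;
      }
    }
    return best; // undefined means a cache miss: hand the prompt to the classifier
  }
}
```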
Engine 01
Complexity Engine
Pre-generation · <5ms · no API call
Analyses the prompt before any model sees it. Scores cognitive complexity using Bloom's taxonomy, information density via Shannon entropy, structural signals, and domain vocabulary. Deterministic: same prompt, same decision, every time.
Handles ~80% of cache misses
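A toy version of this kind of scoring is sketched below. It combines a keyword-based Bloom's taxonomy level with the Shannon entropy of the prompt's characters, then maps the score to a clear-cheap, ambiguous (cascade), or clear-complex band. The verb lists, weights, and thresholds are hypothetical, and the structural and domain-vocabulary signals mentioned above are omitted. Like the engine described here, it is deterministic.

```typescript
// Shannon entropy of the prompt's characters, in bits: H = -sum over c of p(c) * log2 p(c).
function shannonEntropy(text: string): number {
  const counts = new Map<string, number>();
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  let h = 0;
  counts.forEach((c) => {
    const p = c / text.length;
    h -= p * Math.log2(p);
  });
  return h;
}

// Hypothetical keyword lists: a toy stand-in for a Bloom's taxonomy classifier.
const BLOOM_VERBS: Array<[number, string[]]> = [
  [1, ["list", "define", "name", "recall"]],              // remember
  [2, ["explain", "summarise", "summarize", "describe"]], // understand
  [3, ["solve", "calculate", "implement", "apply"]],      // apply
  [4, ["analyse", "analyze", "compare", "debug"]],        // analyse
  [5, ["evaluate", "critique", "justify", "assess"]],     // evaluate
  [6, ["design", "create", "compose", "propose"]],        // create
];

// Highest Bloom level whose verbs appear in the prompt (1 if none do).
function bloomLevel(prompt: string): number {
  const words: string[] = prompt.toLowerCase().match(/[a-z]+/g) || [];
  let level = 1;
  for (const [lvl, verbs] of BLOOM_VERBS) {
    if (words.some((w) => verbs.indexOf(w) >= 0)) level = Math.max(level, lvl);
  }
  return level;
}

// Hypothetical weights: 60% Bloom level, 40% entropy (capped at 5 bits per character).
function complexityScore(prompt: string): number {
  const bloom = (bloomLevel(prompt) - 1) / 5;                // 0..1
  const density = Math.min(shannonEntropy(prompt) / 5, 1);   // 0..1
  return 0.6 * bloom + 0.4 * density;
}

type Route = "cheap" | "cascade" | "premium";

// Hypothetical thresholds: clear-cheap below 0.3, clear-complex at 0.6 and above.
function band(score: number): Route {
  if (score < 0.3) return "cheap";
  if (score >= 0.6) return "premium";
  return "cascade"; // ambiguous: let the Cascade Engine decide
}
```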
Engine 02
Cascade Engine
Post-generation · response verification
For ambiguous prompts, tries the cheap model first, then scores the response on four confidence signals: completeness, hedging language, refusal detection, and format compliance. Escalates to the premium model only when needed. Based on AutoMix (NeurIPS 2024).
Handles ~20% of cache misses · 60–80% resolve cheap
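The cascade check can be sketched as below: score a cheap-model draft on the four signals named above and escalate on a refusal or low mean confidence. The phrase lists, the 0/0.5/1 scoring, and the 0.8 threshold are hypothetical, and the model callbacks are synchronous stand-ins. This is a simple heuristic rather than AutoMix itself, which relies on few-shot self-verification by the smaller model.

```typescript
type ExpectedFormat = "text" | "json";

interface Confidence {
  completeness: number; // 1 if the answer is non-empty and ends cleanly
  hedging: number;      // 1 with no hedges, 0.5 with one, 0 with more
  refusal: number;      // 0 if the model appears to refuse
  format: number;       // 1 if the response matches the expected format
}

// Hypothetical phrase lists; real detection would need to be far richer.
const HEDGES = ["i'm not sure", "i am not sure", "i think", "possibly", "it might"];
const REFUSALS = ["i can't help", "i cannot help", "i'm unable to", "i am unable to"];

function scoreResponse(response: string, expected: ExpectedFormat): Confidence {
  const text = response.trim();
  const lower = text.toLowerCase();
  const completeness = text.length > 0 && /[.!?}\]]$/.test(text) ? 1 : 0;
  const hedgeCount = HEDGES.filter((h) => lower.indexOf(h) >= 0).length;
  const hedging = hedgeCount === 0 ? 1 : hedgeCount === 1 ? 0.5 : 0;
  const refusal = REFUSALS.some((r) => lower.indexOf(r) >= 0) ? 0 : 1;
  let format = 1;
  if (expected === "json") {
    try {
      JSON.parse(text);
    } catch {
      format = 0;
    }
  }
  return { completeness, hedging, refusal, format };
}

// Try the cheap model first; escalate on a refusal or when mean confidence is below the threshold.
function cascade(
  prompt: string,
  expected: ExpectedFormat,
  cheap: (p: string) => string,
  premium: (p: string) => string,
  threshold = 0.8,
): { response: string; escalated: boolean; confidence: number } {
  const draft = cheap(prompt);
  const s = scoreResponse(draft, expected);
  const confidence = (s.completeness + s.hedging + s.refusal + s.format) / 4;
  if (s.refusal === 0 || confidence < threshold) {
    return { response: premium(prompt), escalated: true, confidence };
  }
  return { response: draft, escalated: false, confidence };
}
```

With a 0.8 threshold, any single failed signal (mean 0.75) triggers escalation, while one hedge on its own (mean 0.875) does not.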
Inferenz routing pipeline: every prompt from your app first hits the cache layer, which returns previously seen prompts instantly at $0. Cache misses go to the proprietary routing engine, which classifies every prompt in under 5ms with no API call, and response verification confirms quality before delivery. Each prompt reaches a cheap model (simple tasks), mid model (moderate tasks), or premium model (complex tasks); the response is delivered and the saving logged.

A free audit.
No commitment.

Before you change anything, we show you exactly what you'd save. Export your API logs, send them to us, and we return a one-page report with your numbers.

1
Export your API logs
Request a 24–48 hour CSV export from your OpenAI or Anthropic dashboard. We only need token counts and model names — no prompt content required.
2
We run both engines
We run the Complexity Engine and simulate Cascade routing against every row. Takes about 30 minutes on our end.
3
You see the numbers
We return a one-page savings report: current daily spend, optimised spend, projected monthly saving. If the numbers don't stack up, we'll tell you.
4
One line of code
Change your base URL. We handle routing, privacy compliance, and billing automatically.
baseURL: "https://api.inferenz.com.au/[your-id]"
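The spend side of the audit in steps 1 to 3 is arithmetic over token counts, sketched below. The CSV column names, the price table (illustrative USD list prices; check each provider's current pricing), and the `naiveRoute` rule are hypothetical stand-ins; the rule in particular is a placeholder, not the Complexity Engine. The 30-day projection assumes the export covers exactly 24 hours.

```typescript
// One row of a usage export. Column names below are hypothetical.
interface UsageRow {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

interface ModelPrice {
  inputPer1M: number;  // USD per 1M input tokens
  outputPer1M: number; // USD per 1M output tokens
}

// Illustrative list prices only; check each provider's current pricing.
const PRICES: Record<string, ModelPrice> = {
  "gpt-4o": { inputPer1M: 2.5, outputPer1M: 10 },
  "gpt-4o-mini": { inputPer1M: 0.15, outputPer1M: 0.6 },
};

// Parses a simple CSV with the header "model,input_tokens,output_tokens".
// Quoted fields and commas inside values are not handled.
function parseUsageCsv(csv: string): UsageRow[] {
  const lines = csv.trim().split(/\r?\n/);
  const header = lines[0].split(",").map((h) => h.trim());
  const iModel = header.indexOf("model");
  const iIn = header.indexOf("input_tokens");
  const iOut = header.indexOf("output_tokens");
  if (iModel < 0 || iIn < 0 || iOut < 0) throw new Error("Unexpected CSV header");
  return lines
    .slice(1)
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const cols = line.split(",");
      return {
        model: cols[iModel].trim(),
        inputTokens: Number(cols[iIn]),
        outputTokens: Number(cols[iOut]),
      };
    });
}

// Cost of a row's tokens if they had been sent to `model`.
function rowCost(row: UsageRow, model: string, prices: Record<string, ModelPrice>): number {
  const p = prices[model];
  if (!p) throw new Error(`No price for ${model}`);
  return (row.inputTokens * p.inputPer1M + row.outputTokens * p.outputPer1M) / 1e6;
}

// Placeholder routing rule for this sketch only: short prompts go to the cheap model.
function naiveRoute(row: UsageRow): string {
  return row.inputTokens < 500 ? "gpt-4o-mini" : row.model;
}

// Current vs. re-routed spend for the export, projected to 30 days.
function audit(
  rows: UsageRow[],
  prices: Record<string, ModelPrice>,
  route: (row: UsageRow) => string,
) {
  let current = 0;
  let optimised = 0;
  for (const row of rows) {
    current += rowCost(row, row.model, prices);
    optimised += rowCost(row, route(row), prices);
  }
  // Assumes the export covers exactly 24 hours.
  return { current, optimised, projectedMonthlySaving: (current - optimised) * 30 };
}
```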

What an audit looks like.

A sample 24-hour log from a mid-size AI product: 12,400 requests, all routed to GPT-4o. Here's what both engines found.

Inferenz Audit Report: sample client, 24-hour period (sample data)
Requests analysed: 12,400 (100% on GPT-4o)
Safely rerouted: 37% of all requests
Daily spend, current: $292 AUD (all on premium)
Daily spend, with Inferenz: $185 AUD (optimised routing)
Monthly saving: $3,226 AUD (projected over 30 days)
Key insight: 37% of prompts needed no premium model at all.

Based on a simulated 24-hour log.

We don't charge unless we save you money.

No monthly fee. No setup cost. No lock-in. We take 10% of the savings we generate — nothing more.

10%
of verified monthly savings
Savings are calculated from shadow accounting data — every request logged, every dollar accounted for. You can verify every line. We invoice via Stripe automatically.

Our incentive is always aligned with yours: the more we save you, the more we earn.
Example: 30-day period
Without Inferenz: $5,550 AUD
With Inferenz: $1,650 AUD
Total saved: $3,900 AUD
Inferenz fee (10%): $390 AUD
You keep: $3,510 AUD
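The fee arithmetic in the example can be checked with a few lines of TypeScript. The `invoice` function is a hypothetical illustration of the stated pricing model (10% of verified savings, and nothing when there is no saving), with the fee rounded to cents.

```typescript
interface InvoiceSummary {
  saved: number;
  fee: number;
  youKeep: number;
}

// Fee = feeRate x verified savings; no saving means no fee.
function invoice(withoutInferenz: number, withInferenz: number, feeRate = 0.1): InvoiceSummary {
  const saved = Math.max(0, withoutInferenz - withInferenz);
  const fee = Math.round(saved * feeRate * 100) / 100; // round to cents
  return { saved, fee, youKeep: saved - fee };
}

const example = invoice(5550, 1650); // the 30-day example, in AUD
```

For the 30-day example this returns a saving of $3,900, a fee of $390, and $3,510 kept, matching the figures listed.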

Join the waitlist.
Get the first audit free.

We're onboarding a small group of Australian AI companies to validate the platform. Sign up below: if you're spending on inference, we'll run a free audit and show you your numbers before you commit to anything.

Tell us about your AI spend.
We'll show you your savings.
The form covers your company, monthly AI spend, and what you're building. Takes 2 minutes. After you submit, we'll run a free audit and send you a one-page savings report within 48 hours.

We review every application personally · AI companies only · Reply within 48 hours

Audit cost: Free · Response time: 48h · Fee: 10% of savings only

Want to find out more?
Get in touch with us

hello@inferenz.com.au