Hyper-realistic blush rose digital painting of five translucent crystalline price tags floating in a descending row across a softly glowing pink and purple atmospheric scene, with a single golden coin balanced on the smallest tag and a subtle horizontal arrow of pink light beneath them.

The Token Triage Doctrine: How To Survive the AI Price War That Just Started

June 12, 2026

The AI price war you were warned about is no longer a forecast. It started this week.

On June 10, the Wall Street Journal reported that OpenAI is weighing "drastic" cuts to the prices it charges for tokens, the metering unit behind every ChatGPT API call (Wall Street Journal).

Sam Altman told staff at a recent internal event that AI costs had suddenly become "a huge issue" for customers (Anadolu Agency).

By the time WSJ published its follow-up the next day, the headline had hardened. "The AI Price War Is Here, Piling Pressure on OpenAI and Anthropic" (Wall Street Journal).

Here is what most operators are missing.

A price war at the top of the stack does not just lower your ChatGPT bill. It rearranges the entire shape of how you should be buying compute, what you should be paying for, and which jobs belong on which model.

If you keep running every workflow on a frontier model out of habit, you will quietly hand a competitor a 10x to 50x cost advantage inside the next 90 days.

The fix is a doctrine, not a discount. Let me show you the one I am using with founders this week.

What Actually Triggered the AI Price War in June 2026?

The trigger was Claude Fable 5, released June 9 by Anthropic at $10 per million input tokens and $50 per million output tokens, exactly twice the price of Opus 4.8 (Anthropic).

That made Fable 5 the most expensive publicly available frontier model on the market (DevelopersIO).

It also made the gap to everything else impossible to ignore.

Anthropic's Fable 5 is more than 50 times more expensive per token than DeepSeek's V4 Pro (Livemint).

DeepSeek V3.2 with a cache hit runs at $0.07 input and $1.10 output per million tokens, roughly 71x cheaper on input than Anthropic's flagship (Remote OpenClaw).

Qwen3.5-Flash sits at $0.10 input and $0.40 output. MiniMax M2.7 at $0.30 and $1.20. Kimi K2.5 at $0.60 and $2.50.

Big companies and startups, "chafing at rapidly escalating artificial intelligence costs, are increasingly turning to tools that tap into cheaper AI models, including some from China," per WSJ's reporting (Livemint).

That is the spark.

OpenAI now believes Anthropic is about to cut, and is moving first. OpenAI says it has the advantage because it spent the last year locking in compute capacity at lower rates than what new entrants can get today (Livemint).

Both companies filed confidentially for IPOs inside the last two weeks. Anthropic on June 1. OpenAI on June 8 (Blue Trust).

So they are racing to show growth, racing to defend share, and burning billions on compute while doing it.

The pressure is rolling downhill straight to your invoice.

Why Does a Price War Actually Matter For My Business?

Two reasons. One is obvious. One is the trap.

The obvious reason is that the cost per task on frontier models is about to fall. If you renegotiate or wait, your token bill on the same workload is likely to drop materially over the next two quarters.

The trap is that price cuts at the frontier do not automatically help you, because most operators are spending 80 percent of their token budget on workloads that should never have been on a frontier model in the first place.

A blog draft on Fable 5 at $50 output per million tokens costs roughly $0.085 for a 1,500-token post (MindStudio).

That same blog draft on Qwen3.5-Flash costs less than a third of a cent.

If you write 500 blog drafts a year, that is the difference between $42.50 and $1.50.

Multiply that by classification jobs, summarization, transcription cleanup, agent loops that read large context, and product description generation, and you have leaked five figures a year on jobs that did not need the smartest model in the world.

A price war reveals this leak. It does not fix it.

The fix is a triage system.

What Is the Token Triage Doctrine?

The Token Triage Doctrine is a 5-tier sort. Every AI workload in your business gets dropped into exactly one tier, based on what the job actually needs, not what looks impressive.

Then you route to the cheapest model that can do the job at your quality bar.

Here are the 5 tiers, with current 2026 reference pricing per million tokens.

Comparison table: Tier vs Job Type vs Reference Models vs Input Price vs Output Price

The doctrine has one rule.

A workload only earns a higher tier when a lower tier has been tested and failed on a measurable quality bar.

Not assumed to fail. Tested.

How Do I Apply the Token Triage Doctrine This Week?

Walk through this in one focused 90-minute session.

Step 1. Pull your last 30 days of token spend from every AI vendor invoice or dashboard. OpenAI Usage, Anthropic Console, Gemini billing, Azure OpenAI logs.

Step 2. List the top 10 workflows by total spend. Not by count. By dollars.

Step 3. For each workflow, write one sentence describing what the model is actually doing. "Drafts the first 80 percent of a sales email." "Reads a 12-page PDF and pulls 6 fields." "Decides whether a refund request is fraud."

Step 4. Assign each workflow to a tier using this test.

For Tier 1, the question is "would a wrong answer here cost me a client, a lawsuit, or a publishable mistake?" If yes, frontier.

For Tier 2, the question is "is this customer-facing or revenue-generating work where a 5 percent quality drop hurts conversion?" If yes, premium production.

For Tier 3, the question is "is this internal automation that other humans review?" If yes, standard.

For Tier 4, the question is "would a templated answer be 80 percent fine?" If yes, efficient bulk.

For Tier 5, the question is "is this batch, async, or stripped of risk?" If yes, commodity.

Step 5. For every workload in Tier 2 through 5, run a 50-sample A/B test against the next cheapest tier. If the cheaper tier passes at your quality bar, route there. If it fails, log the failure mode and stay one tier up.

That is the entire doctrine.

What Does This Look Like For a Real Business?

Take a coaching business doing $2 million a year with an AI stack that includes content creation, customer support automation, a sales-enablement assistant, and a research bot for the founder.

Before triage, every workload runs on Claude Opus 4.8 because "that's the smart one."

Monthly spend hits $4,800.

After triage, the founder's research bot stays on Fable 5 because she uses it for high-stakes investment decisions. That is correctly Tier 1.

The sales-enablement assistant moves to Opus 4.8. Customer-facing, revenue at stake, Tier 2.

Customer support automation routes to GPT-5.4. Internal humans review, Tier 3.

Bulk content drafts move to Qwen3.5-Flash. Founder edits everything anyway, Tier 4.

Transcription cleanup and meeting summary jobs move to DeepSeek V4 with cache. Pure batch, Tier 5.

New monthly spend lands around $1,150.

That is $3,650 a month saved, $43,800 a year, on the same outputs, with the same quality bar, with the same team. No headcount cut. No tool cancellation. Just routing.

This is the compounding edge that the AI price war reveals. The price cuts at the top are noise. The routing audit is the signal.

Will the Price War Actually Hurt OpenAI and Anthropic?

Yes, and you should care because it changes which providers will still exist in 18 months.

Both companies are already losing billions a year to compute costs (Livemint).

WSJ reports that cuts at this level would "widen losses" at both companies and threaten their path to becoming "financially viable" (Wall Street Journal).

Reuters reported on the OpenAI versus Anthropic rivalry this week as "the bitter battle for the future of AI," with both companies pushing toward blockbuster IPOs that depend on showing growing margin, not shrinking (WHTC).

Anthropic raised $30 billion at a $380 billion valuation in February, then $65 billion at $965 billion in May (Tech Wire Asia).

OpenAI raised $122 billion in March at an $852 billion valuation (Blue Trust).

The investors writing those checks want margin expansion. A price war crushes that math.

So expect three responses inside the next 90 days.

One, frontier pricing for top-tier models holds or even rises, because that is where margin lives.

Two, mid-tier pricing collapses, because that is where China and open-source models are eating share.

Three, free credits, prompt caching discounts, and batch API offers expand sharply, to keep volume sticky.

Plan your routing now for the world that arrives after the cuts, not the one you live in today.

How Do I Stop Getting Locked Into One Provider?

Pick a routing layer this quarter.

Anthropic's Fable 5 is locked into Anthropic's API. So is Opus 4.8.

GPT-5.5 is locked into OpenAI.

But your routing logic does not have to be.

Use a model abstraction layer like OpenRouter, LiteLLM, Vercel AI SDK, or a homegrown router that lets you switch model names per workflow with a config change rather than a code rewrite.

Tag every workflow with its current tier and current model.

Set a 90-day review cadence where you re-run the triage test with the newest cheaper tier in mind.

Build a "Plan B model" for every workflow. If your primary provider raises prices, deprecates a model, or has an outage, your Plan B routes traffic in under 10 minutes.

This is the routing posture that turns a price war from a threat into a tailwind for your P&L.

If you want help building this routing system, mapping your current workloads against the 5 tiers, and standing up a "Plan B model" for every workflow, book a one on one AI Implementation Session here.

We will walk your stack, score every workload, and hand you a routing plan you can deploy this week.

TL;DR

  • The AI price war started this week. OpenAI is weighing "drastic" token cuts in response to Anthropic, per WSJ (WSJ).
  • Anthropic's Fable 5 at $10 input and $50 output per million tokens is more than 50 times more expensive than DeepSeek V4 Pro (Livemint).
  • Sam Altman called AI costs "a huge issue" at a recent internal event (Anadolu Agency).
  • Both OpenAI and Anthropic filed confidentially for IPOs in the last two weeks, raising the stakes on margin defense (Blue Trust).
  • The fix is the Token Triage Doctrine, a 5-tier sort that routes each workload to the cheapest model that meets your quality bar.
  • A realistic triage audit on a $2M coaching business saves roughly $43,800 a year on the same outputs.
  • Build a routing layer this quarter so price moves at the frontier become a tailwind, not a tax.

Frequently Asked Questions

Should I just switch everything to DeepSeek or Qwen tomorrow?

No. Run a 50-sample A/B against your current quality bar first. Many bulk and batch workloads pass easily. Customer-facing copy and high-stakes reasoning often do not. Triage, then route.

Is it safe to send customer data to Chinese models?

It depends on your contracts and your regulator. Many enterprises route only de-identified or synthetic content to lower-cost providers and keep regulated workloads on US frontier providers under signed data processing agreements. Decide tier by tier, not blanket.

What is the difference between this and just turning on usage caps?

Usage caps prevent damage. Triage prevents waste. You need both. Caps stop runaway agents. Triage stops you from paying frontier prices for commodity work.

How often should I re-run the triage?

Every 90 days. Model prices, model quality, and new entrants change quarterly. Re-test the cheapest tier against your current bar every quarter and migrate any workload that now passes.

What if my team resists moving off the model they know?

Lead with quality data, not cost data. Show the A/B results at their quality bar. If the cheaper model passes, the resistance dissolves. If it fails, the workload stays. Triage decides, not vibes.

The leaders who win the next 12 months will not be the ones who paid the most for AI. They will be the ones who routed every workload to the model that did the job for the least.

That is the compounding edge hiding inside this price war.

Run the triage this week.

Back to Blog