AI Token Bills Are Up 340% and Out of Control

Companies are hemorrhaging money on AI they can’t measure. The average enterprise AI budget grew 340% between 2023 and 2025, according to Gartner, and most CFOs still can’t tell you which team is burning the most tokens. That gap between spend and visibility is where fortunes are lost.
The Reckoning Nobody Planned For
The scramble started quietly. Early adopters treated AI API costs like office supplies. They tossed them into a “software” line item and moved on. Then the invoices got bigger. A lot bigger.
According to Goldman Sachs research published in late 2025, U.S. companies collectively spent over $320 billion on AI infrastructure and API costs that year, up from $91 billion in 2023. Sequoia Capital flagged the mismatch early, pointing out that AI spending was growing far faster than the revenue it was supposed to generate. Now it’s 2026 and the scramble is real. Boards want answers. CFOs are building new budget categories. And the startups that built their entire product on top of expensive frontier models are quietly repricing, pivoting, or folding.
The pressure lands hardest on mid-sized companies. Big tech has negotiated custom pricing. Tiny startups have low volume. The companies in the middle get squeezed: they pay full retail token prices on enterprise-scale usage, with no one minding the meter.
The Measurement Problem Nobody Talks About
Here’s my take that nobody wants to hear: most companies don’t have an AI cost problem. They have a measurement problem. And those are very different things.
Rich people track every dollar. Poor people wonder where it went. The same rule applies to AI spend. I’ve watched companies cancel promising AI projects because costs “seemed too high,” only to discover later that one developer had left a test environment running for three months. The problem wasn’t the AI. The problem was zero visibility into who was spending what and why.
According to a16z’s 2025 AI Operating Report, the top quartile of AI-native companies spend 28 cents of every dollar on model inference costs. The bottom quartile spends over 62 cents. That gap isn’t talent. It’s process. The companies winning the cost game have dedicated AI spend owners, token budgets per team, and hard limits that trigger alerts before they trigger invoices.
The token bill problem is also a vendor problem. OpenAI, Anthropic, Google, and the rest charge by the token. That means every poorly written prompt, every unnecessary context window, every “just in case” API call adds real money to a real invoice. According to IBM’s 2025 AI Efficiency Report, the average enterprise wastes 31% of its AI API spend on redundant calls, oversized context windows, and models that are far more powerful than the task actually requires.
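To see how an oversized context window turns into real money, here is a minimal sketch of the arithmetic. The prices, token counts, and call volume are hypothetical placeholders, not any vendor's actual rates; plug in the numbers from your own invoice.

```python
# Hypothetical per-million-token prices in USD (an assumption for illustration).
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call at the assumed prices."""
    return (
        input_tokens * PRICE_PER_MTOK["input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000


def monthly_cost(input_tokens: int, output_tokens: int,
                 calls_per_day: int, days: int = 30) -> float:
    """Cost of repeating the same call shape every day for a month."""
    return call_cost(input_tokens, output_tokens) * calls_per_day * days


# Same workload, same output length; only the context sent per call differs.
bloated = monthly_cost(20_000, 500, calls_per_day=10_000)
trimmed = monthly_cost(4_000, 500, calls_per_day=10_000)
print(f"bloated context: ${bloated:,.0f} per month")
print(f"trimmed context: ${trimmed:,.0f} per month")
```

Nothing about the product changes between the two runs. The only difference is how much text gets stuffed into each request.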
A company using a frontier model for something a smaller, cheaper model could handle is like flying first class to a meeting you could’ve done over the phone. The model doesn’t care about your ROI goals. You have to.
The smart operators I know treat AI spending like headcount. You wouldn’t hire 20 engineers without knowing what they’d build. You shouldn’t spin up 20 AI workflows without knowing what they’d cost. Set budgets per team. Track token consumption weekly. Use Wallester’s business card platform to give each department a separate virtual card tied to its AI API accounts, so spend is visible by team before the monthly invoice arrives. That single move turns an unreadable cloud bill into a full accountability report sorted by department.
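Weekly tracking by team does not need a platform to get started. The sketch below assumes you can export usage records with a date, a team label, and token counts; those field names are hypothetical, so map them to whatever your provider's usage export actually contains.

```python
from collections import defaultdict
from datetime import date


def weekly_tokens_by_team(records):
    """Sum tokens per (ISO year, ISO week, team).

    Each record is a dict with the assumed keys:
    "day" (datetime.date), "team", "input_tokens", "output_tokens".
    """
    totals = defaultdict(int)
    for rec in records:
        iso = rec["day"].isocalendar()  # (ISO year, ISO week, weekday)
        key = (iso[0], iso[1], rec["team"])
        totals[key] += rec["input_tokens"] + rec["output_tokens"]
    return dict(totals)


# Made-up sample records, for illustration only.
usage = [
    {"day": date(2026, 1, 5), "team": "support", "input_tokens": 1000, "output_tokens": 200},
    {"day": date(2026, 1, 7), "team": "support", "input_tokens": 500, "output_tokens": 100},
    {"day": date(2026, 1, 12), "team": "search", "input_tokens": 300, "output_tokens": 50},
]

for (year, week, team), tokens in sorted(weekly_tokens_by_team(usage).items()):
    print(f"{year}-W{week:02d} {team}: {tokens} tokens")
```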
What I Would Do Right Now
If you’re running a startup or managing an AI budget right now, here’s what I would do. Not next quarter. Today.
First, audit every active API key in your organization. You’ll find at least one attached to a project that’s been dead for six months. Shut it down today.
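A key audit can start as a ten-line script. This sketch assumes you have already pulled an inventory mapping each key's label to the last time it was used (or `None` if it never was). How you build that inventory depends on your provider's dashboard or export, so the data below is hypothetical.

```python
from datetime import datetime, timedelta


def stale_keys(last_used, now, max_idle_days=30):
    """Return labels of keys unused for longer than max_idle_days.

    last_used maps a key label to its last-use datetime, or None if never used.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        label for label, ts in last_used.items()
        if ts is None or ts < cutoff
    )


# Hypothetical inventory, for illustration only.
inventory = {
    "prod-chatbot": datetime(2026, 2, 27),
    "old-demo": datetime(2025, 8, 1),
    "never-used": None,
}

for label in stale_keys(inventory, now=datetime(2026, 3, 1)):
    print(f"review or revoke: {label}")
```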
Second, stop using frontier models for tasks that don’t need them. Claude Haiku, GPT-4o mini, and Gemini Flash exist for a reason. They’re cheaper. They’re faster. For summarization, classification, and simple extraction tasks, they often land near 90% of the quality for roughly 20% of the cost. According to Fireworks AI’s 2025 Benchmark Report, companies that matched model size to task complexity cut inference costs by an average of 44%.
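Matching the model to the task can be as simple as a routing function in front of your API calls. This is a minimal sketch: the model names are placeholders rather than real model IDs, and the list of "simple" task types is an assumption taken from the examples above.

```python
# Placeholder names; substitute the real model IDs you have access to.
SMALL_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"

# Task types assumed to be safe for the cheaper model.
SIMPLE_TASKS = {"summarization", "classification", "extraction"}


def pick_model(task_type: str) -> str:
    """Route simple tasks to the small model, everything else to the frontier model."""
    return SMALL_MODEL if task_type in SIMPLE_TASKS else FRONTIER_MODEL


for task in ("classification", "contract negotiation"):
    print(f"{task} -> {pick_model(task)}")
```

Defaulting unknown task types to the frontier model keeps quality safe while you move tasks into the cheap bucket one at a time, checking output quality as you go.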
Third, treat your AI spend like a payroll line. It has to be owned, approved, and reviewed on a fixed schedule. If you’re also managing a growing team alongside a growing AI bill, tools like Gusto handle payroll and benefits in one place so your finance team isn’t juggling two fast-moving cost centers manually. Visibility across both human and machine costs is how you build a P&L that doesn’t blindside you at month end.
Fourth, implement prompt caching. Every major API provider offers it. Most companies ignore it. According to Anthropic’s published developer documentation, its prompt caching can cut the cost of repeated input context by up to 90%, because cached tokens are billed at a fraction of the normal input price. If you’re not using it, you’re leaving real money on the table every single day.
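Here is a rough way to estimate what caching a shared prompt prefix is worth. The multipliers are assumptions for illustration (a cached read at 10% of the normal input price, a one-time write at a 25% premium), and the sketch assumes the cache never expires between calls. Check your provider's current pricing before trusting the result.

```python
def input_cost_without_cache(prefix_tokens, suffix_tokens, calls, price_per_mtok):
    """Input cost when the full prompt is billed at the normal rate on every call."""
    return (prefix_tokens + suffix_tokens) * calls * price_per_mtok / 1_000_000


def input_cost_with_cache(prefix_tokens, suffix_tokens, calls, price_per_mtok,
                          write_multiplier=1.25, read_multiplier=0.10):
    """Input cost when the shared prefix is written to cache once, then read.

    Both multipliers are assumed values, relative to the normal input price.
    """
    if calls <= 0:
        return 0.0
    # First call pays the write premium; the remaining calls pay the read rate.
    prefix_units = prefix_tokens * (write_multiplier + read_multiplier * (calls - 1))
    suffix_units = suffix_tokens * calls  # the per-call part is never cached
    return (prefix_units + suffix_units) * price_per_mtok / 1_000_000


# Hypothetical workload: 10,000-token shared prefix, 200 fresh tokens per call.
plain = input_cost_without_cache(10_000, 200, calls=1_000, price_per_mtok=3.00)
cached = input_cost_with_cache(10_000, 200, calls=1_000, price_per_mtok=3.00)
print(f"without cache: ${plain:.2f}")
print(f"with cache:    ${cached:.2f}")
print(f"saved:         {1 - cached / plain:.0%}")
```

The saving approaches the headline figure only when the repeated prefix dominates the prompt. The fresh tokens in each call, and all output tokens, are still billed normally.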
Fifth, set hard budget caps, not soft alerts. A soft alert gets ignored at 11pm on a Tuesday. A hard cap forces a conversation. That conversation makes your product leaner and your margins less fragile.
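The difference between a soft alert and a hard cap is easy to show in code. This is a sketch of the idea, not a production billing guard: the class and exception names are hypothetical, and a real version would need persistent storage and would have to run before every API call.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push a team past its hard cap."""


class TeamBudget:
    def __init__(self, team, cap_usd, alert_fraction=0.8):
        self.team = team
        self.cap_usd = cap_usd
        self.alert_fraction = alert_fraction  # soft-alert threshold
        self.spent_usd = 0.0

    def charge(self, amount_usd):
        """Record spend. Return True once the soft-alert threshold is reached.

        Raises BudgetExceeded, without recording anything, if the charge
        would exceed the hard cap.
        """
        if self.spent_usd + amount_usd > self.cap_usd:
            raise BudgetExceeded(
                f"{self.team}: ${amount_usd:.2f} would exceed the ${self.cap_usd:.2f} cap"
            )
        self.spent_usd += amount_usd
        return self.spent_usd >= self.alert_fraction * self.cap_usd


budget = TeamBudget("support", cap_usd=100.0)
for amount in (50.0, 35.0, 20.0):
    try:
        alert = budget.charge(amount)
        print(f"charged ${amount:.2f}, soft alert: {alert}")
    except BudgetExceeded as err:
        print(f"blocked: {err}")
```

The soft alert is only a return value that someone has to notice. The hard cap stops the call, which is what forces the conversation.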
The Bottom Line
AI costs won’t stop growing. The companies that win aren’t the ones spending less. They’re the ones that know exactly what they’re spending and why. Measurement is the moat. If your competitors are flying blind on token spend and you’re not, that’s an edge that compounds every single quarter. The token bill is due. The only question is whether you’re the one holding the invoice or the one who signed it knowingly.
Frequently Asked Questions
What are AI token costs and why do they matter?
Token costs are what companies pay to use AI models through APIs. Text sent to or received from a model like GPT-4 or Claude is split into tokens, which are roughly word fragments, and those tokens add up to real dollar charges on your monthly bill. As AI use scales across an organization, token costs can quietly become one of the largest line items in a tech budget.
Why are AI token bills rising so fast in 2026?
More teams are using AI for more tasks, and most companies have no controls on who can spin up new workflows or how much they can spend. The combination of wider adoption and zero budget governance is what drives the runaway bills. According to Goldman Sachs, total U.S. AI spend more than tripled between 2023 and 2025 with no sign of slowing.
How can a company reduce AI token costs without cutting AI use?
Match the model to the task, use prompt caching for repeated inputs, and audit your API keys monthly. Smaller models handle most business tasks well at a fraction of the cost of frontier models. Set hard spending caps by team so costs are contained before they compound into a quarterly surprise.
What’s the biggest mistake companies make with AI budgets?
Treating AI spend like a utility bill instead of a managed cost center. Utilities get paid without question. Managed costs get reviewed, challenged, and improved. The companies winning on AI margins treat every token like it matters, because at scale, it absolutely does.
Should startups be worried about AI token costs?
Yes, especially if your product margin depends on AI API calls at scale. Model a worst case where your usage doubles and check whether your margins survive. If they don’t, fix the architecture now while it’s still cheap to do so.
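The worst-case check takes a few lines. The figures below are invented for illustration, and the sketch assumes revenue stays flat while usage doubles, which is the painful case for flat-rate pricing.

```python
def gross_margin(revenue, ai_cost, other_cogs):
    """Gross margin as a fraction of revenue."""
    return (revenue - ai_cost - other_cogs) / revenue


# Hypothetical monthly figures in USD.
revenue, ai_cost, other_cogs = 100_000, 30_000, 20_000

today = gross_margin(revenue, ai_cost, other_cogs)
stressed = gross_margin(revenue, ai_cost * 2, other_cogs)  # usage doubles, revenue flat

print(f"margin today:            {today:.0%}")
print(f"margin if usage doubles: {stressed:.0%}")
```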