The compute tax

Nvidia says compute costs more than its engineers. Uber blew through its 2026 AI budget on tokens. Gartner forecasts data center spending up 55.8%. The bill compounds. Stored intelligence is the only line item that doesn't.

Bryan Catanzaro runs applied deep learning at Nvidia. In April he told Axios: “For my team, the cost of compute is far beyond the costs of the employees.” Read that twice. The man whose company makes the chips is paying more for the inference than for the engineers who design the inference systems.

The same week, Uber’s CTO Praveen Neppalli Naga told The Information he had blown through his entire 2026 AI budget on token costs. Uber spent $3.4 billion on R&D in 2025 and ran internal leaderboards to push engineers onto Claude Code and Cursor. AI agents now write about 11% of Uber’s live backend code. The aggressive program worked. The bill arrived in March.

This is what the compute tax looks like at companies that are good at this.

The data center is winning

Gartner’s April forecast put 2026 worldwide IT spending at $6.31 trillion, up 13.5% from 2025. Inside that headline, one segment dwarfs the rest: data center systems are projected to grow 55.8% in a single year, from $506 billion to $788 billion. Devices grow 8.2%. Software grows 15.1%. IT services grow 9.0%. Data center spending alone is expanding more than four times faster than any other category, and the $282 billion of net-new data center spend in 2026 is larger than the total 2025 spend on PCs and smartphones combined.

John-David Lovelock, Distinguished VP Analyst at Gartner, names what’s driving it: AI workloads, advanced memory, and a “multi-speed IT market” where hyperscaler purchases and AI-centric software outperform everything else.

The supply side echoes the demand side. On the day this is published, Anthropic announced it would consume the full output of SpaceX’s Colossus 1 — more than 300 megawatts and over 220,000 NVIDIA GPUs — on top of a 5 GW pipeline with Amazon, 5 GW with Google and Broadcom, $30 billion with Microsoft and NVIDIA on Azure, and $50 billion with Fluidstack. One AI lab. Eleven figures of capacity commitments inside twelve months.

The tax flows back through the stack

The data center’s appetite is reshaping the consumer hardware market on its way through. Gartner’s February forecast estimates a 130% combined surge in DRAM and SSD prices by the end of 2026 — driven by data center buyers paying whatever it takes for high-bandwidth memory. The downstream effects are concrete: PC prices up 17%, smartphone prices up 13%, PC shipments down 10.4%, smartphone shipments down 8.4%. PC memory costs are climbing from 16% to 23% of bill of materials. The sub-$500 entry-level PC segment, Gartner says, will not exist by 2028.

Ranjit Atwal, the analyst behind that piece, put it directly: “This is the steepest contraction in device shipments witnessed in over a decade.” Buyers are holding devices 15–20% longer. Even AI PCs — the entire premise of which was new hardware demand — are not projected to reach 50% market penetration until 2028.

The compute tax is not a hyperscaler problem that stops at the colocation door. It’s a global cost surface, and every layer of the stack pays into it.

What companies are doing about it

Anthropic raised prices on Claude. OpenAI investors are pitching Codex on the basis that it “maximizes tokens efficiently.” Amos Bar-Joseph, the CEO of Swan AI, framed his Anthropic invoice as a virtue on LinkedIn: “We’re building the first autonomous business — scaling with intelligence, not headcount.” That post went viral; the bill remains.

Brad Owens of Asymbl, who advises companies on workforce strategy, put the broader question to Axios: “The tone is shifting a bit more into what is the true value of a worker… human or digital?”

The framing assumes both costs are inevitable. The audit committees disagree. Every CFO who has run a quarter on this knows the inference budget has the same texture as the cloud bill in 2017 — the line item that grows because every team’s slice grows, the line item that no one wants to be the first to cap.

What caches and RAG don’t fix

The standard answer is the cache. The cache works for stable inputs and stable models. It breaks when the model upgrades, because every cached answer was produced by the old one. It never helps when the question shape changes.
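To make the failure mode concrete, here is a minimal sketch (not any vendor's API) of an inference cache keyed by model version and prompt. The class and its names are illustrative assumptions; the point is structural: a model upgrade changes every key, so the entire cache invalidates at once.

```python
import hashlib

class InferenceCache:
    """Toy cache keyed on (model_version, prompt) -- illustrative only."""

    def __init__(self):
        self._store = {}

    def _key(self, model_version: str, prompt: str) -> str:
        # The model version is part of the key, so cached answers are
        # bound to the model that produced them.
        return hashlib.sha256(f"{model_version}\x00{prompt}".encode()).hexdigest()

    def get(self, model_version: str, prompt: str):
        return self._store.get(self._key(model_version, prompt))

    def put(self, model_version: str, prompt: str, answer: str):
        self._store[self._key(model_version, prompt)] = answer

cache = InferenceCache()
cache.put("model-v1", "who spoke first?", "Speaker A")

# Same model, same question: a hit, no tokens spent.
assert cache.get("model-v1", "who spoke first?") == "Speaker A"

# Upgrade the model: every entry misses, and the token bill returns.
assert cache.get("model-v2", "who spoke first?") is None
```

Dropping the version from the key doesn't help either — then the cache silently serves stale answers from the retired model.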

RAG is the more sophisticated version of the same instinct: don’t recompute the answer, retrieve the relevant chunks and ask the model again. RAG cuts cost by an order of magnitude on document-grounded tasks. It does not cut cost to zero. The retrieval is cheap. The interpretation runs every time.
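The cost shape of that paragraph can be sketched in a few lines. The unit costs below are hypothetical, chosen only to show the structure, not anyone's real pricing: retrieval is an order of magnitude cheaper, but the interpretation charge recurs on every query.

```python
# Hypothetical unit costs for illustration -- not real vendor pricing.
RETRIEVAL_COST = 0.0001    # vector lookup per query: cheap
INTERPRET_COST = 0.01      # model reads the retrieved chunks: billed every time

def rag_query_cost(n_queries: int) -> float:
    # Retrieval shrinks the per-query bill by roughly 10x versus stuffing
    # whole documents into the prompt, but the interpretation step runs on
    # every query -- so total cost still grows linearly with usage.
    return n_queries * (RETRIEVAL_COST + INTERPRET_COST)

# Double the usage, double the bill: linear, never amortized.
assert rag_query_cost(2000) == 2 * rag_query_cost(1000)
```

That linearity is the whole argument: RAG lowers the slope of the cost line, but the line still goes up and to the right with usage.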

Both approaches assume the answer is something the model produces. As long as that’s true, the bill compounds with usage.

What an asset that answers itself looks like

A .0wav file is processed once. Diarization, alignment, sentiment, behavioral profile, complexity map, speaker embeddings — written into the file at creation. After that, “who spoke first?” and “what was the average words-per-minute?” are reads, not queries. Same input, same output, every time, without a model in the loop.
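The write-once, read-many pattern can be sketched without the real format. The snippet below uses plain JSON as a stand-in for the compressed HDF5 layer, and the field names (`first_speaker`, `avg_wpm`) are invented for illustration — the actual .0wav schema is not described in this article. What it shows is the shape of the claim: the answer is a file read, with no model in the loop.

```python
import json
import os
import tempfile

def write_once(path: str, analysis: dict) -> None:
    # All the expensive work (diarization, alignment, sentiment, ...)
    # happens before this call; the results are stored with the asset.
    with open(path, "w") as f:
        json.dump(analysis, f)

def read_answer(path: str, question_key: str):
    # Answering a known question is a read, not a query.
    with open(path) as f:
        return json.load(f)[question_key]

path = os.path.join(tempfile.gettempdir(), "call_0001.analysis.json")
write_once(path, {"first_speaker": "agent", "avg_wpm": 147.2})

# Ask the same question forty times: same answer every time, zero tokens.
answers = {read_answer(path, "avg_wpm") for _ in range(40)}
assert answers == {147.2}
```

The determinism is the point: identical input, identical output, regardless of which model generation is current when the question is asked.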

The economic case is narrow but specific: any media asset queried more than once. Call center recordings. Depositions. Telehealth sessions. Sales call transcripts. Podcast back catalogs. The compute is amortized into a one-time write, and the read is a few microseconds of HDF5 decompression. There is no per-token charge on the fortieth ask. There is no Anthropic price-change risk on the archive of files you processed last quarter.
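The amortization argument reduces to back-of-envelope arithmetic. The numbers below are hypothetical, not real pricing: a one-time processing cost at ingest versus a per-query charge that recurs forever, with break-even arriving after write-cost ÷ per-query-cost queries.

```python
# Hypothetical costs for illustration -- not real vendor pricing.
WRITE_COST = 0.50          # one-time: diarization, alignment, embeddings at ingest
READ_COST = 1e-7           # per read: local decompression, effectively free
LLM_COST_PER_QUERY = 0.02  # per query: tokens billed every single time

def stored_total(n_queries: int) -> float:
    # Pay once to write, then near-zero marginal cost per read.
    return WRITE_COST + n_queries * READ_COST

def llm_total(n_queries: int) -> float:
    # No upfront cost, but the meter never stops.
    return n_queries * LLM_COST_PER_QUERY

# Break-even: the one-time write pays for itself after ~25 queries here.
break_even = WRITE_COST / (LLM_COST_PER_QUERY - READ_COST)
assert abs(break_even - 25.0) < 0.1
assert stored_total(100) < llm_total(100)
```

Under these assumed numbers, any asset queried more than about 25 times is cheaper to process once and store — which is why the case is narrow but specific to frequently re-read archives.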

This is not a defense against the compute tax for everything. It is a defense for the parts of the workload where the same questions keep getting asked.

What this argues for

Three line items shift if a team takes this seriously:

The inference budget for re-reads drops to zero. The processing budget for first reads becomes a one-time, predictable cost — closer to a storage line than a compute line. And the model-update tax stops applying to the archive: when the next ASR generation ships, you can re-process the assets you want re-processed; the rest stay readable, exactly as they were, at the same cost they always were.

The bill that doesn’t stop arriving is paid in tokens. Stored intelligence is paid once.

The frame Owens raised — what is the true value of a worker, human or digital? — assumes the work has to be performed live. For the parts of the workload where the question is known and the answer is a property of the asset, neither worker performs it. The file does.


Next in this series: a benchmark of read costs across .0wav, RAG, and bare LLM queries on the same recording set. Subscribe to the research feed or the changelog for the data.

Sources

  1. Higher Claude limits for SpaceX-backed compute — Anthropic, May 5, 2026
  2. Worldwide IT spending to grow 13.5% in 2026, totaling $6.31 trillion — Gartner, Apr 21, 2026
  3. Surging memory costs will reduce global PC and smartphone shipments in 2026 — Gartner, Feb 25, 2026
  4. IT budgets are getting blown out by AI — Axios, Apr 25, 2026
  5. Uber's Anthropic AI push hits a wall — Yahoo Finance, Apr 25, 2026