We spent three years panicking about GPU availability. Fair enough — NVIDIA's lead times were brutal, allocation was political, and cloud spot prices swung like crypto. But while everyone was fixated on the H100 waitlist, something shifted. The bottleneck moved.

In Q2 2026, the components most likely to blow up your infrastructure budget aren't the flashy accelerators. They're the CPUs and DRAM sticks you used to order on two-week lead times without a second thought.

Server DRAM: 60-70% Price Hikes and Counting

Samsung and SK hynix dropped a bombshell at the start of the year: server DRAM contract prices for Q1 2026 jumped 60-70% over Q4 2025 levels. Microsoft, Google, and the rest of the hyperscaler club are paying it because they have no choice. Samsung followed up with another ~30% increase for Q2 contracts on top of that.

The math behind the squeeze is straightforward. Datacenters now consume roughly 70% of all memory chips manufactured globally. Server-related memory — conventional DRAM, SOCAMM, and HBM combined — accounts for more than half of total DRAM demand, and that share is still climbing. Goldman Sachs projects a 4.9% DRAM undersupply for 2026, the worst deficit in over fifteen years.

HBM is its own nightmare. Samsung and SK hynix hiked HBM3E prices by 20% for 2026 deliveries, and every B300 GPU ships with 288 GB of the stuff. When you're building GB300 NVL72 racks with 72 GPUs each, the memory bill alone is staggering. But even standard DDR5 for the host CPUs is getting expensive — and that used to be the cheap part of the BOM.
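
To put a rough number on "staggering": one GB300 NVL72 rack carries 72 GPUs at 288 GB of HBM3E each. Here's a back-of-envelope sketch using the approximate per-GB prices cited later in this piece; treat both figures as illustrative estimates, not contract quotes.

```python
# Back-of-envelope HBM bill for one GB300 NVL72 rack.
# Per-GB prices are the approximate figures cited in this article
# (~$12 before the hike, ~$14.40 after the 20% increase); they are
# illustrative estimates, not quotes.

GPUS_PER_RACK = 72
HBM_GB_PER_GPU = 288          # each B300 ships with 288 GB of HBM3E

hbm_gb_per_rack = GPUS_PER_RACK * HBM_GB_PER_GPU   # 20,736 GB

price_2025 = 12.00            # ~$/GB before the hike
price_2026 = 14.40            # ~$/GB after the 20% hike

bill_2025 = hbm_gb_per_rack * price_2025   # ~$248,832
bill_2026 = hbm_gb_per_rack * price_2026   # ~$298,598

print(f"HBM per rack: {hbm_gb_per_rack:,} GB")
print(f"HBM bill: ${bill_2025:,.0f} -> ${bill_2026:,.0f} "
      f"(+${bill_2026 - bill_2025:,.0f})")
```

Call it roughly $50K of extra memory cost per rack from the hike alone, before you've bought a single DIMM of host DRAM.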

CPUs Are Cool Again (Unfortunately for Your Wallet)

Here's the plot twist nobody saw coming: CPUs are back in shortage.

Intel's Xeon server chip lead times have ballooned from two weeks to six months for some SKUs. AMD's EPYC line is strained too: lead times have stretched to 8-10 weeks, and Lisa Su has publicly stated that demand "exceeded expectations." Intel has quietly raised OEM server CPU prices by a cumulative ~30% across multiple rounds in 2026.

The culprit? Agentic AI.

Training LLMs was GPU-bound. You could get away with modest CPU-to-GPU ratios because the accelerators did the heavy lifting while CPUs mostly shuffled data. Agentic workloads are different. Autonomous agents need substantial CPU horsepower for orchestration, tool calling, real-time decision-making, and managing the complex DAGs that coordinate multiple model calls. A single agentic workflow might hit a dozen API endpoints, parse structured data, maintain state, and make branching decisions — all of which land squarely on the CPU.
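
To make the CPU-side work concrete, here's a minimal sketch of a single agent step. The endpoint and tool names are invented for illustration; the point is that every line of it (the HTTP round trips, the JSON parsing, the state bookkeeping, the branching) executes on host CPU cores, never the accelerator.

```python
import json
import urllib.request

# Minimal sketch of one agent step. The URL and tool names are
# hypothetical; what matters is that all of this is CPU work that
# never touches the GPU.

def call_tool(url: str, payload: dict) -> dict:
    """POST a tool call and parse the JSON response (CPU: network + parsing)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def agent_step(state: dict) -> dict:
    # Branching decision logic (CPU).
    if state.get("needs_search"):
        result = call_tool(
            "https://tools.example.com/search",   # hypothetical endpoint
            {"query": state["query"]},
        )
        state["documents"] = result.get("hits", [])
    # Structured-data parsing and state maintenance (CPU).
    state["summary_input"] = "\n".join(
        doc["text"] for doc in state.get("documents", [])
    )
    state["steps_taken"] = state.get("steps_taken", 0) + 1
    return state
```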

The result is that AI datacenters in 2026 need far more CPU cores per GPU than the training-era builds assumed. And the supply chain wasn't ready.

Why the Supply Side Can't Keep Up

Three factors are colliding.

TSMC prioritization. AMD doesn't fab its own chips. TSMC's most advanced nodes are allocated first to high-margin AI accelerators — NVIDIA's Blackwell, AMD's own Instinct GPUs, Apple silicon. Server CPUs get what's left. AMD is effectively competing with itself for fab capacity.

Intel's yield problems. Intel fabs its own Xeon chips, but manufacturing yields at the latest process nodes have been inconsistent. Fewer usable chips per wafer means less supply even as demand ramps. Intel is working on capacity improvements, but the earliest relief is late Q2 at best.

The $602 billion CapEx tsunami. Hyperscaler capital expenditure hit $602 billion in 2026, amplified by moves like OpenAI's $122 billion infrastructure investment announced in April. That money is buying racks, and racks need CPUs and DRAM just as much as they need GPUs. Every GB300 NVL72 system has 36 Grace CPUs alongside those 72 GPUs — that's a lot of silicon.
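
For a sense of scale, the CPU math on that rack is worth spelling out. The 72-cores-per-Grace figure below is NVIDIA's published spec; the rest follows from the counts above.

```python
# CPU silicon in one GB300 NVL72 rack, from the counts above.
# Grace's core count is NVIDIA's published spec; treat the rest as
# a rough sense-of-scale calculation.

GRACE_CPUS_PER_RACK = 36
GPUS_PER_RACK = 72
CORES_PER_GRACE = 72          # Arm Neoverse V2 cores per Grace CPU

total_cores = GRACE_CPUS_PER_RACK * CORES_PER_GRACE   # 2,592 cores
cpu_per_gpu = GRACE_CPUS_PER_RACK / GPUS_PER_RACK     # 1 CPU per 2 GPUs

print(f"{total_cores:,} CPU cores per rack "
      f"({cpu_per_gpu:.1f} CPUs per GPU)")
```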

What This Means for Your Inference Costs

The irony is sharp. GPU cloud pricing has actually gotten cheaper. H100 spot instances run $1.38/hr on competitive providers. A100s are under a dollar. The accelerator market has matured, supply has expanded, and tier-two cloud providers are undercutting each other aggressively.

But total cost of ownership is rising because everything around the GPU costs more:

Component                  Q4 2025     Q2 2026            Change
Server DRAM (per GB)       ~$2.80      ~$4.50-4.75        +60-70%
HBM3E (per GB)             ~$12.00     ~$14.40            +20%
Server CPUs (avg)          baseline    +30% cumulative    +30%
H100 cloud spot (per hr)   ~$2.10      ~$1.38             -34%
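
To see how these line items interact, here's a rough blended-cost sketch. The component weights (what share of total rack cost each line represents) are invented for illustration; substitute your own bill of materials.

```python
# Rough sketch: how a blended cost index moves when the GPU line
# falls but everything around it rises. The weights below are
# invented for illustration -- swap in your own BOM shares.

weights = {            # share of total rack cost, Q4 2025 baseline (assumptions)
    "gpu":  0.45,
    "dram": 0.20,
    "hbm":  0.20,
    "cpu":  0.15,
}

changes = {            # price changes from the table above
    "gpu":  -0.34,
    "dram": +0.65,     # midpoint of +60-70%
    "hbm":  +0.20,
    "cpu":  +0.30,
}

new_total = sum(w * (1 + changes[k]) for k, w in weights.items())
print(f"Blended cost index: 1.00 -> {new_total:.2f}")   # ~1.06 here
```

Even with the GPU line down a third, plausible memory-heavy weightings push the blended index above where it started.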

If you're running inference at scale, the GPU line item might look fine while your total rack cost drifts upward. Memory-bound models — anything with long context windows, large KV caches, or retrieval-augmented pipelines — feel this disproportionately because they need more DRAM per GPU than a standard deployment.
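
For a feel of why, here's the standard KV-cache sizing estimate for a transformer decoder. The model shape is a generic 70B-class configuration with grouped-query attention, picked for illustration rather than matching any particular model.

```python
# KV-cache sizing for a transformer decoder. The standard estimate:
#   2 (K and V) * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes
# The shape below is a generic 70B-class config with grouped-query
# attention, chosen for illustration -- not any specific model.

def kv_cache_gb(layers=80, kv_heads=8, head_dim=128,
                seq_len=128_000, batch=8, dtype_bytes=2):
    """KV-cache footprint in GB for the given model shape and workload."""
    total_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes
    return total_bytes / 1e9

print(f"{kv_cache_gb():.0f} GB")   # ~336 GB for this configuration
```

A third of a terabyte of cache for one long-context batch is exactly the kind of line item that was a rounding error at 2025 memory prices and isn't anymore.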

The Agentic Tax

There's a deeper structural issue here. The shift from "call an LLM endpoint" to "deploy an autonomous agent" changes the hardware profile of AI infrastructure in ways that most cost models haven't caught up with.

An agentic system doesn't just need inference throughput. It needs:

  • CPU cores for orchestration logic

  • Memory for maintaining agent state across long-running tasks

  • Storage I/O for tool outputs, logs, and intermediate results

  • Network bandwidth for parallel API calls

None of these show up in a $/token calculation, but they all show up on the invoice. When AMD says agentic AI is driving unexpected CPU demand, they're describing a fundamental shift in what "AI compute" means — and it's not just about FLOPS anymore.

Where This Goes

The DRAM shortage is projected to persist through 2027. CPU constraints should ease somewhat by late 2026 as Intel improves yields and TSMC brings additional capacity online, but "ease" means going from crisis to merely tight.

For anyone planning inference infrastructure right now, the takeaway is uncomfortable: you can't just optimize for GPU cost per token and call it a day. The non-GPU components of your stack are the ones inflating, and they're the ones with the least pricing transparency and the fewest alternative suppliers.

The memory market is controlled by three companies. The server CPU market is controlled by two (and a half, if you count Ampere). There's no equivalent of the tier-two GPU cloud explosion happening for DRAM or Xeon chips. The competitive pressure that drove H100 prices down 34% simply doesn't exist for the rest of the rack.

Plan accordingly. And maybe lock in your Q3 DRAM contracts before Samsung sends the next price letter.