Nvidia announced this week that they are partnering with EPRI, Prologis, and InfraPartners to study smaller-scale data centers. Five to twenty megawatts. Distributed. Close to where the data is created.
They are calling this a study. I am calling it an admission.
The Gigawatt Hangover
For the last two years, the data center industry has been drunk on scale. Every announcement was bigger than the last. A hundred megawatts here. Five hundred there. Nuclear reactors dedicated to training runs. The arms race was measured in square footage and transformer capacity, and everyone was winning because everyone was building.
Training demanded it. Training a frontier model is a monolithic task. You cannot split a trillion-parameter training run across forty locations connected by public internet. You need the cores in one room, on one fabric, with one clock. Scale was not vanity. It was physics.
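Rough numbers make the point. Here is a sketch, with the model size, gradient precision, and link speeds all assumed for illustration, and the synchronization model deliberately simplified (no compression, no overlap with compute, no clever topology):

```python
# Why a trillion-parameter run needs one fabric: the gradient traffic alone.
# Every number here is an illustrative assumption, not a figure from any vendor.

params = 1e12                 # assumed: one trillion parameters
grad_bytes = params * 2       # bf16 gradients: ~2 TB exchanged per optimizer step

def sync_seconds(effective_gbps: float) -> float:
    """Time to move one full gradient exchange at a given effective bandwidth."""
    return grad_bytes * 8 / (effective_gbps * 1e9)

print(f"In-cluster fabric (~10 Tb/s effective): {sync_seconds(10_000):7.1f} s per step")
print(f"Cross-site WAN link (~10 Gb/s):         {sync_seconds(10):7.1f} s per step")
# Roughly 1.6 seconds versus 27 minutes, on every single optimizer step.
```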
But training is not the whole story. It never was.
The Inference Problem
Inference is a fundamentally different workload. It is embarrassingly parallel across requests. It is latency-sensitive. It does not need a campus. It needs proximity.
When someone asks a model a question, the answer needs to travel back to them. The speed of light has not changed. If the data center is in Iowa and the user is in Singapore, there is a floor on how fast that response arrives, and no amount of hardware optimization will fix it. Geography is the last bottleneck.
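How hard a floor? A quick sketch, assuming light in fiber travels at roughly two-thirds of c and using rough great-circle distances (real fiber routes are longer, so the real floor is higher):

```python
# Lower bound on round-trip time set by light in fiber (~2/3 of c).
# Distances are rough great-circle figures; real fiber paths are longer.

C_FIBER_KM_PER_S = 200_000

def rtt_floor_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay, in milliseconds."""
    return 2 * distance_km / C_FIBER_KM_PER_S * 1000

print(f"Iowa to Singapore (~15,000 km): {rtt_floor_ms(15_000):.0f} ms floor")  # ~150 ms
print(f"Same metro (~50 km):            {rtt_floor_ms(50):.2f} ms floor")      # ~0.5 ms
# Routing, queuing, and handshakes only add to this. Nothing subtracts from it.
```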
The hyperscalers know this. They have known it for years. Their CDN strategies already reflect it. But for AI inference, they have been pretending that centralized mega-campuses are the answer because that is what they were already building for training.
Nvidia just stopped pretending.
Five Megawatts Is Not Small
I want to be clear about something. Five megawatts is not a closet with a rack in it. Five megawatts powers roughly a thousand high-density GPU nodes. That is a serious facility. It is "small" only by the standards of an industry that has lost perspective.
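The arithmetic behind that count, with the per-node draw and cooling overhead stated as explicit assumptions rather than anything from the announcement:

```python
# Rough sizing of a 5 MW site. Per-node draw and PUE are assumptions for
# illustration; a denser training-class node would cut the count considerably.

facility_mw = 5.0
pue = 1.25          # assumed overhead for cooling and power conversion
node_kw = 4.0       # assumed draw of one inference-oriented GPU node

it_load_kw = facility_mw * 1000 / pue
nodes = it_load_kw / node_kw
print(f"IT load: {it_load_kw:.0f} kW -> roughly {nodes:.0f} nodes")  # ~1,000
```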
But it is the right size for inference at the edge. It fits in existing commercial real estate. It connects to existing utility infrastructure without requiring a new substation. It can be permitted, built, and brought online in months instead of years. It trades peak capacity for deployment speed.
There is a class of operator that has been building at this scale for a decade. They are not in the press releases. They do not have billion-dollar CapEx announcements. They have racks in secondary markets, serving real customers, running profitably on power that the hyperscalers considered too small to bother with.
The industry just validated their entire business model.
The Distribution Thesis
Here is what I think happens next. Training stays centralized. It has to. The physics demands it. But inference distributes. Aggressively. Every major metro gets a cluster. Every secondary market with cheap power and low latency becomes interesting.
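At the routing layer, the whole thesis reduces to one decision per request: send it to the closest site with headroom. A minimal sketch, with hypothetical site names, distances, and capacities:

```python
# Minimal sketch of the routing decision a distributed inference fleet makes:
# send each request to the closest site that has headroom. Site names,
# distances, and capacities are hypothetical.

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    distance_km: float    # rough distance from the requesting user
    free_gpu_slots: int   # currently available capacity

def pick_site(sites: list[Site]) -> Site:
    """Closest site with spare capacity; fall back to the closest site overall."""
    candidates = [s for s in sites if s.free_gpu_slots > 0] or sites
    return min(candidates, key=lambda s: s.distance_km)

fleet = [
    Site("metro-east", 40, 0),       # nearby but full
    Site("metro-west", 120, 64),     # slightly farther, has headroom
    Site("regional-hub", 900, 512),  # the big building, far away
]
print(pick_site(fleet).name)  # -> metro-west
```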
The winners will not be the companies that can build the biggest building. They will be the ones that can deploy the most locations the fastest. Operational efficiency at each site. Standardized hardware. Automated provisioning. The logistics of distribution, not the engineering of concentration.
This is not a new idea. It is how the internet itself was built. DNS, CDNs, edge caching: every major internet infrastructure story has been about moving compute and content closer to users. AI inference is just the latest chapter.
The five-megawatt facility is not a compromise. It is the correct architecture for the workload. The only revelation is that it took the largest GPU company on earth this long to say it out loud.