Named for the astronomer Vera Florence Cooper Rubin, the Rubin architecture consists of six separate chips designed to be used in concert. The Rubin GPU stands at the center, but the architecture also addresses growing bottlenecks in storage and interconnection with improvements to the BlueField and NVLink systems, respectively. The architecture also includes a new Vera CPU, designed for agentic reasoning.
Explaining the benefits of the new storage, Nvidia’s senior director of AI infrastructure solutions Dion Harris pointed to the growing cache-related memory demands of modern AI systems.
“As you start to enable new types of workflows, like agentic AI or long-term tasks, that puts a lot of stress and requirements on your KV cache,” Harris told reporters on a call, referring to the key-value cache, memory that AI models use to store intermediate attention data so earlier tokens don’t have to be reprocessed. “So we’ve introduced a new tier of storage that connects externally to the compute device, which allows you to scale your storage pool much more efficiently.”
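Harris’s point about KV-cache pressure can be illustrated with a toy sketch. This is hypothetical code, not Nvidia’s software: it only shows why the cache grows linearly with every token a model generates, which is what makes long-running agentic tasks so memory-hungry.

```python
# Toy illustration (not Nvidia code): a KV cache stores one key vector and
# one value vector per token, per attention layer, so its footprint scales
# with tokens x layers x vector size.

class KVCache:
    def __init__(self, num_layers, head_dim):
        self.num_layers = num_layers
        self.head_dim = head_dim
        # One growing list of (key, value) pairs per layer.
        self.layers = [[] for _ in range(num_layers)]

    def append(self, layer, key, value):
        # Save this token's key/value so attention over past tokens
        # never has to be recomputed.
        self.layers[layer].append((key, value))

    def size_bytes(self):
        # 2 tensors (K and V) x layers x tokens x dim x 4 bytes (float32).
        tokens = len(self.layers[0])
        return 2 * self.num_layers * tokens * self.head_dim * 4

# Simulate generating 1,000 tokens on a 32-layer model.
cache = KVCache(num_layers=32, head_dim=128)
for _ in range(1000):
    for layer in range(32):
        cache.append(layer, [0.0] * 128, [0.0] * 128)

print(cache.size_bytes())  # prints 32768000 (~32 MB for this toy setup)
```

Real deployments multiply this by attention heads, batch size, and far longer contexts, which is why offloading the cache to an external storage tier, as Harris describes, becomes attractive.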
As expected, the new architecture also represents a significant advance in speed and power efficiency. According to Nvidia’s tests, the Rubin architecture will operate three and a half times faster than the previous Blackwell architecture on model-training tasks and five times faster on inference tasks, reaching as high as 50 petaflops. The new platform will also support eight times more inference compute per watt.