Nvidia’s Latest AI Servers Boost Speed of Top Chinese Models by 10 Times

Speaksly News: Nvidia unveiled new performance benchmarks on Wednesday showing that its cutting-edge artificial intelligence servers can run some of the world’s most advanced language models up to ten times faster than the previous generation, including two highly regarded models developed in China.

The results highlight a major leap in the hardware needed to deliver AI services to millions of users at once, a phase known as inference. While Nvidia has long dominated the market for training massive AI systems, the inference stage is drawing intense competition from companies such as Advanced Micro Devices and Cerebras Systems.

The benchmarks focus on a popular architecture called mixture of experts. This design routes each piece of a query to specialized subnetworks of the model, called experts, so only a fraction of the model runs for any given request, dramatically improving efficiency. The technique gained widespread attention earlier this year when Chinese startup DeepSeek released a powerful open-source model that required far less computing power to train than many Western counterparts.
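The routing idea described above can be illustrated with a toy sketch. This is a minimal, illustrative example (not Nvidia's, DeepSeek's, or Moonshot's actual implementation): a small router scores each token and only the top-k of the experts, here tiny linear maps, process it, which is where the compute savings come from.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoELayer:
    """Toy mixture-of-experts layer (illustrative names and sizes).

    A router scores each token against all experts, but only the
    top-k experts actually run for that token."""
    def __init__(self, dim, n_experts=4, k=2):
        self.k = k
        self.router = rng.normal(size=(dim, n_experts)) * 0.1       # gating weights
        self.experts = [rng.normal(size=(dim, dim)) * 0.1
                        for _ in range(n_experts)]                  # expert weights

    def __call__(self, tokens):                      # tokens: (n_tokens, dim)
        scores = softmax(tokens @ self.router)       # (n_tokens, n_experts)
        top = np.argsort(-scores, axis=-1)[:, :self.k]  # chosen experts per token
        out = np.zeros_like(tokens)
        for t, token in enumerate(tokens):
            # Only k of n_experts run for this token -- the efficiency win.
            weights = scores[t, top[t]]
            weights = weights / weights.sum()        # renormalize over chosen experts
            for w, e in zip(weights, top[t]):
                out[t] += w * (token @ self.experts[e])
        return out

layer = MoELayer(dim=8)
x = rng.normal(size=(5, 8))   # five tokens
y = layer(x)
print(y.shape)                # (5, 8): each token touched only 2 of 4 experts
```

In a production model the experts are full feed-forward blocks and the router is trained jointly with them, but the pattern is the same: total parameters grow with the number of experts while per-token compute stays roughly constant.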

Since that breakthrough, leading developers have embraced the same approach. OpenAI, the company behind ChatGPT, French firm Mistral, and Beijing-based Moonshot AI have all launched mixture-of-experts models. Moonshot’s Kimi K2 Thinking model, released as open source in July, quickly climbed independent leaderboards and earned praise for its performance.

Nvidia stressed that its newest server design, which packs seventy-two of its most powerful chips into a single system linked by high-speed connections, is what delivers these real-world gains. Tests conducted by the company showed Moonshot’s Kimi K2 Thinking model running ten times faster on the new hardware than on older Nvidia servers. Nvidia reported similar improvements with DeepSeek models.

According to Nvidia, the dramatic speedup comes primarily from placing more chips in close proximity and connecting them with faster networking technology, advantages the company says it still holds over rivals.

Advanced Micro Devices has signaled plans to launch its own densely packed server with multiple high-performance chips sometime next year, setting the stage for closer competition in this fast-growing segment of the AI market.

For now, Nvidia is making a clear case that its latest infrastructure remains the fastest and most efficient way to serve the newest generation of AI models, even those built specifically to use fewer resources during training.