Supermicro has posted industry-leading results with its NVIDIA HGX B200 8-GPU systems in the global MLPerf Inference v5.0 benchmark. Notably, it is the only vendor to record top inference performance with both air-cooled and liquid-cooled systems, underscoring the versatility and reliability of its product line.
According to Dies&Anzi, Supermicro's exclusive distributor, the systems reached a processing speed of 129,000 tokens per second in the Mixtral 8x7B mixture-of-experts (MoE) inference benchmark. The results were achieved with the SYS-421GE-NBRT-LCC and SYS-A21GE-NBRT models, each equipped with eight NVIDIA B200-SXM-180GB GPUs.
What sets the NVIDIA B200-based systems apart is their cooling efficiency. A newly developed cold plate and a 250kW coolant distribution unit (CDU) increase cooling capacity fivefold within the same 4U form factor compared to the previous generation. This matters in high-performance computing environments, where managing heat is essential to sustaining peak performance.
In large-model inference, the B200 system has demonstrated up to three times the processing speed of its predecessor, the H200 8-GPU system. When running inference on models such as Llama 2 70B and Llama 3.1 405B, the system generated more than 1,000 tokens per second, showing it can handle complex workloads with ease.
Dies&Anzi is actively considering introducing the B200 system to the domestic market. To that end, the company is running a technical proof of concept (PoC) in an on-premise environment, intended to give prospective customers insight into the system's practical applicability, price-performance ratio, and overall stability.
Seo Young-min of Dies&Anzi expressed confidence in the B200 system, stating, "Supermicro's latest AI system exceeds expectations not only in performance but also from a practical deployment perspective." He added that, as Supermicro's domestic distributor, Dies&Anzi is committed to ensuring customers can experience optimal performance in the best possible environment.
Supermicro's MLPerf result is not merely a numerical achievement; it also reflects compliance with the standards set by MLCommons, the organization behind the benchmark, which require that all results be reproducible and that the submitted products be commercially available. Meeting these requirements further solidifies Supermicro's reputation for reliability and innovation in the AI infrastructure market.
The company has also shown that it can deliver top-tier performance regardless of the cooling method employed, with both its air-cooled 10U and liquid-cooled 4U systems performing stably. That adaptability is a critical factor in the rapidly evolving AI landscape, where efficiency and performance are paramount.
As the demand for AI infrastructure grows, Supermicro's advancements in cooling technology and optimized system design position it as a leader in the market. The company's ability to provide high-efficiency solutions for advanced large language model (LLM) inference environments is particularly noteworthy, as it addresses the increasing complexity and resource requirements of modern AI applications.
In summary, Supermicro's NVIDIA HGX B200 8-GPU systems have set a new benchmark for AI inference performance, delivering record speeds and efficiency. With their commitment to technological excellence and customer support, Supermicro and Dies&Anzi are positioned to make a significant impact on the AI computing landscape.