Nvidia comes out on top in first MLPerf inference benchmarks

nvidia mlperf benchmarks mlperf ai artificial intelligence neural networks ml machine learning

The first benchmark results from the MLPerf consortium have been released and Nvidia is a clear winner for inference performance.

For those unaware, inference takes a deep learning model and processes incoming data however it’s been trained to.

MLPerf is a consortium which aims to provide “fair and useful” standardised benchmarks for inference performance. MLPerf can be thought of as doing for inference what SPEC does for benchmarking CPUs and general system performance.

The consortium has released its first benchmarking results, a painstaking effort involving over 30 companies and over 200 engineers and practitioners. MLPerf’s first call for submissions led to over 600 measurements spanning 14 companies and 44 systems. 

However, for datacentre inference, only four of the processors are commercially-available:

  • Intel Xeon P9282
  • Habana Goya
  • Google TPUv3
  • Nvidia Turing

Nvidia wasted no time in boasting of its performance beating the three other processors across various neural networks in both server and offline scenarios:

The easiest direct comparisons are possible in the ImageNet ResNet-50 v1.6 offline scenario where the greatest number of major players and startups submitted results.

In that scenario, Nvidia once again boasted the best performance on a per-processor basis with its Titan RTX GPU. Despite the 2x Google Cloud TPU v3-8 submission using eight Intel Skylake processors, it had a similar performance to the SCAN 3XS DBP T496X2 Fluid which used four Titan RTX cards (65,431.40 vs 66,250.40 inputs/second).

Ian Buck, GM and VP of Accelerated Computing at NVIDIA, said:

“AI is at a tipping point as it moves swiftly from research to large-scale deployment for real applications.

AI inference is a tremendous computational challenge. Combining the industry’s most advanced programmable accelerator, the CUDA-X suite of AI algorithms and our deep expertise in AI computing, NVIDIA can help datacentres deploy their large and growing body of complex AI models.”

However, it’s worth noting that the Titan RTX doesn’t support ECC memory so – despite its sterling performance – this omission may prevent its use in some datacentres.

Another interesting takeaway when comparing the Cloud TPU results against Nvidia is the performance difference when moving from offline to server scenarios.

  • Google Cloud TPU v3 offline: 32,716.00
  • Google Cloud TPU v3 server: 16,014.29
  • Nvidia SCAN 3XS DBP T496X2 Fluid offline: 66,250.40
  • Nvidia SCAN 3XS DBP T496X2 Fluid server: 60,030.57

As you can see, the Cloud TPU system performance is slashed by over a half when used in a server scenario. The SCAN 3XS DBP T496X2 Fluid system performance only drops around 10 percent in comparison.

You can peruse MLPerf’s full benchmark results here.

Interested in hearing industry leaders discuss subjects like this? Attend the co-located 5G Expo, IoT Tech Expo, Blockchain Expo, AI & Big Data Expo, and Cyber Security & Cloud Expo World Series with upcoming events in Silicon Valley, London, and Amsterdam.

Click to comment

You must be logged in to post a comment Login

Leave a Reply

To Top

We are using cookies on our website

We use cookies to personalise content and ads, to provide social media features, and to analyse our traffic. Please confirm if you accept our tracking cookies. You are free to decline the tracking so you can continue to visit our website without any data sent to third-party services. All personal data can be deleted by visiting the Contact Us > Privacy Tools area of the website.