Amazon announced its latest generation of general-purpose GPU instances (P3) the other day, almost exactly a year after the launch of its first general-purpose GPU offering (P2). While the CPU’s on both suites of instance types are similar (both Intel Broadwell Xeon’s), the GPU’s definitely improved. Note that the P2/P3 instance types are well suited for tasks that have heavy computation needs (Machine Learning, Computational Finance, etc) and that AWS does provide G3 and EG1 instances specifically for graphic intensive applications.
The P2’s sport NVIDIA GK210 GPU’s whereas the P3’s run NVIDIA Tesla V100’s. Without digging too deep into the GPU internals, the Tesla V100’s are a huge leap forward in design and specifically target the needs of those running computationally intensive machine learning operations. Tesla V100’s tout “Tensor cores” which increase the performance of floating point computations and the larger of the P3 instance types support NVIDIA’s “NVLINK”, which allow multiple GPU’s to share intermediate results at high speeds.
While the P3’s are more expensive than the P2’s, they fill in the large gaps in on-demand pricing that existed when just the P2’s were available. That said, if you’re running a ton of heavy GPU computation through EC2, you might find the P3’s that offer NVLink a better fit, and picking them up off the spot market might make a lot of sense (they’re quite expensive). Here’s what the pricing landscape looks like now, with the older generation in yellow and latest in green:
When the P2’s first came out, Iraklis Mathiopoulos had a great blog post where he ran Hashcat (a popular password “recovery” tool) with GPU support against the largest instance size available… the p2.16xlarge. Just a few days ago he repeated the test against the largest of the P3 instances, the p3.16xlarge. If you’ve ever played around with Hashcat on your local machine, you’ll quickly realize how insanely fast one p3.16xlarge can compute. Iraklis’ test on the p2.16xlarge cranked out 12,275.6 MH/s (million hashes per second) while the p3.16xlarge at 59,971.8 MH/s against SHA-256. The author’s late 2013 MBP clocks in at a whopping 121.7 MH/s. The p3.16xlarge instance type is about to get some heavy usage by AWS customers who are concerned with results rather than price.
Of course, the test above is elementary and doesn’t exactly show the benefits on the NVIDIA Tesla V100 vs the NVIDIA GK210 in regard to ML/AI and neural network operations. We’re currently testing different GPU’s in our Worker product and hope to have some benchmarks we can soon share based on real customer workloads in the ML/AI space. The performance metrics and graphs that Worker produces will give a great visual on model building/teaching, and we’re excited to share our recent work with our current ML/AI customers.
While most of our ML/AI customers are on-premise, we’ll soon be looking to demonstrate Iron’s integration with P2 and P3 instances for GPU compute in public forums. In the meantime, if you are considering on-premise or hybrid solutions for ML/AI tasks, or looking to integrate the power of GPU compute, reach out and we’d be happy to help find an optimal strategy based on your needs.