At GTC Spring 2022, Nvidia is presenting its next GPU architecture, Hopper. It will initially be used in AI systems, a field in which Nvidia has established itself as an all-round provider with GPUs, Arm CPUs, network technology and matching software and services.
Hopper is the name of Nvidia's new GPU architecture. Its first chip is the H100, intended for AI workloads and comprising 80 billion transistors, shown here on an SXM5 module. In addition, the manufacturer is presenting a complete system for Omniverse and a super-fast switch for AI systems.
Hopper follows Ampere: Nvidia has named its next GPU architecture after Grace Hopper, who developed the first compiler in 1952 and thus decisively advanced computer science. As the first chip based on the Hopper architecture, Nvidia presents the H100 Tensor Core GPU, which is primarily intended for AI workloads such as AI training and inference, data analytics and high-performance computing. The chip consists of around 80 billion transistors, compared with 54 billion in the A100, and is produced at TSMC using the 4N process. According to the manufacturer, it is the most advanced chip currently being built. “Data centers are becoming AI factories. They process and refine mountains of data to produce intelligence,” said Jensen Huang, CEO of Nvidia. “Nvidia's H100 is the engine for the global AI infrastructure that companies are using to power their AI-driven businesses.”

In addition to the hardware, Nvidia also provides customized SDKs and tools for a wide variety of AI workloads. With the AI Enterprise 2.0 software suite, companies can run their AI applications as containers under Red Hat OpenShift or in virtual machines under VMware vSphere.
H100: More power for deep learning
The H100 is the ninth generation of Nvidia's data center GPUs. Compared to its predecessor, the A100, the manufacturer promises a significant increase in performance. In the full configuration, the chip consists of eight GPU processing clusters with a total of 144 streaming multiprocessors (SMs) and 18,432 FP32 CUDA cores. There are also 576 fourth-generation Tensor cores and 60 MB of L2 cache. For a faster connection between multiple GPUs, Nvidia is now using a new generation of NVLink with a bandwidth of 900 GB/s, while PCI Express 5.0 is used for the connection to the CPU.
In deep learning, for example in natural language processing, transformer-based models are increasingly being used, such as the Megatron-Turing NLG 530B with 530 billion parameters, which was developed jointly with Microsoft. Other areas of application for transformers are medicine, for example protein sequencing, and machine vision. A new Transformer Engine is intended to ensure that the H100 can handle the training of these massively larger models with up to six times the performance without losing accuracy. For dynamic programming, which is used in medicine, route optimization or quantum computer simulation, the GPU offers a set of new DPX instructions. With them, corresponding tasks can be completed up to seven times faster than on an A100 GPU.
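To make the dynamic-programming use case concrete, here is a minimal, CPU-only Python sketch of the kind of recurrence DPX targets: a Smith-Waterman local alignment score, as used in protein and DNA sequence analysis. The scoring parameters and sequences are illustrative and not taken from Nvidia's material; on Hopper, it is the repeated max-plus inner updates of such recurrences that the new instructions accelerate.

```python
# Illustrative only: a classic dynamic-programming recurrence of the kind
# Hopper's DPX instructions are meant to accelerate on the GPU.
# Plain-Python Smith-Waterman local alignment score with a linear gap penalty.

def smith_waterman_score(a: str, b: str, match: int = 3, mismatch: int = -3, gap: int = -2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best local alignment score ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # This max-plus update is the hot inner operation of the recurrence.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

if __name__ == "__main__":
    # Classic textbook example pair; yields 13 with the default scoring.
    print(smith_waterman_score("GGTTGACTA", "TGTTACGG"))
```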
An H100 GPU can be partitioned into up to seven isolated instances using second-generation Multi-Instance GPU (MIG) technology. Confidential Computing ensures the security of data and applications through encrypted transfers between the Nvidia driver in a protected VM on the CPU and the H100 GPU.
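As an illustration of how such partitioning is administered today, the following Python sketch wraps the MIG subcommands of nvidia-smi as documented for current data-center GPUs. The GPU index and profile IDs are placeholders; the profiles available on an H100 will differ and can be listed with `nvidia-smi mig -lgip`.

```python
# Illustrative admin workflow for partitioning a GPU with MIG via nvidia-smi.
# Requires root privileges and a MIG-capable GPU; profile IDs vary per GPU model,
# so the IDs below are placeholders - list the real ones with `nvidia-smi mig -lgip`.
import subprocess

def run(cmd: list[str]) -> None:
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run(["nvidia-smi", "-i", "0", "-mig", "1"])        # enable MIG mode on GPU 0
    run(["nvidia-smi", "mig", "-lgip"])                # list available GPU instance profiles
    run(["nvidia-smi", "mig", "-cgi", "19,19", "-C"])  # create two instances (IDs are placeholders)
    run(["nvidia-smi", "-L"])                          # the MIG devices now appear as separate GPUs
```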
From the PCIe card to the supercomputer
Not all streaming multiprocessors are active in the H100 GPUs actually on offer. On the SXM5 modules used on the HGX H100 server board and in the DGX H100 server, the H100 uses 132 SMs with 16,896 CUDA cores; the upcoming PCIe Gen5 cards make do with 114 SMs and a total of 14,592 CUDA cores.
In addition to the GPU, the SMX5 boards designed for a 700 watt TDP also have 80 GB of HBM3 in five stacks, which are connected via a total of ten 512-bit memory controllers. Nvidia specifies the memory bandwidth as more than 3 TB/s. The PCI Express cards are equipped with 80 GB of slower HBM2e memory. They are satisfied with a TDP of 350 watts. In addition to a conventional H100 PCIe card, the manufacturer will also be launching the H100 CNX Converged Accelerator. In addition to the GPU, there is also a ConnectX-7 SmartNIC on the card, connected by an integrated PCIe Gen5 switch. According to Nvidia, the card can be used in servers without PCI Express 5.0 without causing a bottleneck in data transmission between the GPU and the network.
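The quoted memory bandwidth can be roughly sanity-checked from the stated bus layout. The sketch below assumes a per-pin HBM3 data rate of about 4.8 Gbit/s, a figure not given in Nvidia's announcement, purely to show that ten 512-bit controllers land in the region of 3 TB/s.

```python
# Rough sanity check of the H100 SXM5 memory bandwidth from the stated bus layout.
# The per-pin data rate is an assumption (~4.8 Gbit/s); Nvidia only quotes ">3 TB/s".
controllers = 10            # active 512-bit memory controllers across five HBM3 stacks
bus_width_bits = controllers * 512
pin_rate_gbit_s = 4.8       # assumed HBM3 data rate per pin

bandwidth_gb_s = bus_width_bits * pin_rate_gbit_s / 8
print(f"{bus_width_bits}-bit bus at {pin_rate_gbit_s} Gbit/s per pin ~ {bandwidth_gb_s:.0f} GB/s")
# -> 5120-bit bus at 4.8 Gbit/s per pin ~ 3072 GB/s, i.e. roughly 3 TB/s
```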
The HGX H100 server boards come in versions with four or eight H100 GPUs connected via NVLink and NVSwitch. They serve OEM partners as the basis for their own H100 systems. The company's own DGX H100 server is the first H100 product to be launched. It is equipped with eight H100 GPUs with a total of 640 GB of HBM3 and two x86 CPUs, about which Nvidia gives no details. The main memory amounts to 2 TB, and the NVMe SSDs offer a total of 30 TB of storage space. Overall AI performance with the new FP8 precision is said to be 32 petaflops. For the network connection, the manufacturer uses eight of its own ConnectX-7 adapters and two BlueField DPUs with 400 Gbit/s over InfiniBand or Ethernet.
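The 32-petaflops figure is consistent with the per-GPU FP8 throughput Nvidia quotes for the H100, roughly 4 petaflops with structured sparsity. That per-GPU number comes from Nvidia's launch material rather than this article, so treat the following as a back-of-the-envelope sketch.

```python
# Back-of-the-envelope check of the DGX H100's quoted 32 petaflops of FP8 AI performance.
# Assumes ~4 PFLOPS FP8 per H100 SXM5 with structured sparsity (Nvidia launch figure).
fp8_pflops_per_gpu = 4.0
gpus_per_system = 8

print(f"DGX H100 FP8: {fp8_pflops_per_gpu * gpus_per_system:.0f} petaflops")  # -> 32 petaflops
```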
For the highest demands on computing power, up to 32 DGX H100 nodes can be combined into a next-generation DGX SuperPod supercomputer via an external NVLink switch.
At the moment, the DGX H100 systems still run with x86 processors. In the future, however, the CPUs will also come from Nvidia itself. The launch of the Grace CPU Superchip with 144 Armv9 cores and LPDDR5X memory with ECC is planned for 2023. The chip consists of two of the Grace CPUs announced last year, connected via NVLink-C2C. The Grace Hopper Superchip, a combination of Grace CPU and Hopper GPU, is also scheduled to appear in the first half of 2023 as an SoC for large-scale AI applications.
Nvidia OVX and Omniverse
With OVX, Nvidia is also presenting a complete server platform for its Omniverse simulation environment. The OVX servers consist of eight A40 GPUs, two Intel Xeon Platinum 8362 CPUs and three ConnectX-6 network cards. Nvidia does not build the systems itself but leaves that to certified partners; Inspur, Lenovo and Supermicro will launch the first systems in the coming months.
The matching network infrastructure for OVX and H100-based AI systems is to be provided by the 400 Gbit/s Ethernet platform Spectrum-4. It consists of ConnectX-7 cards, BlueField-3 DPUs and the new SN5000 switch family with the Spectrum-4 ASIC and 400G and 800G ports.
With Omniverse Cloud, Nvidia is announcing a suite of cloud services designed to make it easy for designers, artists and developers to collaborate on projects. Users without powerful RTX-equipped computers can even view complex models via the GeForce Now streaming platform. Omniverse Cloud is currently still under development and only available as early access on request.
Additional Ampere GPUs
Since the Hopper architecture currently only covers high-end GPUs for AI and HPC, Nvidia is once again launching a series of Ampere-based GPUs for tasks such as 3D graphics or less demanding AI and analytics workloads; they largely correspond to the recently presented GeForce RTX models.
For desktop workstations, the only GTC novelty is the RTX A5500. With 10,240 CUDA cores, 80 RT cores, 320 third-generation Tensor cores and 24 GB of graphics memory, the card largely corresponds to the GeForce RTX 3080 Ti. However, it is equipped not with GDDR6X but with GDDR6 chips with ECC. It also supports NVLink.
The RTX A5500 for mobile workstations, with 7,424 CUDA cores as well as 58 RT and 232 Tensor cores, is a good deal smaller than its desktop sister model. Its graphics memory is 16 GB. Below it, the lineup comprises the RTX A4500, the RTX A3000 with 12 GB of ECC GDDR6, the RTX A2000 with 8 GB of GDDR6, and the RTX A1000 and RTX A500, some of which are new and some of which replace existing GPUs. The latter two GPUs are almost identical with 2,048 CUDA cores and 4 GB of graphics memory; however, the A500 is somewhat more economical and slightly slower, with a maximum TGP of 60 instead of 95 watts. The desktop version of the RTX A5500 is available now, while notebooks with the new mobile GPUs are expected in the spring. Nvidia names Acer, Asus, Dell, HP, Lenovo and MSI as OEM partners.