Helmholtz AI powers up: super-fast AI system installed at KIT

One of the most powerful AI computer systems in Europe now supports scientists in tackling the biggest research challenges


Artificial intelligence (AI) is indispensable today as a tool for cutting-edge research. Alongside the algorithms, specialized hardware is becoming an increasingly important factor for its successful application, no matter the research field. The Helmholtz Association has funded the deployment of the InfiniBand-connected NVIDIA DGX A100 AI system at Karlsruhe Institute of Technology (KIT) to support the democratisation and wide application of AI methods driven by the Helmholtz AI platform. KIT has become the first location in Europe to put this system into operation.

AI and machine learning (ML) are a key research topic at the Helmholtz Association; through Helmholtz AI, its research-oriented platform for applied AI, similarities between applications are identified and developed, driving the creation of new methods and tools.  

To further promote the application of these future technologies in cross-field research projects, computational capacity is a must. “AI requires one thing above all else – an extreme amount of computing power,” says Martin Frank, Director at the Steinbuch Centre for Computing (SCC) and Professor at the Institute for Applied and Numerical Mathematics (IANM) of KIT, Helmholtz AI’s local unit for the research field energy. “For our researchers, access to accelerated computing systems is a decisive competitive factor today.”

Procurement for KIT’s upcoming “Hochleistungsrechner Karlsruhe” supercomputer (“HoreKa” for short) led to a collaboration between SCC and NVIDIA, a leader in accelerated computing, which made KIT the first location in Europe to deploy NVIDIA DGX A100 systems. The three DGX A100 systems, funded by the Helmholtz Association and available for applications through Helmholtz AI, are high-performance servers containing eight NVIDIA A100 Tensor Core GPUs each. Such powerful systems will allow researchers “to train significantly larger neural networks in a much shorter time with even larger amounts of data”, confirms Frank, a capacity from which several research projects at Helmholtz AI could already benefit.

Extended computing and network capabilities for Helmholtz AI

According to Markus Götz, head of the Helmholtz AI consultant team at KIT, new AI systems “imply that you can for example build deeper and larger AI models with better predictive performance, have capabilities that were computationally expensive beforehand, such as Bayesian approaches, or provide higher throughput, allowing us to make more predictions in the same amount of time”. His team will take advantage of these capabilities to “perform an extensive hyperparameter search for our lowest common ancestor (LCA) matrix neural network reconstructing particle collision events in the Belle-2 experiment”.

Nico Hoffmann, young investigator group leader at Helmholtz-Zentrum Dresden-Rossendorf, also has plans for the DGX A100 system: “We are currently working on PDE learning and neural solvers for complex models in plasma physics, geophysics (jointly with UFZ) and medicine (with University Hospital Dresden and TU Dresden) that scale efficiently to very large systems. Large clusters also enable us to learn neural-network based surrogate models of the physics involved in the particle accelerators of the future, i.e. Wakefield acceleration”, he adds.

The new AI systems at KIT could also be used to help fight the current pandemic, for example by accelerating drug discovery research, detecting infection hotspots, predicting propagation patterns, or relieving medical personnel during the analysis of X-ray images. Corresponding AI research initiatives have already been started across the Helmholtz Association. “Moreover, the A100 allows multiple users with smaller models to utilize the same hardware, so we can share our resources better”, adds Götz.

At the forefront of computing power in Europe

"AI and machine learning can dramatically accelerate scientific computations in the most significant areas of research, where the world’s problems are being solved," says Marc Hamilton, Vice President of Solutions Architecture and Engineering at NVIDIA. “Our new DGX A100 systems with Tensor Core GPUs and NVIDIA Mellanox HDR InfiniBand interconnects support this accelerated research and will speed up scientific discovery for a broad range of important research.”

Each DGX A100 provides five PetaFLOPS of AI computing power, i.e. five quadrillion computing operations per second – about five times faster than the earlier NVIDIA DGX system based on NVIDIA V100 GPUs. At the same time, the new accelerators have been equipped with significantly larger and faster main memory and NVSwitch to provide full NVLink bandwidth of 600 GB/s between any pair of GPUs. The DGX A100 systems are connected via the high-performance NVIDIA Mellanox HDR InfiniBand interconnect technology and leverage its in-network computing engines to deliver top performance.
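The figures quoted in this article can be put side by side with a little arithmetic. The following is a minimal sketch, not an official benchmark: it uses only the numbers stated here (three DGX A100 systems, eight GPUs each, five PetaFLOPS of AI computing power per system) and assumes that peak AI performance simply adds up across GPUs and systems, which real workloads will rarely achieve. The function and constant names are hypothetical.

```python
# Figures quoted in the article (peak "AI" performance, not sustained throughput).
DGX_SYSTEMS = 3            # DGX A100 systems installed at KIT
GPUS_PER_DGX = 8           # A100 GPUs per DGX A100
AI_PFLOPS_PER_DGX = 5      # PetaFLOPS of AI computing power per DGX A100
OPS_PER_PFLOPS = 10**15    # one PetaFLOPS = one quadrillion operations per second


def summarize_installation(systems=DGX_SYSTEMS,
                           gpus_per_system=GPUS_PER_DGX,
                           pflops_per_system=AI_PFLOPS_PER_DGX):
    """Return aggregate peak figures, ASSUMING performance scales linearly."""
    return {
        "total_gpus": systems * gpus_per_system,
        "total_ai_pflops": systems * pflops_per_system,
        "ai_pflops_per_gpu": pflops_per_system / gpus_per_system,
        "ops_per_second_per_system": pflops_per_system * OPS_PER_PFLOPS,
    }


if __name__ == "__main__":
    for name, value in summarize_installation().items():
        print(f"{name}: {value}")
```

Under that linear-scaling assumption, the three systems together comprise 24 GPUs with a combined peak of 15 AI PetaFLOPS, or 0.625 AI PetaFLOPS per GPU.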

The new NVIDIA DGX A100 systems will also allow researchers to optimize their applications for KIT's future HoreKa supercomputer. HoreKa will use the same NVIDIA A100 accelerators, but 740 of them, compared with eight in each DGX A100 system. Once the full system is operational in summer 2021, HoreKa is expected to be one of the ten fastest supercomputers in Europe.



Photograph: The new DGX A100 computer systems are high-performance servers with eight NVIDIA A100 Tensor Core GPUs each. Together, the eight accelerators have a computing power of 5 AI PetaFLOP/s. (Simon Raffeiner / SCC)