
Terrifying truth behind supercomputers

A massive US-based supercomputer by the name of Frontier is capable of achieving 1.102 exaFLOPS

High performance computing (HPC) has been engaged in an arms race for decades, ever since Seymour Cray created the CDC 6600, widely regarded as the first supercomputer in history. The goal has been to improve performance by any means and at any expense. Since the CDC 6600 was introduced in 1964, the performance of top systems has multiplied a trillion-fold thanks to advances in computation, storage, networking, and software, going from millions of floating point operations per second (megaFLOPS) to quintillions (exaFLOPS).

According to the High Performance Linpack (HPL) benchmark, the current champion, a massive US-based supercomputer by the name of Frontier, is capable of achieving 1.102 exaFLOPS. However, it’s believed that even more potent machinery is operating elsewhere, behind closed doors.

Exascale supercomputers are predicted to have a positive impact on almost every industry, including science, cybersecurity, healthcare, and finance. They will also pave the way for powerful new artificial intelligence models that would otherwise have taken years to develop. But such a significant improvement in speed has a price: it uses more energy. When operating at full capacity, Frontier can draw up to 40MW of electricity, enough to power tens of thousands of homes.

The goal of supercomputing has always been to push the envelope of what is achievable. However, as the need to reduce emissions becomes increasingly evident and energy costs rise, the HPC business will need to reassess whether its original guiding philosophy of performance at any cost is still worth adhering to.
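To put those headline numbers in perspective, the short sketch below works through the arithmetic using only the figures quoted above: the trillion-fold leap from megaFLOPS to exaFLOPS, and Frontier’s 1.102 exaFLOPS set against its 40MW peak draw. It is a back-of-the-envelope illustration; official Green500 efficiency figures are based on power measured during an actual HPL run, so the real number differs.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the article.
# Peak/nameplate numbers, so the efficiency result is indicative rather than
# an official Green500 measurement.

# The trillion-fold leap since 1964: megaFLOPS (1e6) to exaFLOPS (1e18).
print(f"Performance growth since the CDC 6600: {1e18 / 1e6:.0e}x")  # 1e+12

frontier_flops = 1.102e18    # 1.102 exaFLOPS on HPL
frontier_power_w = 40e6      # up to 40MW at full capacity

flops_per_watt = frontier_flops / frontier_power_w
print(f"Implied efficiency: {flops_per_watt / 1e9:.1f} GFLOPS per watt")
```

On these figures, Frontier delivers on the order of tens of gigaFLOPS per watt, which is exactly the kind of performance-per-watt metric the Green500 ranking discussed below is built around.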

Performance vs efficiency

The University of Cambridge, which has created numerous supercomputers with power efficiency at the heart of the design in collaboration with Dell Technologies, is one entity operating at the forefront of this effort. The Wilkes3, for example, is positioned only 100th in the overall performance charts, but sits in third place in the Green500, a ranking of HPC systems based on performance per watt of energy consumed. Dr. Paul Calleja, Director of Research Computing Services at the University of Cambridge, indicated that the organisation places far greater emphasis on creating computers that are highly productive and efficient than on ones that are merely exceedingly powerful.

“We’re not really interested in large systems, because they’re highly specific point solutions. But the technologies deployed inside them are much more widely applicable and will enable systems an order of magnitude slower to operate in a much more cost- and energy-efficient way. In doing so, you democratize access to computing for many more people. We’re interested in using technologies designed for those big epoch systems to create much more sustainable supercomputers, for a wider audience,” Dr. Paul Calleja said.

Dr. Paul Calleja also anticipates a fiercer drive for energy efficiency in the years to come, both in the HPC industry and in the wider data center community, where energy usage accounts for upwards of 90% of expenses. He says, “Performance per watt is crucial, as recent energy price variations linked to the conflict in Ukraine have made running supercomputers significantly more expensive, especially in the context of exascale computing.”
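To illustrate why performance per watt translates so directly into money, here is a rough cost-sensitivity sketch. The 40MW figure comes from earlier in the article; the load factor and electricity prices are hypothetical assumptions chosen purely for illustration.

```python
# Illustrative annual electricity bill for a 40MW system (the peak draw quoted
# earlier) under a range of hypothetical prices. The load factor and the price
# scenarios are assumptions for illustration, not figures from the article.

power_mw = 40.0
load_factor = 0.8            # assumed average utilisation
hours_per_year = 8760

energy_mwh = power_mw * load_factor * hours_per_year
for price_per_mwh in (50, 150, 300):    # hypothetical $/MWh scenarios
    annual_cost = energy_mwh * price_per_mwh
    print(f"At ${price_per_mwh}/MWh: ~${annual_cost / 1e6:.0f}M per year")
```

Even under these made-up prices, a two- or three-fold swing in energy costs moves the annual bill by tens of millions of dollars, which is why efficiency gains of even a few tens of percent matter at this scale.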

The university discovered that Wilkes3 benefited from a variety of changes that increased its efficiency. For instance, by reducing the clock speed at which specific components were operating, the team was able to cut energy usage by around 20–30%, depending on the workload.

“Within a particular architectural family, clock speed has a linear relationship with performance, but a squared relationship with power consumption. That’s the killer. Reducing the clock speed reduces the power draw at a much faster rate than the performance, but also extends the time it takes to complete a job. So what we should be looking at isn’t power consumption during a run, but really energy consumed per job. There is a sweet spot,” Dr. Paul Calleja explained.
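A minimal numerical sketch of the trade-off described in the quote: performance is modelled as scaling linearly with clock speed, dynamic power with its square, and a fixed baseline draw is added for everything that does not scale with the clock (memory, network, cooling overhead). All constants are illustrative assumptions, not Wilkes3 measurements.

```python
# Toy model of energy-per-job versus clock speed, following the relationships
# in the quote: performance ~ f (linear), dynamic power ~ f^2 (squared).
# The constants are illustrative assumptions, not measurements from Wilkes3.

BASELINE_W = 150.0       # assumed fixed draw per node (memory, network, fans)
DYNAMIC_W = 200.0        # assumed dynamic power at the reference clock (f = 1.0)
JOB_SECONDS = 1000.0     # job run time at the reference clock

def energy_per_job(f: float) -> float:
    """Energy in joules to finish one job at relative clock speed f."""
    runtime = JOB_SECONDS / f                   # performance scales with f
    power = BASELINE_W + DYNAMIC_W * f ** 2     # dynamic power scales with f^2
    return power * runtime

# Sweep relative clock speeds and find the most energy-efficient point.
clocks = [x / 100 for x in range(50, 151, 5)]
sweet_spot = min(clocks, key=energy_per_job)

for f in (0.6, 0.8, 1.0, 1.2, sweet_spot):
    print(f"clock = {f:.2f}x  ->  {energy_per_job(f) / 1e3:.0f} kJ per job")
```

With these assumed constants, the minimum energy per job falls at roughly 0.85x the reference clock: the job takes longer, but the total energy to finish it is lower than at full speed, which is the sweet spot the quote refers to.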

Software is king

In addition to fine-tuning hardware configurations for particular workloads, further optimizations can be made to networking and storage, as well as in related areas such as cooling and rack architecture. When it comes to spending money, however, Dr. Paul Calleja said, “Software should be the top priority. The hardware is not the problem, it’s about application efficiency. This is going to be the major bottleneck moving forward.”

“Today’s exascale systems are based on GPU architectures and the number of applications that can run efficiently at scale in GPU systems is small. To really take advantage of today’s technology, we need to put a lot of focus into application development. The development lifecycle stretches over decades; software used today was developed 20-30 years ago and it’s difficult when you’ve got such long-lived code that needs to be re-architected,” he added.

The problem, though, is that the HPC industry has not made a habit of thinking software-first. Historically, far more attention has been paid to the hardware because, in Dr. Paul Calleja’s words, “It’s easy; you just buy a faster chip. You don’t have to think cleverly. While we had Moore’s Law, with a doubling of processor performance every eighteen months, you didn’t have to do anything (on a software level) to increase performance. But those days are gone. Now if we want advancements, we have to go back and rearchitect the software.”

In this context, Dr. Paul Calleja reserved some praise for Intel. Application compatibility has the potential to become a problem as the server hardware market grows more diverse from a vendor standpoint (in most respects, a positive development), but Intel is working on a solution.

“One differentiation I see for Intel is that it invests an awful lot (of both funds and time) into the oneAPI ecosystem, for developing code portability across silicon types. It’s these kinds of toolchains we need to enable tomorrow’s applications to take advantage of emerging silicon,” he notes.

Separately, Dr. Paul Calleja called for “scientific need” to be given more attention. Far too often, he said, things “go awry in translation,” leaving hardware and software architectures misaligned with the real needs of the end user. A more enthusiastic approach to cross-industry collaboration, he argues, would establish a “virtuous circle” of users, service providers, and vendors, delivering gains in both performance and efficiency.

A zettascale future

In typical fashion, with the fall of the symbolic exascale milestone, attention will now turn to the next one: zettascale. Dr. Paul Calleja described zettascale as just the next flag in the ground, a totem highlighting the technologies needed to reach the next milestone in computing, technologies that are unobtainable today. “The world’s fastest systems are extremely expensive for what you get out of them in terms of the scientific output. But they are important, because they demonstrate the art of the possible and they move the industry forwards,” he said.

The ability of the industry to innovate will determine whether systems capable of one zettaFLOPS, a thousand times more potent than the current crop, can be developed in a way that is in line with sustainability goals. Performance and power efficiency may not always go hand in hand, and each sub-discipline will need a good helping of artistry to deliver the required performance improvement within a reasonable power envelope. In theory, there is a point at which the societal benefits brought about by HPC justify the carbon emitted to achieve them, a kind of “golden ratio” of performance to energy consumption. In practice, the exact number will remain elusive, but the idea itself is a step in the right direction.
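To make the scale of that challenge concrete, the sketch below projects the power a one-zettaFLOPS system would need if efficiency stayed at the level implied by the figures quoted earlier in this article (Frontier’s 1.102 exaFLOPS against its 40MW peak draw). It is a rough upper-bound illustration, not a forecast.

```python
# Rough projection of zettascale power demand, using only figures quoted in
# this article: 1.102 exaFLOPS against a 40MW peak draw. Treat the result as
# an illustrative upper bound rather than a forecast.

frontier_flops = 1.102e18
frontier_power_w = 40e6
target_flops = 1e21                  # one zettaFLOPS

efficiency = frontier_flops / frontier_power_w        # FLOPS per watt today
power_needed_w = target_flops / efficiency
print(f"Zettascale at today's implied efficiency: ~{power_needed_w / 1e9:.1f} GW")

# To fit a zettascale machine into the same (hypothetical) 40MW envelope,
# efficiency would need to improve by roughly the same factor:
print(f"Required efficiency improvement: ~{power_needed_w / frontier_power_w:.0f}x")
```

At today’s implied efficiency a zettascale machine would draw tens of gigawatts, which is why the industry’s path to the next milestone runs through performance per watt rather than raw performance alone.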

Top seven supercomputers

Fugaku: Fugaku, a computing platform developed by Fujitsu, is situated at the RIKEN Center for Computational Science (R-CCS) in Kobe, Japan. With its added hardware, the system was able to surpass the second-place system on the list by a factor of three, setting a new world record on HPL with a result of 442 petaflops. Satoshi Matsuoka, the director of RIKEN R-CCS, said the advancement came from “finally being able to use the complete machine rather than just a good piece of it.” His team has been able to further optimise the code since the June rankings. Satoshi Matsuoka stated, “I don’t think we can improve much more.”

Summit: Summit, which was developed by IBM, is based at the Oak Ridge National Laboratory (ORNL) in Tennessee. Summit is the fastest system in the US. It was introduced in 2018 and features 4,356 nodes, each of which houses two 22-core Power9 CPUs and six NVIDIA Tesla V100 GPUs. It has a performance of 148.8 petaflops. Recently, two Summit-related teams shared the coveted Gordon Bell Prize—often referred to as the “Nobel Prize of supercomputing”—for remarkable accomplishment in high-performance computing.

Sierra: Sierra, a supercomputer at the California-based Lawrence Livermore National Laboratory (LLNL), has an HPL score of 94.6 petaflops. Sierra has a similar design to Summit, with each of its 4,320 nodes sporting two Power9 CPUs and four NVIDIA Tesla V100 GPUs. On the Green500 list of the world’s most energy-efficient supercomputers, Sierra sits in fifteenth place.

Sunway TaihuLight: Sunway TaihuLight, housed at China’s National Supercomputing Center in Wuxi, held the top position for two years (2016-2017). Its standing has since slipped: it held third place last year and has now dropped to fourth. It was created by China’s National Research Center of Parallel Computer Engineering & Technology (NRCPC) and produced 93 petaflops on the HPL benchmark. It uses Sunway SW26010 processors exclusively.

Selene: Installed in-house at NVIDIA Corp., Selene climbed from seventh place in the June rankings to fifth. The system has received an upgrade, increasing its HPL score from 27.6 petaflops to 63.4 petaflops. NVIDIA’s AI supercomputer was introduced in June of this year after being built and brought into operation in less than a month. Its primary applications include chip design work, internal AI workloads, and system development and testing.

Tianhe-2A: Tianhe-2A, also known as MilkyWay-2A, is deployed at the National Supercomputer Center in Guangzhou and was created by China’s National University of Defense Technology (NUDT). Tianhe-2A is powered by Intel Xeon CPUs and NUDT’s Matrix-2000 DSP accelerators. It will be used for applications in modelling, analysis, and government security. From June 2013 to November 2015, it occupied the top spot on the chart.

JUWELS Booster Module: The most recent addition to the list is the JUWELS Booster Module from Atos. With an HPL result of 44.1 petaflops, the BullSequana machine is now the most powerful system in Europe; it was recently deployed at Forschungszentrum Jülich (FZJ) in Germany. Like the Selene system, JUWELS is powered by AMD CPUs and NVIDIA GPUs and is based on a modular system design.
