Introduction to Supercomputing
What is a supercomputer?
The term supercomputer is rather vague and does not say much about what such a machine actually is. When asked, most people will mention weather forecasting as a typical use of a supercomputer; in reality they are used in a wide variety of ways, although mainly for research and simulation.
What is Viper?
Our supercomputer, more accurately called a High Performance Computer (HPC), is made up of about 200 separate computers linked together by a very high-speed network. The network allows them to exchange data and gives the appearance that they are acting as one large computer. The whole of Viper runs the Linux operating system, which is used by the majority of research systems, including all of the top 500 supercomputers in the world.
[Figure: an example of the interface to Viper]
What can Viper do?
Using our example of weather prediction, each separate computer (or node) will analyse a small area of the UK, so that together the nodes analyse a large area at the same time. The nodes pass their processed data between each other to handle the boundary conditions, then write their results down to the file system once completed.
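The pattern described above can be sketched in plain Python. This is only an illustration of the idea, not Viper-specific code: a one-dimensional grid is split into chunks (one per hypothetical node), each chunk is updated independently, and neighbouring chunks exchange their boundary values at every step.

```python
# Toy sketch of domain decomposition with boundary (halo) exchange.
# All names here are invented for the illustration.

def split(grid, n_nodes):
    """Divide the grid into equal contiguous chunks, one per node."""
    size = len(grid) // n_nodes
    return [grid[i * size:(i + 1) * size] for i in range(n_nodes)]

def step(chunk, left_halo, right_halo):
    """One smoothing step: each cell becomes the average of itself and
    its neighbours. The halos are boundary values received from the
    neighbouring chunks."""
    padded = [left_halo] + chunk + [right_halo]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]

def simulate(grid, n_nodes, steps):
    chunks = split(grid, n_nodes)
    for _ in range(steps):
        new_chunks = []
        for i, chunk in enumerate(chunks):
            # Exchange boundary conditions with the neighbouring chunks;
            # the outermost chunks reuse their own edge values.
            left = chunks[i - 1][-1] if i > 0 else chunk[0]
            right = chunks[i + 1][0] if i < len(chunks) - 1 else chunk[-1]
            new_chunks.append(step(chunk, left, right))
        chunks = new_chunks
    # "Write the results down": gather all chunks back into one grid.
    return [x for chunk in chunks for x in chunk]

# Example: a single "hot spot" diffusing across a grid split over 4 nodes.
result = simulate([0.0] * 8 + [100.0] + [0.0] * 7, n_nodes=4, steps=10)
```

In a real HPC code the chunks would live on different nodes and the boundary exchange would happen as messages over the network (for example via MPI), rather than within a single Python list.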
Computing Nodes
Compute Nodes
These make up the majority of the nodes and perform most of the standard computing work within the HPC. Each node has 128 GB of RAM.
High Memory Nodes
These are very similar to the compute nodes but have much more memory: ours have 1 terabyte (TB) of RAM each, which makes them ideal for research involving large in-memory models, such as engineering simulation and DNA analysis in biology.
Accelerator Nodes
These are again very similar to the compute nodes, but each has an Nvidia A40 card, similar to the high-end graphics cards found in gaming PCs. These cards contain thousands of very small processing cores, which makes them very good at executing small pieces of code in a massively parallel way. This is also why such cards are used in the newer fields of machine learning and deep learning.
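The massively parallel style of work a GPU excels at can be illustrated in plain Python (no GPU required; a thread pool simply stands in for the card's many cores): the same small piece of code, often called a kernel, is applied independently to every element of a large array.

```python
# Toy illustration of the data-parallel pattern a GPU accelerates.
# The kernel function and data are invented for the example.

from concurrent.futures import ThreadPoolExecutor

def kernel(x):
    """A small, independent computation, the kind each GPU core runs."""
    return x * x + 1.0

data = list(range(10_000))

# Sequential version: one "core" visits every element in turn.
sequential = [kernel(x) for x in data]

# Parallel version: many workers each apply the same kernel to a share
# of the elements; on a GPU, thousands of cores would do this at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(kernel, data))

assert parallel == sequential  # same answer, computed in parallel
```

Because each element is processed independently, the work can be divided among thousands of cores with no coordination beyond collecting the results, which is exactly the shape of workload these cards are built for.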
Visualisation Nodes
These are used for connecting from remote computers such as desktops, and allow rendered output from data to be viewed on a local machine. There are two visualisation nodes with 2x Nvidia GTX 980 Ti cards.
High Speed Network
All the compute nodes are connected by a very fast Intel Omni-Path network running at 100 Gbit/s, which allows them to act together.
Storage
Storage Nodes
These are servers in their own right which provide access to the actual storage arrays (hard disks); Viper accesses its disk storage via these nodes.
Storage Array
These are the actual disks, held in a chassis, and they make up the whole file storage for Viper. Viper has a total file space of 0.5 petabytes (500 terabytes).
Administration
Controller Nodes
The controller nodes (sometimes called head nodes) are responsible for managing all the compute nodes: they handle the loading, termination, and completion of jobs via the job scheduler, which on Viper is SLURM.
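What a scheduler does can be sketched with a toy model in Python. This illustrates only the general idea (jobs wait in a queue until enough nodes are free, run, and release their nodes on completion); it is not SLURM's actual algorithm, and the job names and sizes are invented for the example.

```python
# Toy first-come-first-served job scheduler for a small cluster.
# Assumes every job fits within the cluster's total node count.

from collections import deque

def schedule(jobs, total_nodes):
    """jobs: iterable of (name, nodes_needed, runtime).
    Dispatches in submission order; returns {name: start_time}."""
    queue = deque(jobs)
    running = []                      # list of (finish_time, nodes_used)
    free = total_nodes
    now = 0
    starts = {}
    while queue or running:
        # Start queued jobs, in order, while enough nodes are free.
        while queue and queue[0][1] <= free:
            name, nodes, runtime = queue.popleft()
            free -= nodes
            starts[name] = now
            running.append((now + runtime, nodes))
        # Advance time to the next completion and free that job's nodes.
        running.sort()
        finish, nodes = running.pop(0)
        free += nodes
        now = finish
    return starts

# Three jobs competing for a 4-node cluster:
starts = schedule([("A", 2, 10), ("B", 2, 5), ("C", 4, 3)], total_nodes=4)
```

In this example job C must wait until both A and B have finished, because it needs the whole cluster to itself; a real scheduler such as SLURM adds priorities, fairness policies, and backfilling on top of this basic picture.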
Login Nodes
These are the nodes which allow users to log in to the cluster; this area is then used to prepare jobs for the cluster. Although a login node is a server in its own right, it is not used for compute work. There are also two Active Directory (AD) servers which act as an interface between the University's login/password system and Viper.