Introduction to Supercomputing
What is a supercomputer?
The term supercomputer is rather vague and does not say much about what such a machine actually is. When asked, most people will mention weather forecasting as a typical use of a supercomputer; in reality they are used in a wide variety of ways, although mainly for research and simulation.
What is Viper?
Our supercomputer, more accurately called a High Performance Computer (HPC), is made up of about 200 separate computers linked together by a very high-speed network. The network allows them to exchange data and gives the appearance that they are acting as one large computer. The whole of Viper runs the Linux operating system, which is used by the majority of research systems, including all of the top 500 supercomputers in the world.
[Figure: an example of the interface to Viper]
What can Viper do?
Using our example of weather prediction, each separate computer (or node) will analyse a small area of the UK, so that together the nodes analyse a large area at the same time. The nodes pass their processed data between each other to handle the boundary conditions, then write their results down to the file system once completed.
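The pattern described above can be sketched in plain Python. This is only an illustration of the idea, not Viper-specific code: a one-dimensional grid is split into chunks (one per hypothetical node), each chunk is updated independently, and neighbouring chunks exchange their boundary values at every step.

```python
# Toy sketch of domain decomposition with boundary (halo) exchange.
# All names here are invented for the illustration.

def split(grid, n_nodes):
    """Divide the grid into equal contiguous chunks, one per node."""
    size = len(grid) // n_nodes
    return [grid[i * size:(i + 1) * size] for i in range(n_nodes)]

def step(chunk, left_halo, right_halo):
    """One smoothing step: each cell becomes the average of itself and
    its neighbours. The halos are boundary values received from the
    neighbouring chunks."""
    padded = [left_halo] + chunk + [right_halo]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]

def simulate(grid, n_nodes, steps):
    chunks = split(grid, n_nodes)
    for _ in range(steps):
        new_chunks = []
        for i, chunk in enumerate(chunks):
            # Exchange boundary conditions with the neighbouring chunks;
            # the outermost chunks reuse their own edge values.
            left = chunks[i - 1][-1] if i > 0 else chunk[0]
            right = chunks[i + 1][0] if i < len(chunks) - 1 else chunk[-1]
            new_chunks.append(step(chunk, left, right))
        chunks = new_chunks
    # "Write the results down": gather all chunks back into one grid.
    return [x for chunk in chunks for x in chunk]

# Example: a single "hot spot" diffusing across a grid split over 4 nodes.
result = simulate([0.0] * 8 + [100.0] + [0.0] * 7, n_nodes=4, steps=10)
```

In a real HPC code the chunks would live on different nodes and the boundary exchange would happen as messages over the network (for example via MPI), rather than within a single Python list.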
Computing Nodes
Compute Nodes
These make up the majority of the nodes and perform most of the standard computing work within the HPC. Each node has 128 GB of RAM.
High Memory Nodes
These are very similar to the compute nodes but have much more memory: ours have 1 terabyte (TB) of RAM each, which makes them ideal for research involving large in-memory models, such as engineering simulation and DNA analysis in biology.
Accelerator Nodes
These are again very similar to the compute nodes, but each has an Nvidia A40 card, similar to the high-end graphics cards found in gaming PCs. These cards contain thousands of very small processing cores, which makes them very good at executing small pieces of code in a massively parallel way. This is also why such cards are used in the newer fields of machine learning and deep learning.
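The massively parallel style of work a GPU excels at can be illustrated in plain Python (no GPU required; a thread pool simply stands in for the card's many cores): the same small piece of code, often called a kernel, is applied independently to every element of a large array.

```python
# Toy illustration of the data-parallel pattern a GPU accelerates.
# The kernel function and data are invented for the example.

from concurrent.futures import ThreadPoolExecutor

def kernel(x):
    """A small, independent computation, the kind each GPU core runs."""
    return x * x + 1.0

data = list(range(10_000))

# Sequential version: one "core" visits every element in turn.
sequential = [kernel(x) for x in data]

# Parallel version: many workers each apply the same kernel to a share
# of the elements; on a GPU, thousands of cores would do this at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(kernel, data))

assert parallel == sequential  # same answer, computed in parallel
```

Because each element is processed independently, the work can be divided among thousands of cores with no coordination beyond collecting the results, which is exactly the shape of workload these cards are built for.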
Visualisation Nodes
These are used for connecting from remote computers such as desktops, and allow rendered output from data to be viewed on a local machine. There are two visualisation nodes with 2x Nvidia GTX 980 Ti cards.
High Speed Network
All the compute nodes are connected by a very fast Intel Omni-Path network running at 100 Gbit/s, which allows them to act together.
Storage
Storage Nodes
These are servers in their own right which provide access to the actual storage arrays (hard disks); Viper accesses its disk storage via these nodes.
Storage Array
These are the actual disks, held in a chassis, and they make up the whole file storage for Viper. Viper has a total file space of 0.5 petabytes (500 terabytes).
Administration
Controller Nodes
The controller nodes (sometimes called head nodes) are responsible for managing all the compute nodes: they handle the loading, termination, and completion of jobs via the job scheduler, which on Viper is SLURM.
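What a scheduler does can be sketched with a toy model in Python. This illustrates only the general idea (jobs wait in a queue until enough nodes are free, run, and release their nodes on completion); it is not SLURM's actual algorithm, and the job names and sizes are invented for the example.

```python
# Toy first-come-first-served job scheduler for a small cluster.
# Assumes every job fits within the cluster's total node count.

from collections import deque

def schedule(jobs, total_nodes):
    """jobs: iterable of (name, nodes_needed, runtime).
    Dispatches in submission order; returns {name: start_time}."""
    queue = deque(jobs)
    running = []                      # list of (finish_time, nodes_used)
    free = total_nodes
    now = 0
    starts = {}
    while queue or running:
        # Start queued jobs, in order, while enough nodes are free.
        while queue and queue[0][1] <= free:
            name, nodes, runtime = queue.popleft()
            free -= nodes
            starts[name] = now
            running.append((now + runtime, nodes))
        # Advance time to the next completion and free that job's nodes.
        running.sort()
        finish, nodes = running.pop(0)
        free += nodes
        now = finish
    return starts

# Three jobs competing for a 4-node cluster:
starts = schedule([("A", 2, 10), ("B", 2, 5), ("C", 4, 3)], total_nodes=4)
```

In this example job C must wait until both A and B have finished, because it needs the whole cluster to itself; a real scheduler such as SLURM adds priorities, fairness policies, and backfilling on top of this basic picture.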
Login Nodes
These are the nodes which allow users to log in to the cluster; this area is then used to prepare jobs for the cluster. Although a login node is a server in its own right, it is not used for compute work. There are also two Active Directory (AD) servers which act as an interface between the University's login/password system and Viper.