Issue #5  / Storage

Welcome to the fifth HPC Best Practices newsletter, Storage - File Systems & Archives

Best practices are an important means of learning from and with others, and this is equally true in the supercomputing community. Twice a year, ahead of the two major global supercomputing conferences – SC and ISC – a best practices newsletter is issued on a focused topic, with input from experts around the world.

This edition features the Storage - File Systems & Archives best practices from three supercomputing centres – the National Computational Infrastructure (NCI), EPCC and the Pawsey Supercomputing Centre.

We hope you have been enjoying this newsletter series. We'd really appreciate your feedback on it, so please take a moment to complete this 45-second survey.

If you believe you can contribute to future newsletters, contact the Pawsey Communications team to get involved!

In this issue

High-Performance Filesystems at NCI – Performance Enabling Science

NCI Australia operates five high-performance Lustre filesystems, storing a total of around 55 Petabytes of data for Australian research. These filesystems, running at up to 150 GB/second, have a direct link into NCI’s new Gadi supercomputer, enabling some of the most exciting, data-intensive workflows to take place. Read more.

Back to Index

About NCI

NCI Australia is Australia’s leading high-performance data, storage and supercomputing organisation, providing expert services to benefit all domains of science, government and industry.

High Performance Compute and Storage Service Monitoring with CheckMK

Monitoring a range of High Performance Compute and Storage (HPCS) systems can be time-consuming and inefficient. Many HPCS systems ship with their own monitoring solutions, anticipating that administrators will monitor each system separately.

Further, HPCS systems typically utilise boutique technology, which is commonly not monitored by existing solutions. EPCC has adopted “CheckMK” as a monitoring system, empowering administrators to reliably monitor all relevant systems through a “single pane-of-glass” approach. Here we give a summary of our approach, presenting further details at this year’s HPCSYSPRO workshop at SC19. Read more.

Back to Index

About EPCC

EPCC aims to accelerate the effective exploitation of novel computing throughout industry, academia and commerce by providing a range of activities spanning training programmes, service provision, and industrial, academic and contract work.

Scalable Filesystems for Growing HPC Needs

Supercomputing isn’t just about raw processing power – in many ways data management and storage are even more important.

Calculations can always be reworked, but data is a valuable asset that cannot always be replaced. 

At the Pawsey Supercomputing Centre this is particularly true, as the Centre is unique in being a real-time data repository for two operational radio telescopes. Read more.

Back to Index

About Pawsey

The Pawsey Supercomputing Centre enables over 80 organisations and more than 1,300 researchers in Australia to achieve unprecedented scientific results in domains such as radio astronomy, energy and resources, engineering, bioinformatics and health sciences.

Full stories

High-Performance Filesystems at NCI – Performance Enabling Science

The filesystems in operation at NCI Australia, running alongside the newly constructed Gadi supercomputer, have been growing in scale and number ever since the first one was installed in 2013. NCI now runs five high-performance Lustre filesystems, all together storing around 55 Petabytes of research data for Australia. The growth in data use for Australian researchers parallels the growing demand for compute cycles: climate and weather models, earth-observation imagery, human genomes and astronomical data have all grown rapidly over the past six years. Now more than ever, these scientifically invaluable sources of data need to be made available to our users at NCI with the highest possible performance and reliability. Our five filesystems provide petascale storage capabilities alongside a petascale supercomputing cluster, enabling major data-intensive workflows to run effectively on hundreds of terabytes of data or more.

Significant upgrades to NCI’s filesystems in 2019 included the replacement of our oldest global filesystem with a high-performance NetApp filesystem of around 12 Petabytes in size. This increase in storage capacity is required to meet the growth in demand for data and data products coming out of almost every field of computational science. The new /g/data4 filesystem marks the latest major expansion to our storage capacity, following the May 2017 replacement of our original /g/data1 with two other new filesystems. With over 15,000 hard drives in operation at NCI, managing and maintaining them all is a complex challenge. Nevertheless, NCI’s users enjoy a highly available and incredibly high-performance storage environment suited to their varied and evolving research needs. Further archival storage at NCI is provided by an ever-growing Spectra library, itself containing around 40 Petabytes of valuable and irreplaceable historical research data.

As earth-observation data, genomic data, climate modelling data and other geophysical datasets grow, the connectedness of those datasets to each other becomes even more important. To enable the most exciting future innovations in data-intensive science, from geophysics to atmospheric modelling, NCI provides robust filesystems that underpin much of Australian science.

Back to Index
NCI Website
NCI Twitter
NCI YouTube
NCI LinkedIn

High Performance Compute and Storage Service Monitoring with CheckMK

EPCC’s system administrators spend considerable amounts of time tracking the state of the various HPC, Research Computing and Storage systems that EPCC manages. Until recently, problem detection and diagnosis required monitoring multiple locations, which was time-consuming and difficult. Further, it demanded a constant, broad awareness of many separate tools, making it especially hard to analyse problems on new systems, where team members are typically under pressure to get the system up and running in a short time frame.
To alleviate these difficulties, we sought a “single pane of glass” approach using an on-site CheckMK server. CheckMK is a Nagios-derived monitoring system with many built-in checks covering items such as CPU, memory, filesystem and interface status. CheckMK is easily extendable – we have created in-house checks to manage diverse systems and issues, including DDN controllers, GPFS disk status, software vulnerabilities, compute node status and power consumption, as well as Lustre servers and LNET statistics.
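To give a flavour of how such in-house checks work (a minimal sketch, not EPCC's actual code), a CheckMK “local check” is simply a script placed in the agent's local directory that prints one line per service in the form `<status> <service_name> <perfdata> <detail text>`. The mount point, service name and thresholds below are illustrative assumptions only:

```shell
#!/bin/sh
# Sketch of a CheckMK "local check". Each line of output defines one
# service: <status> <service_name> <perfdata> <detail text>,
# where status is 0=OK, 1=WARN, 2=CRIT, 3=UNKNOWN.
# Service name and the 85/95% thresholds are illustrative.

fs_usage_check() {
    mount=$1; name=$2
    # Column 5 of POSIX df output is the used percentage, e.g. "43%"
    used=$(df -P "$mount" | awk 'NR==2 { sub("%", "", $5); print $5 }')
    if [ "$used" -ge 95 ]; then
        status=2; word=critical
    elif [ "$used" -ge 85 ]; then
        status=1; word=warning
    else
        status=0; word=OK
    fi
    echo "$status $name used_pct=$used;85;95 $mount at ${used}% used - $word"
}

fs_usage_check / root_fs_usage
```

Dropped into the agent's local-check directory (typically /usr/lib/check_mk_agent/local on Linux agents), the line is collected on the next agent poll and appears as a normal service, with the perfdata graphable like any built-in metric.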
EPCC’s CheckMK server, Panopticon, is provided with access to the monitoring and management VLANs that are assigned to the various services on site; presently Panopticon has 18 virtual interfaces for monitoring our systems and services. Team members on shift use Panopticon during working hours, and out of hours if necessary. Alerts are viewed via a web interface, with optional email alerts. On-shift staff have used Panopticon to identify numerous issues, including partial power failures; system, disk and component failures; and networking issues. It has also become a vital “first port of call” for investigating issues reported by users and other team members.
CheckMK and Panopticon have been particularly useful for monitoring our storage infrastructure, such as the Research Data Facility (RDF) – a 23 PB file store with a 50 PB backup tape library. Panopticon has been applied to the RDF in several ways:
  • A script adapted from example code running on Panopticon interrogates our DDN controllers for hardware and disk pool status using the DDN API;
  • The IBM tape library is monitored via SNMP;
  • GPFS NSDs are monitored using the standard CheckMK agent, including the path status of each multipath disk;
  • The status of the LUNs for each individual GPFS file system is monitored with a check script developed at EPCC.
CheckMK has become a central pillar of our approach to system management. We are excited to present what we have learned at this year’s HPCSYSPRO workshop at SC19.
EPCC Website
EPCC Twitter

Scalable Filesystems for Growing HPC Needs

Supercomputing isn’t just about raw processing power – in many ways data management and storage are even more important. Calculations can always be reworked, but data is a valuable asset that cannot always be replaced. At the Pawsey Supercomputing Centre this is particularly true, as the Centre is unique in being a real-time data repository for two operational radio telescopes.
Filesystems that protect the integrity of researchers’ data are critical to Pawsey’s operations, and since 2008 Pawsey has relied on Lustre. At the time, it was one of the few filesystems designed for high-performance computing, being both scalable and able to handle thousands of users. Lustre splits the object data (the raw information) from the metadata (file location, access history, permissions and the like), making it simple to set up multiple object data stores and to scale performance as the amount of data grows by adding additional hardware, even while the filesystem is online.
Three Lustre filesystems have recently been in operation at Pawsey.  Scratch is 3 PB of temporary storage that reads/writes at 70 GB/s, provided by Cray.  There is no quota on Scratch, so researchers can use as much storage as they need, but files are purged after 30 days of inactivity.
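Under a policy like this, users can check which of their files are approaching the purge window by searching on access time. A minimal sketch using standard POSIX find follows (on a large Lustre scratch, `lfs find` is the scalable equivalent, offloading the scan to the filesystem; the 30-day threshold mirrors the policy above, and the filesystem must be recording access times for this to work):

```shell
#!/bin/sh
# List regular files not accessed for more than 30 days - i.e. files
# that would be purge candidates under a 30-day-inactivity policy.
# Relies on the filesystem maintaining access times (atime).

stale_files() {
    find "$1" -type f -atime +30 2>/dev/null
}

# Scan the given directory (defaults to the current one).
stale_files "${1:-.}"
```

Anything the script lists should be copied to longer-term storage (such as a group or archive filesystem) before the purge removes it.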
Pawsey’s Group filesystem is for mid-term storage so researchers can share their data, computational results and software among their project teams.  This Dell system is 3 PB of storage that reads/writes at 30 GB/s, but quotas are allocated as the resource is finite.
Pawsey’s Galaxy supercomputer, a Cray XC30, is dedicated to supporting two radio telescopes, the Murchison Widefield Array (MWA) and CSIRO’s Australian Square Kilometre Array Pathfinder (ASKAP).  Originally it had 1.9 PB of high-speed storage to allow both data ingress and data processing.
As part of Pawsey’s $70 million capital refresh project in 2019, the existing Astronomy filesystem was upgraded to 2.7 PB, capable of reading/writing at 30 GB/s.  This system is now dedicated solely to the MWA.
A new Lustre filesystem has recently been procured solely for ASKAP, called Buffer.  Manufactured by Dell, it provides 3.7 PB and reads/writes at 40 GB/s.  It accepts streamed data direct from ASKAP’s 36 antennas through 16 ingest nodes at Pawsey, and pre-processes a data product for longer-term storage.
Both the Astronomy filesystem expansion and the Buffer filesystem were easily attached to the existing systems, connected through Pawsey’s high-speed InfiniBand fabric.  The scale and performance of all of Pawsey’s Lustre filesystems can be easily grown in future just by adding additional racks of hardware.
Looking to the future, the bottleneck becomes the metadata rather than the object data, as finding the location of a file in a directory system containing four billion files becomes almost unmanageable. Pawsey is now investigating alternative filesystems that can run alongside Lustre to manage the metadata bottleneck. One option which may help manage the deluge of data from radio astronomy is an object store, where data is stored in large, unconnected buckets. Another option, perhaps more applicable for batch jobs, is BeeGFS, where transient filesystems are created as needed for specific jobs on extremely fast solid-state storage, then destroyed as the finished projects are moved to slower but higher-capacity storage.
One filesystem will never fit all users.  The key is to provide several complementary filesystems that are flexible and easily scalable, connected through the same high-speed fabric so researchers can move data from system to system seamlessly. 



Back to Index

Copyright © 2019 Pawsey Supercomputing Centre, All rights reserved.

Want to change how you receive these emails?
You can update your preferences or unsubscribe from this list.