Supercomputing isn’t just about raw processing power: in many ways data management and storage are even more important. Calculations can always be rerun, but data is a valuable asset that cannot always be replaced. This is particularly true at the Pawsey Supercomputing Centre, which is unique in being a real-time data repository for two operational radio telescopes.
Filesystems that protect the integrity of researchers’ data are critical to Pawsey’s operations, and since 2008 Pawsey has relied on Lustre. At the time, it was one of the few filesystems designed for high performance computing, being both scalable and able to handle thousands of users. Lustre splits the object data (the raw information) from the metadata (file location, access history, permissions and the like), making it simple to set up multiple object data stores and to scale performance by adding hardware as the amount of data grows, even while the filesystem is online.
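This split can be illustrated with a toy model. The sketch below is a hypothetical Python analogy, not Lustre’s actual implementation (Lustre is a kernel-level parallel filesystem): a metadata server records only where each file’s stripes live, object storage targets hold the raw bytes, and adding a target grows capacity without touching the metadata.

```python
# Toy analogy of Lustre's metadata/object-data split.
# Hypothetical sketch only -- class and method names are invented.

class MetadataServer:
    """Tracks where each file's data lives, never the data itself."""
    def __init__(self):
        self.inodes = {}  # path -> list of (ost_id, chunk_index)

class ObjectStorageTarget:
    """Holds raw file data in fixed-size stripes."""
    def __init__(self, ost_id):
        self.ost_id = ost_id
        self.objects = {}  # (path, chunk_index) -> bytes

class ToyLustre:
    STRIPE_SIZE = 4  # bytes; tiny, for demonstration only

    def __init__(self, num_osts):
        self.mds = MetadataServer()
        self.osts = [ObjectStorageTarget(i) for i in range(num_osts)]

    def add_ost(self):
        # Scaling capacity means just adding hardware, even online:
        # existing metadata records are untouched.
        self.osts.append(ObjectStorageTarget(len(self.osts)))

    def write(self, path, data):
        layout = []
        for offset in range(0, len(data), self.STRIPE_SIZE):
            idx = offset // self.STRIPE_SIZE
            ost = self.osts[idx % len(self.osts)]  # round-robin striping
            ost.objects[(path, idx)] = data[offset:offset + self.STRIPE_SIZE]
            layout.append((ost.ost_id, idx))
        self.mds.inodes[path] = layout  # one small metadata record

    def read(self, path):
        # 1. Ask the metadata server where the stripes are;
        # 2. fetch the raw bytes from the object storage targets.
        layout = self.mds.inodes[path]
        return b"".join(self.osts[ost_id].objects[(path, idx)]
                        for ost_id, idx in layout)

fs = ToyLustre(num_osts=2)
fs.write("/scratch/run1.dat", b"radio telescope visibility data")
assert fs.read("/scratch/run1.dat") == b"radio telescope visibility data"
```

The key property the toy captures is that a read touches the metadata server once for a small record, then streams the bulk data from the object stores in parallel.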
Three Lustre filesystems have recently been in operation at Pawsey. Scratch is 3 PB of temporary storage that reads/writes at 70 GB/s, provided by Cray. There is no quota on Scratch, so researchers can use as much storage as they need, but files are purged after 30 days of inactivity.
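A 30-day inactivity purge like Scratch’s can be sketched in a few lines. This is an illustrative assumption about how such a policy might be expressed; the article does not describe Pawsey’s actual purge tooling.

```python
# Hedged sketch of a 30-day purge scan based on access time (st_atime).
# Illustrative only -- not Pawsey's actual purge implementation.
import os
import time

PURGE_AGE_SECONDS = 30 * 24 * 3600  # 30 days of inactivity

def stale_files(root, now=None):
    """Yield paths under `root` not accessed within the purge window."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_atime > PURGE_AGE_SECONDS:
                yield path
```

A production purge would also have to cope with filesystems mounted `noatime`, symlinks, and files appearing mid-scan; the sketch shows only the policy itself.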
Pawsey’s Group filesystem is for mid-term storage so researchers can share their data, computational results and software among their project teams. This Dell system is 3 PB of storage that reads/writes at 30 GB/s, but quotas are allocated as the resource is finite.
Pawsey’s Galaxy supercomputer, a Cray XC30, is dedicated to supporting two radio telescopes: the Murchison Widefield Array (MWA) and CSIRO’s Australian Square Kilometre Array Pathfinder (ASKAP). Originally it had 1.9 PB of high-speed storage to support both data ingress and data processing.
As part of Pawsey’s $70 million capital refresh project in 2019, the existing Astronomy filesystem was upgraded to 2.7 PB, capable of reading/writing at 30 GB/s. This system is now dedicated solely to the MWA.
A new Lustre filesystem, called Buffer, has recently been procured solely for ASKAP. Manufactured by Dell, it provides 3.7 PB and reads/writes at 40 GB/s. It accepts data streamed directly from ASKAP’s 36 antennas through 16 ingest nodes at Pawsey, and pre-processes a data product for longer-term storage.
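Fanning 36 antenna streams across 16 ingest nodes implies some assignment policy. The round-robin mapping below is purely an assumption for illustration, not ASKAP’s actual ingest design.

```python
# Hypothetical round-robin assignment of antenna streams to ingest
# nodes. The policy is an assumption, not ASKAP's actual design.
from collections import Counter

NUM_ANTENNAS = 36
NUM_INGEST_NODES = 16

def ingest_node_for(antenna_id):
    """Map an antenna's stream to one of the ingest nodes."""
    return antenna_id % NUM_INGEST_NODES

load = Counter(ingest_node_for(a) for a in range(NUM_ANTENNAS))
# With 36 streams over 16 nodes, four nodes carry three streams
# each and the remaining twelve carry two.
```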
Both the Astronomy filesystem expansion and the Buffer filesystem were easily attached to the existing systems, connected through Pawsey’s high-speed InfiniBand fabric. The scale and performance of all of Pawsey’s Lustre filesystems can be easily grown in future just by adding additional racks of hardware.
Looking to the future, the bottleneck becomes the metadata rather than the object data: locating a file in a directory tree containing four billion entries becomes almost unmanageable. Pawsey is now investigating alternative filesystems that can run alongside Lustre to manage this metadata bottleneck. One option that may help manage the deluge of data from radio astronomy is an object store, where data is kept in large, flat buckets addressed by key rather than by directory path. Another option, better suited to batch jobs, is BeeGFS, where transient filesystems are created on extremely fast solid-state storage for specific jobs, then destroyed once the finished results are moved to slower but higher-capacity storage.
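The appeal of an object store’s flat namespace can be sketched as follows. This is a hypothetical Python analogy of the idea, not any particular product’s API: the full key is the only location metadata, so retrieval is a single keyed lookup rather than a walk through billions of directory entries.

```python
# Toy sketch of an object store's flat namespace.
# Hypothetical, not a real object-store API.

class ObjectStore:
    """A flat bucket: objects are addressed by full key, not by path."""
    def __init__(self):
        self.bucket = {}

    def put(self, key, data):
        self.bucket[key] = data

    def get(self, key):
        # One keyed lookup -- no directory tree to traverse,
        # however many objects the bucket holds.
        return self.bucket[key]

store = ObjectStore()
store.put("mwa/2019/obs_1234/visibilities.dat", b"raw visibilities")
assert store.get("mwa/2019/obs_1234/visibilities.dat") == b"raw visibilities"
```

The slash-separated key looks like a path, but the store never interprets it as one; that flatness is what sidesteps the metadata bottleneck, at the cost of losing directory semantics.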
One filesystem will never fit all users. The key is to provide several complementary filesystems that are flexible and easily scalable, connected through the same high-speed fabric so researchers can move data from system to system seamlessly.