10-13 March 2025
Sands Expo and Convention Centre
Marina Bay Sands, Singapore

Location: Room P8 – Peony Jr 4412 (Level 4)

Abstract: This half-day tutorial introduces the concepts and building blocks behind campaign management. Campaign management enables a group of scientists to manage many related datasets, stored in multiple files across multiple facilities, as if they were in a single file or database.

We provide an organizational concept for collecting metadata from all datasets in small files called Campaign Archives, which project participants can share on their laptops. A Campaign Archive facilitates discovery of content and of pointers to the data location, as well as remote access to the data by local tools as if the data were local.

Remote access to large HPC datasets is a daunting challenge, and downloading entire files is prohibitive. We therefore discuss enabling technologies for downloading only the data values of interest, to a user-defined accuracy. These include:

a) Self-describing file formats (like ADIOS, HDF5) that enable collecting metadata to present the content in the campaign archive as well as facilitate fine-grained access to partial selections of single variables in a remote dataset;

b) Derived Variables, mathematical expressions to compute the derived variables, that contain only enough information for

c) Queries on Derived Variables that quickly filter the list of data blocks that need to be retrieved for satisfying the user’s interest; and

d) Data Reduction techniques that can save on network bandwidth and guarantee user-defined error bounds.
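The ideas in items c) and d) can be illustrated with a minimal, self-contained sketch. It is not the ADIOS or MGARD API: the function names, the block layout, and the choice of vector magnitude as the derived variable are all assumptions made for illustration. The first part keeps per-block minimum and maximum values of a derived variable as metadata, so that a query can skip blocks that cannot match. The second part uses plain uniform quantization as a simple stand-in for an error-bounded reduction technique.

```python
import math

def block_metadata(xs, ys, block_size):
    """Per-block (start, min, max) of a derived variable, here sqrt(x^2 + y^2).

    Hypothetical layout: a real system would store this alongside the dataset.
    """
    meta = []
    for start in range(0, len(xs), block_size):
        bx = xs[start:start + block_size]
        by = ys[start:start + block_size]
        mags = [math.hypot(x, y) for x, y in zip(bx, by)]
        meta.append((start, min(mags), max(mags)))
    return meta

def select_blocks(meta, threshold):
    """Start offsets of blocks that may hold derived values above threshold."""
    return [start for start, _lo, hi in meta if hi > threshold]

def quantize(values, error_bound):
    """Map each value to an integer code on a grid of spacing 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]

def dequantize(codes, error_bound):
    """Reconstruct values; each differs from the original by at most error_bound
    (up to floating-point rounding)."""
    step = 2.0 * error_bound
    return [c * step for c in codes]

# Illustrative data only: two primary variables split into blocks of two values.
xs = [0.1, 0.2, 3.0, 4.0, 0.3, 0.1]
ys = [0.1, 0.1, 4.0, 3.0, 0.2, 0.2]

meta = block_metadata(xs, ys, block_size=2)
selected = select_blocks(meta, threshold=1.0)  # only the block starting at offset 2

error_bound = 0.05
restored = dequantize(quantize(xs, error_bound), error_bound)
```

Only the selected blocks would need to be fetched from the remote dataset, and the integer codes are what would travel over the network in place of full-precision values.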

Our team is developing enabling technologies for campaign management and initially integrating them with ADIOS, which provides scalable file I/O and data streaming, and MGARD, an error-controlled compression and refactoring framework for scientific data. We will present applications from fusion and combustion, covering simulation and experimental data, to showcase the utility of campaign management.

We invite users and developers of large-scale experiments and codes to learn how they can work with their large-scale data on resources ranging from their laptops to exascale computers.

We also invite all interested researchers and developers to join our effort to enable this technology across a wide range of data technologies to create a comprehensive solution for bringing data to the computation.