Expanding access to NCAR's digital assets

Unified open data system in the works

An NCAR/UCP/UCAR team of software engineers, scientists, and curators is working to make research results and other digital assets publicly accessible in a unified way. The digital assets include NCAR's vast store of knowledge about the atmospheric, solar, and Earth-related sciences in categories such as publications, models and tools, support software, images, and observations from satellites, airborne and ground-based radars, and other instruments.

The Data Stewardship Engineering Team (DSET) is an organization-wide initiative funded by the NCAR Director’s Office. Committee members represent each NCAR lab, the library, and UCP.

"We want to make our digital assets easily discoverable through one portal," said CISL's Steven Worley, chair of DSET. "We’re trying to scrub away all the barriers so the digital assets are easily available for new investigations, applications, and discoveries."

Currently, NCAR’s digital assets are stored and managed in diverse ways. NCAR data managers have heard anecdotally and through internal surveys that finding research results, datasets, and other information can be a challenge for scientists, graduate students, government agencies, funding organizations, and other interested users. "We’re trying to resolve that issue," Worley said.

DSET members (left to right): Steve Williams (EOL), Abby Jaye (MMM), Dave Schneider (CGD), Steven Worley (CISL), Matt Mayernik (Library), Ryan May (Unidata), Eric Nienhouse (CISL), Linda Cully (EOL), Don Kolinski (HAO), Louisa Emmons (ACOM), Olga Wilhelmi (RAL). Not shown: Rebecca Centeno Elliot (HAO). (©UCAR. Photo by Brian Bevirt, NCAR.)


"Data stewardship and management and, in particular, the development of a more unified, cross-organizational approach to simplify and enhance user discovery is a priority for NCAR," said NCAR Director Jim Hurrell. "Not only will this benefit our collaborators in the broader research community, but we will gain many functional efficiencies across the organization, directly facilitating the work of our staff."

DSET members, who have been meeting for the past 15 months, have identified and assessed more than 100 digital assets deemed of community interest, totaling more than three petabytes of data.

The goal is to develop one operational system that enables interested users to efficiently search for data across labs and programs. A key element is metadata, which summarizes and identifies the assets being collected and stored, making them easier to find. Some asset types will also have a digital object identifier (DOI), a persistent code used to identify research publications and, increasingly, datasets as well.
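
To illustrate how metadata can make assets from different labs searchable in one place, here is a minimal Python sketch. It is not DSET's design: the `AssetRecord` fields, the `search` function, and the sample catalog entries are all hypothetical, chosen only to show the idea of a shared metadata record and a simple text search over it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssetRecord:
    """Hypothetical metadata record describing one digital asset."""
    title: str
    lab: str
    asset_type: str
    keywords: List[str] = field(default_factory=list)
    doi: Optional[str] = None  # only some asset types would carry an identifier


def search(records, query):
    """Return records whose title or keywords contain every query term."""
    terms = query.lower().split()
    hits = []
    for rec in records:
        # Combine the searchable metadata fields into one lowercase string.
        text = " ".join([rec.title, *rec.keywords]).lower()
        if all(term in text for term in terms):
            hits.append(rec)
    return hits


# Made-up sample entries; the lab abbreviations are the only real names here.
catalog = [
    AssetRecord("Example radar observations", "EOL", "observations",
                ["radar", "field campaign"]),
    AssetRecord("Example climate model output", "CGD", "model",
                ["climate", "simulation"]),
    AssetRecord("Example solar image archive", "HAO", "images",
                ["solar", "corona"]),
]

for rec in search(catalog, "radar"):
    print(rec.lab, "-", rec.title)
```

Because every record uses the same fields regardless of which lab holds the asset, a single query can span all of them without changing how each lab stores its data.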

"Just as we curate and make accessible our scholarship through the library catalog, the DSET work will make access to our scientific data more coherent and discoverable," said NCAR Library Director Mary Marlino. "The work of DSET in combination with OpenSky and UCARConnect will make our intellectual and educational resources comprehensively available to the broader community."

Open and discoverable

NCAR’s effort will support the White House and related agency directives to better manage and make publicly accessible the digital data produced during federally funded research projects. For example, research papers are to be made freely available to the public within a year of publication. "Our effort is well-aligned with federal mandates that go into effect this year," Worley said.

The challenge is to develop a system that does two things equally well. It must integrate NCAR's data repository systems without disturbing how those systems are currently managed, and create an archiving and access environment for data that need improved management.

The DSET group recently identified 18 features they think are necessary in a new system. Those include no-cost text searches for discovery, proper attribution of all assets, space for general-purpose storage, and the ability for scientists to self-archive their materials. The new data management and storage system will use the NCAR-Wyoming Supercomputing Center as its hub.

Worley characterized the effort as a multi-year program. "We are making good progress so far," he said, adding that the team is making headway in defining metadata standards and in matching up required features with open source and locally developed software to construct the data management system. "When these thrusts come together we will have a prototype system to build on."

