November 5, 2013 | After the excitement and exhaustion of a months-long field project, the last thing any scientist or funder wants is for the resulting data to be lost or locked away forever. NCAR has a handy antidote for that concern.
Not only does the center’s Earth Observing Laboratory (EOL) provide observing platforms, experienced technicians, logistics, and data capture, but it also ensures the field work will pay off for years to come by maintaining one of the world’s largest archives of data from international field campaigns in atmospheric and multidisciplinary science. The archives include plenty of online tools for access and research, maintained by a deeply experienced staff.
The EOL data trove currently includes more than 400 field projects and nearly 6,000 data sets, which comprise some 17 million files and more than 100 terabytes of data, all available at no charge to the research community. The oldest project is the Line Islands Experiment (1967), which studied equatorial circulations in the central Pacific Ocean. Among the latest additions are data from two major projects this past spring, the Mesoscale Predictability Experiment (MPEX—see archive) and the Southeast Atmosphere Study (SAS—see archive).
“When most people have moved on to the next field project, we’re just getting our hands dirty,” says Steve Williams, who heads EOL’s Data Management Group within EOL’s Computing, Data, and Software Facility. This group of 11, which includes scientists, software engineers, and students, is a full-service unit, shepherding data from the point of collection to long-term stewardship.
The group’s archives can be accessed through the EOL Data Archives, which list all projects since 1967, or via the NCAR-based Community Data Portal. Some archives are produced and maintained in collaboration with other entities, such as the Research Data Archive at NCAR's Computational & Information Systems Laboratory (CISL).
Even if a field project took place decades ago, its data can be invaluable to a current area of study. For example, the EOL archives include extensive collections on both the original Verification of the Origins of Rotation in Tornadoes Experiment (VORTEX), conducted in 1994–95, and its follow-up, VORTEX2, carried out in 2009–10. NCAR has noticed an increase in data requests for legacy projects like these, which Williams believes may be related to growing interest in climate change research.
Another example involves the Dynamics of the Madden-Julian Oscillation campaign (DYNAMO), which studied processes in and near the Indian Ocean in late 2011 and early 2012. At a workshop this spring, DYNAMO investigators explored how the project built on findings from the mammoth international TOGA COARE study (1992–93) and the even larger GATE project in 1974. The EOL archives and other data sources are helping to keep the earlier projects valuable, according to scientists.
“None of these projects are dead—they’re still a live concern,” says Robert Houze (University of Washington). Houze was a PI in DYNAMO and TOGA COARE as well as the 72-nation GATE project, which he terms “a tremendous, unprecedented effort.”
EOL’s page on GATE brings together several resources and distributed archives, including a CISL website with a few key datasets and a GATE page from the NCAR Archives that includes project newsletters, photographs, and correspondence. Unfortunately, much of GATE's digital data was not preserved, underscoring the importance of EOL’s efforts to archive datasets from current field campaigns.
Large, complex field projects can produce eye-popping volumes of data. The DYNAMO campaign alone resulted in at least 400 disparate datasets, many of them from ships and aircraft, totaling five terabytes of data, almost 5% of EOL’s entire digital archive.
Another recent project includes measurements taken during a two-year series of five month-long missions spread through the annual cycle. HIPPO—the HIAPER Pole-to-Pole Observations study of greenhouse gases and aerosols—included the first measurements at fine vertical resolution of over 90 atmospheric species, collected at latitudes extending nearly from pole to pole over the Pacific Ocean. The HIPPO data are split between two portals, one maintained by the U.S. Department of Energy’s Carbon Dioxide Information Analysis Center (CDIAC) and the other by NCAR/EOL.
“For many years our team had dreamed of a dataset of this kind,” says HIPPO PI Steven Wofsy (Harvard University). “We are delighted that the measurements have now been made and are available to the whole scientific community.”
One thing that makes data management easier today than in years past is that most new pieces of data from even the most sprawling project are already in digital form when they reach the EOL team. Before the 1990s, investigators stored their data using an eclectic variety of formats and media, from fax printouts and hard-copy satellite photos to nine-track magnetic tapes. For historical campaigns, tracking down these pieces and putting them together is a challenge in itself.
“We found one tape in a garage in Edmonton, Alberta,” says Williams. It held data from the Severe Environmental Storm and Mesoscale Experiment (SESAME) project, a landmark study of severe storms across the Great Plains in 1979. “The tape had been stored there for many years in less-than-optimal conditions, with large temperature variations. Amazingly, we were able to get data from it.”
Once the legacy media are in hand, it’s not always a simple matter to determine what format the datasets are in and figure out how to extract them. “We may end up with files and not have the software to read them,” says Williams. That’s where accessible metadata—information about how the data are structured and stored—can make the difference between life and death for a particular thread of observations within a field project.
To revive archaic formats when needed, the EOL group draws on an array of older tape drives, some of them decades old. The group also works often with NCAR's Computational & Information Systems Laboratory, which has decades of experience archiving data generated by supercomputer experiments and helps retrieve data from older storage media and formats.
It’s never too early for a researcher to approach NCAR before the start of a field project and begin pondering how hard-won data will be preserved. Most field projects now allow users and others to track progress and upload some forms of data in real time. “We can help scientists think about how to structure their data policy and their website,” says Williams. Often his group’s work starts about a year before a field campaign kicks off. In the case of projects funded by the National Science Foundation, for example, the group can help researchers coordinate data meetings, set up mailing lists, and meet the NSF’s requirement for a data management strategy and plan.
Among the keys to success in data archiving: making sure that all of the investigators in a project promptly deliver their collected data to a lead investigator or other project designee.
“Strong leadership makes a difference,” says Williams.
Although data stewardship might seem like a thankless job in the moment, no one knows how such data will be used in the future. Scientific discovery can arise from data that were originally collected for an entirely different purpose.
Williams says it’s imperative to diligently archive the data, metadata, detailed documentation, and any related software that will be required while it’s all fresh in a data provider’s mind. “In 20 years or so, when the PI or other project staff may no longer be around, we need the capability for the next generation of scientists to access these datasets and have enough information to intelligently understand and make use of them. That’s what data stewardship is all about.”