November 5, 2013 | After the excitement and exhaustion of a months-long field project, the last thing any scientist or funder wants is for the resulting data to be lost or locked away forever. NCAR has a handy antidote for that concern.
Not only does the center’s Earth Observing Laboratory (EOL) provide observing platforms, experienced technicians, logistics, and data capture, but it also ensures that the field work will pay off for years to come by maintaining one of the world’s largest archives of data from international field campaigns involving atmospheric and multidisciplinary science. The archives include plenty of online tools for access and research, maintained by a deeply experienced staff.
The vast data holdings of NCAR’s Earth Observing Laboratory are managed by a team that includes Bob Rilling and Steve Williams. (©UCAR. Photo by Carlye Calvin. This image is freely available for media & nonprofit use.)
The EOL data trove currently includes more than 400 field projects and nearly 6,000 datasets, which comprise some 17 million files and more than 100 terabytes of data, all available at no charge to the research community. The oldest project is the Line Island Experiment (1967), which studied equatorial circulations in the central Pacific Ocean. Among the latest additions are data from two major projects this past spring, the Mesoscale Predictability Experiment (MPEX; see archive) and the Southeast Atmosphere Study (SAS; see archive).
“When most people have moved on to the next field project, we’re just getting our hands dirty,” says Steve Williams, who heads EOL’s Data Management Group within EOL’s Computing, Data, and Software Facility. This group of 11, which includes scientists, software engineers, and students, is a full-service unit, shepherding data from the point of collection to long-term stewardship.
The group’s archives can be accessed through the EOL Data Archives, which lists all projects since 1967, or via the NCAR-based Community Data Portal. Some archives are produced and maintained in collaboration with other entities, such as NCAR's Research Data Archive at NCAR's Computational & Information Systems Laboratory.
Mining past success
NCAR's S-Pol radar is silhouetted against brightly lit tropical cumulus clouds. This photo was taken during S-Pol's deployment in the Maldives for the DYNAMO (Dynamics of the Madden-Julian Oscillation) field campaign of 2011–12. The project was designed to help improve long-range weather forecasts and seasonal outlooks and to help scientists further refine computer models of global climate. DYNAMO is among more than 400 field projects with data archived at NCAR. (©UCAR. Photo by Michael Dixon, NCAR. This image is freely available for media & nonprofit use.)
Even if a field project took place decades ago, its data can be invaluable to a current area of study. For example, the EOL archives include extensive collections on both the original Verification of the Origins of Rotation in Tornadoes Experiment (VORTEX), conducted in 1994–95, and its follow-up, VORTEX2, carried out in 2009–10. NCAR has noticed an increase in data requests for legacy projects such as these, a trend that Williams believes may be related to growing interest in climate change research.
Another example involves the Dynamics of the Madden-Julian Oscillation campaign (DYNAMO), which studied processes in and near the Indian Ocean in late 2011 and early 2012. At a workshop this spring, DYNAMO investigators explored how the project built on findings from the mammoth international TOGA COARE study (1992–93) and the even larger GATE project in 1974. The EOL archives and other data sources are helping to keep the earlier projects valuable, according to scientists.
The NCAR GATE Group, represented in this file photo by Edward Zipser, William Lanterman, and Henry van de Boogaard, spent years planning the massive 1974 field experiment. Data analysis took more years and more people, including Margaret LeMone, Rebecca Meitin, Al Miller, William Pennell, Katsuyuki “Vic” Ooyama, and Herbert Riehl. (©UCAR. This image is freely available for media & nonprofit use.)
“None of these projects are dead—they’re still a live concern,” says Robert Houze (University of Washington). Houze was a PI in DYNAMO and TOGA COARE as well as the 72-nation GATE project, which he terms “a tremendous, unprecedented effort.”
EOL’s page on GATE brings together several resources and distributed archives, including a CISL website with a few key datasets and a GATE page from the NCAR Archives that includes project newsletters, photographs, and correspondence. Unfortunately, much of GATE's digital data was not preserved, underscoring the importance of EOL’s efforts to archive datasets from current field campaigns.
Large, complex field projects can produce eye-popping volumes of data. The DYNAMO campaign resulted in at least 400 disparate datasets, including many from several ships and aircraft, with five terabytes of data, almost 5% of EOL’s entire digital archive.
Another recent project took measurements during a two-year series of five month-long missions spread through the annual cycle. HIPPO, the HIAPER Pole-to-Pole Observations study of greenhouse gases and aerosols, included the first measurements at fine vertical resolution of over 90 atmospheric species collected at latitudes extending nearly from pole to pole over the Pacific Ocean. The HIPPO data are spread across two portals, one maintained by the U.S. Department of Energy’s Carbon Dioxide Information Analysis Center (CDIAC) and the other by NCAR/EOL.
“For many years our team had dreamed of a dataset of this kind,” says HIPPO PI Steven Wofsy (Harvard University). “We are delighted that the measurements have now been made and are available to the whole scientific community.”
What to do with a “bag of bytes”
One thing that makes data management easier today than in years past is that most new pieces of data from even the most sprawling project are already in digital form when they reach the EOL team. Before the 1990s, investigators stored their data using an eclectic variety of formats and media, from fax printouts and hard-copy satellite photos to nine-track magnetic tapes. For historical campaigns, tracking down these pieces and putting them together is a challenge in itself.
As a product of the pre-Internet age, the 1974 GARP Atlantic Tropical Experiment (GATE) lives on largely in the form of printouts, binders, photos, and magnetic tape. For more about efforts to preserve data from atmospheric soundings collected during field projects, see the article “Legacy Atmospheric Sounding Data Project” (PDF), which appeared in the January 2012 issue of the Bulletin of the American Meteorological Society. (Photo by Paul Ciesielski, Colorado State University.)
“We found one tape in a garage in Edmonton, Alberta,” says Williams. It held data from the Severe Environmental Storm and Mesoscale Experiment (SESAME) project, a landmark study of severe storms across the Great Plains in 1979. “The tape had been stored there for many years in less-than-optimal conditions, with large temperature variations. Amazingly, we were able to get data from it.”
Once the legacy media are in hand, it’s not always a simple matter to determine what format the datasets are in and figure out how to extract them. “We may end up with files and not have the software to read them,” says Williams. That’s where accessible metadata—information about how the data are structured and stored—can make the difference between life and death for a particular thread of observations within a field project.
To revive archaic formats when needed, the EOL group draws on an array of older tape drives, some of them decades old. The group also often works with NCAR's Computational & Information Systems Laboratory, which has decades of experience in archiving data generated by supercomputer experiments and provides support in retrieving data from older storage media and formats.
Planning for archiving before there’s any data
It’s never too early, even well before a field project starts, for a researcher to approach NCAR and begin considering how their hard-won data will be preserved. Most field projects now allow users and others to track progress and upload some forms of data in real time. “We can help scientists think about how to structure their data policy and their website,” says Williams. Often his group’s work starts about a year before a field campaign kicks off. In the case of projects funded by the National Science Foundation, for example, the group can help researchers coordinate data meetings, set up mailing lists, and meet the NSF’s requirement for developing a data management strategy and plan.
Digital data have been stored in a wide variety of formats over the years. (©UCAR. Photo by Carlye Calvin. This image is freely available for media & nonprofit use.)
Among the keys to success in data archiving: making sure that all of the investigators in a project get their collected data promptly to a lead investigator or another designee from the project.
“Strong leadership makes a difference,” says Williams.
Although data stewardship might seem like a thankless job in the moment, no one knows how such data will be used in the future. Scientific discovery can arise from data that were originally collected for an entirely different purpose.
Williams says it’s imperative to diligently archive the data, metadata, detailed documentation, and any related software that will be required, and to do so while it’s all fresh in a data provider’s mind. “In 20 years or so, when the PI or other project staff may no longer be around, we need the capability for the next generation of scientists to access these datasets and have enough information to intelligently understand and make use of them. That’s what data stewardship is all about.”