Data & Data Analysis

Forecast for big data: Mostly cloudy

May 31, 2016 | The rise of big data has big implications for the advancement of science. It also has big implications for the clogging of bandwidth. The growing deluge of geoscience data is in danger of maxing out the existing capacity to deliver that information to researchers. In response, scientific institutions are experimenting with storing data in the cloud, where researchers can readily get the relatively small portion of the data they actually need.

Helping blaze the way is Unidata, which partnered with Amazon Web Services last year to make Next Generation Weather Radar (NEXRAD) data from the National Oceanic and Atmospheric Administration (NOAA) available in the cloud in near real time. The project is one of the ways Unidata, a community program of the University Corporation for Atmospheric Research (UCAR), is exploring what the future of data access may look like.

"One of the roles we play at Unidata is to see where the information technology world is going and monitor the new technologies that can advance science," said Unidata Director Mohan Ramamurthy. "In the last 10 years, we've watched the cloud computing environment mature. It's become robust and reliable enough that it now makes sense for the scientific community to begin to adopt it."

Inside an Amazon Web Services data center. (Photo courtesy Amazon.)

The data deluge

Since 1984, Unidata has been delivering geoscience data in near real time to researchers who want it. Today, Unidata also offers those scientists tools they can use to analyze and visualize the data. In 2008, Unidata's servers delivered 2.7 terabytes of data a day to 170 institutions. Just five years later, the program was providing 13 terabytes—or the equivalent of about 4.5 million digital photos—a day to 263 institutions. Today, Unidata is delivering about 33 terabytes of data a day. And the volume is only expected to grow. For example, NOAA's new weather satellite, GOES-R (Geostationary Operational Environmental Satellite R-Series), is scheduled to launch in October. When GOES-R is up and running, it alone will produce a whopping 3.5 terabytes of data a day.

"We've been pushing out data for 30-plus years here at Unidata," said Jeff Weber, who is heading up Unidata's collaboration with Amazon. "What we're finding now is that the volume of available data is just getting to be too large. We can't keep putting more and more data into the pipe and pushing it out—there are physical constraints."

The physical constraints are not just on Unidata's side. Many universities and other institutions that rely on Unidata do not have the local bandwidth to handle a huge increase in the incoming stream of data. To address the problem, Unidata decided a few years ago to begin transitioning its services to the cloud—a network of servers hosted on the Internet that allows researchers to access and process data from anywhere. The vision is to create a future where scientists could go to the cloud, access the data they need, and then use cloud-based tools to process and analyze that data. At the end of their projects, scientists would download only their finished products: a map or graph, perhaps, or the results from a statistical analysis.

"With cloud computing, you can bring all your science and the analytic tools you use to the data, rather than the old paradigm of bringing the data to your tools," Ramamurthy said.

'Navigating the waters'

These advantages were part of the motivation behind the U.S. Department of Commerce's announcement last spring that NOAA would collaborate with Amazon, Google, IBM, Microsoft, and the Open Commons Consortium with the goal of "unleashing its vast resources of environmental data" using cloud computing.

A NEXRAD data product available to researchers through Unidata. (Image courtesy Unidata.)

Amazon Web Services was one of the first out of the gate on the NOAA Big Data Project, uploading the full archive of NEXRAD data to the cloud last summer. But to figure out how to continue to feed the archive with near real time observations and to help make sense of the data — how people might want to use it and what kinds of tools they would need — Amazon turned to Unidata.

"It made a lot of sense for Unidata to partner with Amazon and vice versa," Ramamurthy said. "They wanted expertise in atmospheric science data. We wanted an opportunity to introduce cloud-based data services to our community and raise awareness about what it can do."

The scientific community is perhaps more hesitant to rely on the cloud than other user groups. Datasets are the lifeblood of many research projects, and knowing that the data are stored locally offers a sense of security for many scientists, Ramamurthy said. Losing access to some data could nullify years of work. But the truth is that the data are likely more secure in the cloud than on a local hard drive, Ramamurthy said. "Mirroring" by multiple cloud servers means that data are always backed up.

If the Amazon project, and the NOAA Big Data Project in general, are successful in winning scientists over, it could go a long way toward helping Unidata make its own transition to the cloud. Unidata will be studying and learning from the project, including how to make a business model that will work, with an eye toward its own future.

"We're navigating the waters to find out what works and what doesn't so we can report back to the National Science Foundation," Weber said. "We want to see how this paradigm shift might play out — if it makes sense, if it doesn't, or if it makes sense in a few ways but not others."

Writer/contact: Laura Snider, Senior Science Writer and Public Information Officer
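The NEXRAD archive that Amazon hosts lives in a public Amazon S3 bucket, so the access pattern described above can be illustrated in a few lines. The sketch below is a rough illustration of that pattern, not Unidata's own workflow: it lists Level II volume files for one radar on one day using anonymous access. The bucket name is the public noaa-nexrad-level2 archive; the key layout (year/month/day/station) and the station and date shown are assumptions for the example.

```python
# Minimal sketch: list NEXRAD Level II files in NOAA's public S3 archive.
# Assumes the public bucket "noaa-nexrad-level2" with keys laid out as
# YYYY/MM/DD/STATION/...; the station and date below are illustrative only.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

prefix = "2016/05/31/KFTG/"  # Front Range radar, chosen only as an example
resp = s3.list_objects_v2(Bucket="noaa-nexrad-level2", Prefix=prefix)

for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download one volume scan locally (or open it directly with a radar reader
# and keep only the derived product you need).
if resp.get("Contents"):
    key = resp["Contents"][0]["Key"]
    s3.download_file("noaa-nexrad-level2", key, key.split("/")[-1])
```

In the workflow Ramamurthy describes, the analysis itself would also run in the cloud next to the archive, so only a finished product, not the raw volumes, would travel back over the network.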

UCAR to support EarthCube: Cyberinfrastructure will advance science

BOULDER – EarthCube, a landmark initiative to develop new technological and computational capabilities for geosciences research, will be supported by the University Corporation for Atmospheric Research (UCAR) under a new agreement with the National Science Foundation (NSF).

Created by NSF in 2011, EarthCube aims to help researchers across the geosciences, from meteorology to seismology, better understand our planet in ways that can strengthen societal resilience to natural events. More than 2,500 EarthCube contributors – including scientists, educators, and information professionals – work together on the creation of a common cyberinfrastructure for researchers to collect, access, analyze, share, and visualize all forms of data and related resources.

"EarthCube offers the promise to advance geoscience research by creating and delivering critical new capabilities," said UCAR scientist Mohan Ramamurthy, principal investigator and project director of the new EarthCube office at UCAR.

"This is a great opportunity for UCAR to leverage its successful track record in managing large scientific projects that advance our understanding of the planet," said Michael Thompson, interim UCAR president. "The EarthCube project offers the potential to significantly benefit society by helping scientists use the power of diverse big datasets to better understand and predict the natural events, from severe storms to solar disturbances, that affect all of us."

EarthCube is designed to foster collaborations across the geosciences. The technology helps scientists in different disciplines better understand the far-reaching influences of natural events, such as how major storms like Sandy (above) affect coastal and inland flooding. This unique view of Sandy was generated with NCAR's VAPOR visualization software, based on detailed computer modeling. (©UCAR. Visualization by Alan Norton, NCAR, based on research by NCAR scientists Mel Shapiro and Thomas Galarneau. This image is freely available for media & nonprofit use.)

UCAR will administer the day-to-day operations of EarthCube under the three-year, $2.8 million agreement with NSF. The EarthCube science support office, currently funded through an NSF grant to the Arizona Geological Survey in Tucson, Arizona, will move to UCAR's Boulder offices starting this month.

EarthCube is designed to help researchers across the geosciences address the challenges of understanding and predicting the complexity of the Earth system, from the geology and topography to the water cycle, atmosphere, and space environment of the planet. This approach is critical for improved understanding of the environment and better safeguarding society. In order to better predict the potential effects of a landfalling hurricane on inland mudslides, for example, scientists from multiple disciplines, including meteorology, hydrology, geography, and geology, need a common platform to work together to collect observations, ingest them into advanced computer models of the Earth system, and analyze and interpret the resulting data.

"The EarthCube Science Support Office will help us find and share the data geoscientists collect and use to answer critical science questions about the Earth," said Eva Zanzerkia, program director in NSF's Division of Earth Sciences.

Ramamurthy said UCAR is well positioned to help EarthCube meet its goals, since UCAR provides technological support to the geosciences community, including its 109 member universities.
UCAR has been involved with EarthCube since NSF launched the initiative. "Currently researchers are spending an enormous amount of time on routine tasks because there is no data system, database, or data infrastructure where they can get all the information they need in some kind of a uniform way from a single interface," Ramamurthy said. "If EarthCube can facilitate the integration of data from multiple domains in a way that is easier and faster, and if there is interoperability in terms of standards for data to be input into a common environment, then integration becomes more easily possible."

UCAR is a nonprofit consortium of more than 100 member colleges and universities focused on research and training in the atmospheric and related Earth system sciences. UCAR's primary activity is managing the National Center for Atmospheric Research (NCAR) on behalf of NSF, NCAR's sponsor. UCAR also oversees a variety of education and scientific support activities under the umbrella of the UCAR Community Programs, which will administer EarthCube.

Wrangling observations into models

April 4, 2016 | If scientists could directly measure the properties of all the water throughout the world's oceans, they wouldn't need help from NCAR scientist Alicia Karspeck. But since large expanses of the oceans are beyond the reach of observing instruments, Karspeck's work is critical for those who want estimates of temperature, salinity, and other properties of water around the globe. Scientists need these estimates to better understand the world's climate system and how it is changing.

"It's painstaking work, but my hope is it will lead to major advances in climate modeling and long-term prediction," Karspeck said.

She is one of a dozen or so researchers at NCAR who spend their days on data assimilation, a field that is becoming increasingly important for the geosciences and other areas of research. Broadly speaking, data assimilation is any method of enabling computer models to utilize relevant observations. Part science and part art, it involves figuring out how to get available measurements—which may be sparse, tightly clustered, or irregularly scattered—into models that tend to simplify the world by breaking it into gridded boxes. Commonly used in weather forecasting, the technique can improve simulations and help scientists predict future events with more confidence. It can also identify deficiencies in both models and observations.

As models have become more powerful and observations more numerous, the technique has become so critical that NCAR last year launched a Data Assimilation Program to better leverage expertise across its seven labs.

"Activities in data assimilation have grown well beyond traditional applications in numerical weather prediction for the atmosphere and now span across NCAR's laboratories," said NCAR Director Jim Hurrell. "The Data Assimilation Program is designed to enhance data assimilation research at NCAR, while at the same time serving the broader U.S. research community."

Scientists are using data assimilation techniques to input a range of North American observations into experimental, high-resolution U.S. forecasts. These real-time ensemble forecasts are publicly available while they're being tested. (©UCAR. This image is freely available for media & nonprofit use.)

Improving prediction

Created by the NCAR Directorate, the Data Assimilation Program is designed to advance prediction of events ranging from severe weather and floods to air pollution outbreaks and peaks in the solar cycle. One of its goals is to encourage collaborations among data assimilation experts at NCAR and the larger research community. For example, scientists in several labs are joining forces to apply data assimilation methods to satellite measurements to create a database of global winds and other atmospheric properties. This database will then be used for a broad range of climate and weather studies. The program also provides funding to hire postdocs at NCAR to focus on data assimilation projects, as well as for a software engineer to support such activities.

"By bringing money to the table, we're building up data assimilation capability across NCAR," said NCAR Senior Scientist Chris Snyder, who coordinates the Data Assimilation Program. "This is critical because data assimilation provides a framework to scientists throughout the atmospheric and related sciences who need to assess where the uncertainties are and how a given observation can help."

NCAR Senior Scientist Jeff Anderson, who oversees the Data Assimilation Research Testbed (DART), says that data assimilation has become central for the geosciences. DART is a software environment that helps researchers develop data assimilation methods and use observations with various computer models.

"I think the Data Assimilation Program is a huge win for NCAR and the entire atmospheric sciences community," Anderson said. "The scientific method is about taking observations of the world and making sense of them, and data assimilation is fundamental for applying the scientific method to the geosciences as well as to other research areas."

From oceans to Sun

Here are examples of how data assimilation is advancing our understanding of atmospheric and related processes from ocean depths to the Sun's interior:

Oceans. Karspeck is using data assimilation to estimate water properties and currents throughout the world's oceans. This is a computationally demanding task that requires feeding observations into the NCAR-based Community Earth System Model, simulating several days of ocean conditions on the Yellowstone supercomputer, and using those results to update the conditions in the model and run another simulation. The good news: the resulting simulations match well with historical records, indicating that the data assimilation approach is working. "My goal is to turn this into a viable system for researchers," Karspeck said.

Air quality. Atmospheric chemists at NCAR are using data assimilation of satellite observations to improve air quality models that currently draw on limited surface observations of pollutants. For example, assimilating satellite observations would show the effect of emissions from a wildfire in Montana on downwind air quality, such as in Chicago. "We've done a lot of work to speed up the processing time and the results are promising," said NCAR scientist Helen Worden. "The model simulations after assimilating satellite carbon monoxide data are much closer to actual air quality conditions."

Weather forecasting. Data assimilation is helping scientists diagnose problems with weather models. For example, why do models consistently overpredict or underpredict temperatures near the surface? Using data assimilation, NCAR scientist Josh Hacker discovered that models incorrectly simulate the transfer of heat from the ground into the atmosphere. "With data assimilation, you're repeatedly confronting the model with observations so you can very quickly see how things go wrong," he said.

Solar cycle. Scientists believe the 11-year solar cycle is driven by mysterious processes deep below the Sun's surface, such as the movements of cells of plasma between the Sun's lower latitudes and poles. To understand the causes of the cycle and ultimately predict it, they are turning to data assimilation to augment observations of magnetic fields and plasma flow at the Sun's surface and feed the resulting information into a computer model of subsurface processes. "We are matching surface conditions to the model, such as the pattern and speed of the plasma flows and evolving magnetic fields," said NCAR scientist Mausumi Dikpati.

Capturing data. In addition to helping scientists improve models, the new Data Assimilation Program is also fostering discussions about observations. NCAR Senior Scientist Wen-Chau Lee and colleagues who are experts in gathering observations are conferring with computer modelers over how to process the data for the models to readily ingest. One challenge, for example, is that radars may take observations every 150 meters whereas the models often have a resolution of 1-3 kilometers. Inputting the radar observations into the models requires advanced quality control techniques, including coordinate transformation (modifying coordinates from observations to the models) and data thinning (reducing the density of observations while retaining the basic information).

"We are modifying our quality control procedures to make sure that the flow of data is smooth," Lee said. "With data assimilation, the first word is 'data'," he added. "Without data, without observations, there is no assimilation."

Writer/contact: David Hosansky, Manager of Media Relations
Funders: NCAR Directorate, National Science Foundation, and additional funding agencies for specific projects
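To make the data-thinning step concrete, here is a minimal sketch, not NCAR's operational code, of averaging dense radar range gates into "superobservations" on a spacing comparable to a model grid. The reflectivity values are synthetic, and the spacings are taken from the numbers quoted above.

```python
# Minimal sketch of radar data thinning ("superobbing"): average dense
# 150 m range gates into bins matching a coarser model grid spacing.
# Illustrative only; real systems also handle quality control, beam
# geometry, and coordinate transformation from radar to model space.
import numpy as np

gate_spacing_m = 150.0      # radar range-gate spacing
model_spacing_m = 3000.0    # assumed model grid spacing
gates_per_bin = int(model_spacing_m / gate_spacing_m)  # 20 gates per bin

# Synthetic reflectivity profile along one radar beam (dBZ), for illustration.
rng = np.random.default_rng(0)
reflectivity = 20.0 + 5.0 * rng.standard_normal(400)

# Trim to a whole number of bins, then average each bin of 20 gates.
n_bins = reflectivity.size // gates_per_bin
thinned = (reflectivity[: n_bins * gates_per_bin]
           .reshape(n_bins, gates_per_bin)
           .mean(axis=1))

print(f"{reflectivity.size} gates reduced to {thinned.size} superobservations")
```

The thinned values, once transformed into the model's coordinate system, are what an assimilation system would actually ingest.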

Earth Science Week 2015: NCAR visualizes Earth, air, fire & water

October 12, 2015 | We're excited it's Earth Science Week, and even more excited about this year's theme—visualizing Earth systems—because it happens to be one of the things NCAR does best. NCAR visualizations cover the spectrum, from Earth to air to fire to water.

Clockwise from top left: EARTH (ground movement for an earthquake in California), AIR (wind trajectories during a marine cyclone), FIRE (behavior of a Colorado wildfire), and WATER (sea surface temperature anomalies during El Niño and La Niña).

Scientists across NCAR and at collaborating universities create visualizations to help make sense of their research, often with the help of the Computational and Information Systems Lab (CISL). CISL houses the VisLab (the Scientific Visualization Services Group), VAPOR (the Visualization and Analysis Platform for Ocean, Atmosphere and Solar Researchers group), and NCL (the NCAR Command Language group). These teams of software engineers and other professionals are resources for scientists who want to make their research come alive.

Earth Science Week was launched by the American Geosciences Institute in 1998. #EarthSciWeek 2015 runs from Oct. 11 through Oct. 18.

Writer/contact: Laura Snider, Senior Science Writer and Public Information Officer

Watch 2015 and 1997 El Niños build, side by side

September 3, 2015 | The El Niño brewing in the tropical Pacific is on track to become one of the strongest such events in recorded history and may even warm its way past the historic 1997-98 El Niño. While it's too early to say if the current El Niño will live up to the hype, this new NCAR visualization comparing sea surface temperatures in the tropical Pacific in 1997 to those in 2015 gives a revealing glimpse into the similarities, and differences, between the two events.

Sea surface temperatures are key to gauging the strength of an El Niño, which is marked by warmer-than-average waters. Even if this year's El Niño goes on to take the title for strongest recorded event, there's no guarantee that the impacts on weather around the world will be the same as they were in 1997-98. Like snowflakes, each El Niño is unique. Still, experts are pondering whether a strong El Niño might ease California's unrelenting drought, cause heatwaves in Australia, cut coffee production in Uganda, and impact the food supply for Peruvian vicuñas.

This video animation was created by Matt Rehme at NCAR's Visualization Lab, part of the Computational & Information Systems Lab. It uses the latest data from the National Oceanic and Atmospheric Administration. Rehme had previously created a similar visualization of the 1997-98 El Niño. When comparisons between this year's El Niño and that event began flying around, he decided to make a second animation and compare the two.

"I was a little shocked just how closely 2015 resembles 1997 visually," Rehme said.

More on El Niño

El Niño, La Niña & ENSO FAQ
Here comes El Niño—but what exactly is it?
El Niño or La Nada? The great forecast challenge of 2014
¡Hola, La Nada! What happens when El Niño and La Niña take a break?

Writer/contact: Laura Snider
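Warmer-than-average tropical Pacific waters are usually summarized with an index such as Niño 3.4, the average sea surface temperature anomaly over 5°S-5°N and 170°W-120°W. The sketch below shows how such an index can be computed from a gridded SST file; the file name, variable name, and coordinate layout are hypothetical stand-ins, not a specific NOAA product.

```python
# Sketch: compute a Nino 3.4-style SST anomaly index from gridded data.
# Assumes a hypothetical file "sst.nc" with a monthly "sst" variable on
# ascending "lat" and 0-360 "lon" coordinates; adjust names as needed.
import xarray as xr

ds = xr.open_dataset("sst.nc")

# Average SST over the Nino 3.4 box: 5S-5N, 170W-120W (190E-240E).
nino34 = (ds["sst"]
          .sel(lat=slice(-5, 5), lon=slice(190, 240))
          .mean(dim=["lat", "lon"]))

# Anomalies relative to each calendar month's long-term mean (climatology).
climatology = nino34.groupby("time.month").mean("time")
anomaly = nino34.groupby("time.month") - climatology

# Print the most recent year of the index.
print(anomaly.isel(time=slice(-12, None)).values)
```

By the commonly used NOAA definition, sustained Niño 3.4 anomalies above about +0.5 °C indicate El Niño conditions; the strongest events, such as 1997-98, peak well above +2 °C.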

NCAR "STEPs" up rain, flood research

August 26, 2015 | While many people take advantage of the sunshine this time of year, NCAR scientist Rita Roberts seeks out storms. Roberts is leading an experiment this summer along the Front Range to improve short-term forecasts of heavy rainfall and flash floods, particularly over complex terrain. The tests, which also took place last summer, are part of NCAR's Short Term Explicit Prediction (STEP) program. The project is pioneering in that it runs several meteorological and hydrological models at the same time, combined with advanced data analysis. Funding comes from the National Science Foundation.

Radar images of precipitation (top), with computer model outputs (below) of rainfall accumulation and short-term rainfall predictions for specific areas. STEP uses meteorological and hydrological models, combined with advanced data analysis, to improve short-term forecasts of rain and flash floods. (©UCAR. This image is freely available for media & nonprofit use.)

"The system captures in real time where storms are forming and where they are dissipating," said Roberts, who has analyzed high-impact weather events for NCAR since 1982. "It's about improving predictions of heavy rainfall and flash flooding."

STEP's rainfall forecast will be tested during the monsoon season next spring in Taiwan, which has a mix of plains and rugged mountains similar to the Front Range. Individual components of STEP are being tested elsewhere, while the complete system is drawing interest from weather forecast offices in other countries.

The forecast models currently used by weather forecasters don't always provide accurate rainfall rates, and the models have difficulty pinpointing the exact location where the heavy rainfall will occur. Studies show atmospheric conditions can change rapidly, resulting in large shifts of weather. STEP's goal is to provide accurate rainfall and streamflow forecasts up to a day out, with particular emphasis on nowcasting exactly where the heavy rainfall will be in the next few hours using information that is updated continuously. Such short-term forecasts are critical to providing warnings to communities so they can reduce fatalities, injuries, and economic damage from rainstorms, floods, and other extreme weather events. Roberts said she believes STEP also could be used to provide motorists with real-time alerts about areas to avoid because of rain and potential flooding.

Said Jenny Sun, chair of the STEP program: "We've talked with weather forecasters who tell us their biggest challenge is to forecast heavy rainfall—and the biggest impact to the community is flooding."

The STEP test along the Front Range combines:

Data from 17 radar stations and other observational equipment
High-resolution rainfall forecasts from the NCAR-based Weather Research and Forecasting (WRF) system
Auto-nowcaster, a software program that projects the evolution of storms and rainfall over the next 10 minutes to 1 hour
WRF-Hydro, which generates stream flow predictions in a 0- to 12-hour timeframe

Citizen participation a key

STEP produces digital maps with a resolution of one square kilometer that show how much rain has fallen in the past two hours and how much is expected in the next hour. More complex maps show how atmospheric conditions and stream levels are changing. Rainfall forecasts are evaluated in part through an extensive network of rain gauges run by the Community Collaborative Rain Hail and Snow Network (CoCoRaHS), a nonprofit network of citizen volunteers.
Leaders of various aspects of STEP include NCAR scientists Barbara Brown, Dave Gochis, and Jim Wilson.

NCAR scientists Jenny Sun and Rita Roberts examine radar images of heavy rainfall along the Front Range. (©UCAR. Photo by Carlye Calvin. This image is freely available for media & nonprofit use.)

The auto-nowcaster component has been tested in Texas; Florida; and Washington, D.C. WRF-Hydro, a hydrological modeling extension package that can operate independently or coupled with the WRF atmospheric model, is being integrated into the National Weather Service's new National Water Center and is expected to start running in real time next May. Sun said NCAR scientists also are collaborating with a Japanese electric power research institute, the Beijing Meteorological Bureau, and Panasonic's weather solutions unit, which has offices in Colorado and North Carolina.

The project wouldn't be possible without advances in the ability to observe how three-dimensional phenomena in the atmosphere evolve over time. NCAR's powerful Yellowstone supercomputer in Wyoming crunches the data. Since scientists can't measure all atmospheric conditions at any given moment, the STEP program takes uncertainties into account by using NCAR's ensemble modeling approach led by Morris Weisman and Glen Romine. This way a range of equally likely conditions can be simulated.

Roberts said the STEP team is planning to conduct another real-time test along the Front Range next summer. "Our goal," she said, "is to keep improving the capability and accuracy of the system."

Writer/contact: Jeff Smith
Funder: National Science Foundation
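Evaluating gridded rainfall forecasts against point gauges such as the CoCoRaHS reports generally comes down to sampling the forecast grid at each gauge location and computing simple error statistics. The sketch below illustrates that idea with synthetic arrays; it is not STEP's verification code, and the grid size, gauge count, and threshold are arbitrary choices for the example.

```python
# Sketch: verify gridded rainfall forecasts against point rain-gauge reports.
# All arrays are synthetic stand-ins; a real evaluation would read the 1-km
# forecast grid and the gauge observations from their native formats.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-km forecast grid of 1-hour rainfall (mm) over a 100x100 km box.
forecast = rng.gamma(shape=0.5, scale=4.0, size=(100, 100))

# Hypothetical gauge reports: (row, col) grid indices and observed rainfall (mm).
gauge_ij = rng.integers(0, 100, size=(50, 2))
observed = rng.gamma(shape=0.5, scale=4.0, size=50)

# Sample the forecast at the nearest grid point to each gauge.
predicted = forecast[gauge_ij[:, 0], gauge_ij[:, 1]]

bias = np.mean(predicted - observed)          # mean error
mae = np.mean(np.abs(predicted - observed))   # mean absolute error
agree = np.mean((predicted > 1.0) == (observed > 1.0))  # 1 mm/h threshold

print(f"bias={bias:.2f} mm  MAE={mae:.2f} mm  threshold agreement={agree:.2f}")
```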

Boulder team wins international water prize

BOULDER — Groundbreaking work by a group of Boulder scientists has been recognized this month with one of the world's most prestigious awards for innovations related to water resources. The research team, from the University Corporation for Atmospheric Research (UCAR), the University of Colorado Boulder (CU Boulder), and the National Oceanic and Atmospheric Administration (NOAA), has worked for the past five years to develop a way to use GPS technology to measure soil moisture, snow depth, and vegetation water content. The work has won a 2014 Creativity Prize from the Prince Sultan Bin Abdulaziz International Prize for Water.

"It's an honor to be recognized by the broader international science community," said UCAR scientist John Braun, a GPS expert and member of the research team. "This work can significantly improve how we measure changes in a number of key components of the water cycle."

Braun and his colleagues—Kristine Larson and Eric Small at CU Boulder and Valery Zavorotny at NOAA's Earth System Research Laboratory—won the prize for developing a new observational technique that takes advantage of data from high-precision GPS stations. Although GPS instruments at these stations were installed for other purposes (by geoscientists to measure plate tectonic motions and by surveyors to measure land boundaries), the Boulder research group was able to isolate GPS signals that reflected near the instruments' antennas to produce daily measurements of soil moisture, vegetation water content, and snow depth. The group named the technique GPS Interferometric Reflectometry (GPS-IR). Because there are currently over 10,000 such GPS stations operating around the world, the extension of this method to even a subset of these sites would significantly enhance the ability to measure the water cycle.

Recipients of this year's Creativity Prize from the Prince Sultan Bin Abdulaziz International Prize for Water include (left to right) Valery Zavorotny (NOAA), Kristine Larson and Eric Small (University of Colorado Boulder), and John Braun (UCAR). (©UCAR. Photo by Bob Henson. This image is freely available for media & nonprofit use.)

Currently, the team uses the GPS-IR technique to analyze data streams from existing GPS networks within the western United States. Scientists and government agencies can use their data products, available at the research team's web portal, to improve monitoring and forecasting of hydrologic variables.

"The GPS-based estimates represent a larger sampling area than traditional point measurements gathered in the field," said Small, a professor in CU Boulder's Department of Geological Sciences. "This provides information that is particularly useful for applications such as tracking the amount of water stored in mountain snow pack."

The research has been funded by the National Science Foundation and NASA.

Turning errors into data

GPS-IR is based on reflected signals, which are a source of errors that have plagued the primary users of GPS technology since its inception.

Some of the initial research involving GPS and snow-depth measurement took place at the Niwot Ridge field site, located in the foothills above Boulder, starting in 2009. (Image courtesy Ethan Gutmann, NCAR.)

"I spent almost five years of my career trying to make reflected signals go away so that I could produce better estimates of tectonic and volcanic deformation," said Larson, a professor in CU Boulder's Department of Aerospace Engineering Sciences and leader of the research team. "One of the great things about Boulder is that once we had the idea to turn this error source into something useful, we were able to put together a great interdisciplinary research team from CU, NOAA, and UCAR to work on it."

Larson will accept the award at a ceremony in Riyadh, Saudi Arabia, on December 1. The Prince Sultan Bin Abdulaziz International Prize for Water aims to give recognition to the efforts that scientists, inventors, and research organizations around the world are making in water-related fields. The prizes acknowledge exceptional and innovative work that contributes to the sustainable availability of potable water and the alleviation of the escalating global problem of water scarcity. The 2014 Creativity Prize, worth $266,000, was split between the Boulder-based GPS-IR group and scientists at Princeton University studying drought.

"We're grateful that the ingenuity of these scientists is being recognized," said UCAR president Thomas Bogdan. "This project is a great example of a creative team turning information that would otherwise be discarded into useful data that can benefit society."

Several Colorado researchers have been recognized with the International Prize for Water since its inception in 2004. Previous winners include:

Kevin Trenberth and Aiguo Dai, Surface Water Prize, 2012 (National Center for Atmospheric Research, Boulder, Colo.)
Chih Ted Yang, Surface Water Prize, 2008 (Colorado State University, Ft. Collins, Colo.)
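The physics behind GPS-IR can be sketched compactly: the reflected signal interferes with the direct signal, so the receiver's recorded signal-to-noise ratio (SNR) oscillates as a satellite rises or sets, and the oscillation frequency with respect to the sine of the elevation angle is proportional to the antenna height above the reflecting surface (roughly h = f·λ/2 for wavelength λ). The toy example below uses synthetic SNR data and assumed numbers rather than real receiver output; it recovers that height with a Lomb-Scargle periodogram, and a drop in the retrieved height relative to snow-free conditions would correspond to snow depth.

```python
# Toy sketch of the GPS-IR idea: oscillations in a GPS signal-to-noise ratio
# (SNR) trace, viewed against sin(elevation), have a frequency set by the
# antenna height above the reflecting surface (snow, soil). Synthetic data.
import numpy as np
from scipy.signal import lombscargle

wavelength = 0.244   # approximate GPS L2 carrier wavelength (m)
true_height = 1.7    # assumed antenna height above the surface (m)

# Synthetic SNR interference pattern for a rising satellite (5-25 degrees).
rng = np.random.default_rng(0)
elev = np.radians(np.linspace(5, 25, 400))
x = np.sin(elev)
snr = np.cos(4 * np.pi * true_height / wavelength * x)
snr += 0.1 * rng.standard_normal(x.size)

# Scan candidate reflector heights and find the dominant oscillation.
heights = np.linspace(0.5, 3.0, 500)            # meters
ang_freqs = 4 * np.pi * heights / wavelength    # angular frequency vs sin(elev)
power = lombscargle(x, snr - snr.mean(), ang_freqs)

estimated_height = heights[np.argmax(power)]
print(f"estimated reflector height: {estimated_height:.2f} m")
```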

Making research data more traceable

September 3, 2014 | As observing instruments and computer modeling become increasingly refined, the amount of data generated by field studies has grown tremendously. Storing and archiving the data is a challenge in itself, but scientists also need the data to be easily accessible and connected to other relevant resources.

Researchers use terrestrial laser scanning technology to analyze a dinosaur track site in Denali National Park and Preserve. Geophysical data from ground-based imaging coordinated by UNAVCO is being incorporated in a two-year EarthCube initiative. (Image courtesy UNAVCO and the Perot Museum of Nature and Science.)

To help address this issue, UCAR is launching a project with two partners—Cornell University and UNAVCO—that aims to connect the dots among field experiments, research teams, datasets, research instruments, and published findings. The two-year project, titled "Enabling Scientific Collaboration and Discovery through Semantic Connections," is funded by the National Science Foundation's EarthCube initiative, which supports transformative approaches to data management across the geosciences.

The project will demonstrate the benefits of a linked open data tool, known as VIVO, for managing scientific information and data. Developed by Cornell University Library in collaboration with a number of partners, VIVO is being used by over 100 organizations to create authoritative research profiles for faculty and staff as well as to link to their published studies and other relevant research. Other organizations, such as the Laboratory for Atmospheric and Space Physics at the University of Colorado Boulder, are extending VIVO to manage information related to scientific projects and research instruments.

Cold air pouring over the Bering Sea from the south coast of Alaska on April 7, 2013, formed these cloud streets, associated with parallel cylinders of spinning air. The Bering Sea Project is studying the potential effects of climate change on marine ecosystems across the eastern part of the sea. Observations from the project are informing a two-year study of data management. (Image courtesy NASA Earth Observatory.)

The project aims to adapt VIVO so it can be applied to large-scale field experiments involving many investigators from a wide range of institutions. This would create a network of information linking field experiments with particular datasets, authors, publications, and even research tools that result from or are associated with each experiment.

"Someone coming from the outside would be able to find a particular paper that emerged from a field experiment and very quickly track down datasets, instruments, researchers, and so on," said Matthew Mayernik, an expert on research data services in the NCAR/UCAR Library who is the principal investigator on the project. "This is really about increasing the traceability of research and making it easier for people to find, assess, and use data."

To demonstrate the effectiveness of the approach, Mayernik and his colleagues will use VIVO for data from two sources: a recent NSF-supported interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (the Bering Sea Project), and a set of diverse research projects informed by geodetic tools, such as GPS networks and ground-based imaging, that are operated and maintained by UNAVCO.
If successful, Mayernik said such an approach would be expanded to other field experiments, including their data sets, researchers, publications, and research resources.

Writer: David Hosansky, NCAR & UCAR Communications
Collaborating institutions: Cornell University; National Center for Atmospheric Research/University Corporation for Atmospheric Research; UNAVCO
Funder: National Science Foundation (EarthCube initiative)
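VIVO represents this kind of information as linked open data: statements (RDF triples) connecting projects, datasets, instruments, people, and publications, so the links can be traversed from any starting point. The sketch below illustrates the idea with the rdflib library; the namespace, class names, and identifiers are simplified placeholders rather than VIVO's actual ontology.

```python
# Sketch of the linked-data idea behind VIVO: describe a field project, a
# dataset, an instrument, and a paper as RDF triples so any one of them can
# be used to find the others. Terms and identifiers are illustrative only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/research/")
g = Graph()
g.bind("ex", EX)

project = EX["BeringSeaProject"]
dataset = EX["mooring-temperature-timeseries"]
instrument = EX["oceanographic-mooring-7"]
paper = EX["paper-placeholder"]

g.add((project, RDF.type, EX.FieldProject))
g.add((dataset, RDF.type, EX.Dataset))
g.add((dataset, EX.generatedBy, project))
g.add((dataset, EX.collectedWith, instrument))
g.add((paper, EX.usesDataset, dataset))
g.add((dataset, RDFS.label, Literal("Mooring temperature time series (illustrative)")))

# Traverse the links: which datasets came out of the project, and which
# papers used them?
for ds in g.subjects(EX.generatedBy, project):
    for pub in g.subjects(EX.usesDataset, ds):
        print(ds, "->", pub)

print(g.serialize(format="turtle"))
```

Because every statement is just a subject-predicate-object triple, the same graph can answer "what data came out of this experiment?" and "which papers used this dataset?" without a special-purpose database schema.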

Geoscience data services to expand with NSF sponsorship

BOULDER—A program that provides unique data support to geoscientists worldwide will expand its services over the next five years, under a renewal of its grant with the National Science Foundation (NSF). Unidata, managed by the University Corporation for Atmospheric Research (UCAR), provides atmospheric science data to university departments in near real time. Its services encompass a wide range of cyberinfrastructure technologies that make geoscience data more useful and accessible for scientists and educators at more than 3,000 educational, government, and research institutions worldwide, including 700 U.S. universities.

This 3-D depiction of the flow in and around 2008's Hurricane Gustav was created using Unidata's Integrated Data Viewer. (Visualization courtesy Unidata.)

Under the new award with NSF of up to $25 million, Unidata will tap emerging technologies to better serve the geoscience community. This includes using cloud computing in ways that will enable researchers worldwide to access data and collaborate more effectively with colleagues at distant organizations and across scientific disciplines in order to tackle major scientific challenges.

"We're working to leverage the advantages of and advances in cloud-based computing paradigms that have emerged and become robust in recent years," said Unidata director Mohan Ramamurthy. "The goal is to help advance scientific understanding of the physical world by better enabling scientists to extract knowledge from a deluge of observations and other data."

By gathering information into a cloud environment, the Unidata approach will also reduce the amount of data that must be transferred over computer networks and ease the computing requirements at universities and other research organizations. Unidata focuses on enabling scientists to better access, analyze, and integrate large amounts of data. It has also developed sophisticated tools to visualize information.

Although Unidata's core activities focus on serving scientists and educators in the atmospheric and related sciences, virtually every project that Unidata undertakes has a broader impact on the geosciences community and society at large. Unidata-developed cyberinfrastructure is in wide use among U.S. federal agencies, private industry, and non-governmental and international organizations, including the National Oceanic and Atmospheric Administration, the Department of Energy, Department of Defense, and NASA. More than 100,000 university students across the country are expected to use Unidata's products and services, and hundreds of scholarly articles reference Unidata annually.

Professors and other Unidata users said its services are critical for geoscience education and research. "Unidata provides the superhighway needed to connect my students to critical weather observations used for education and teaching in the atmospheric and related sciences," said Jim Steenburgh, professor of atmospheric sciences at the University of Utah.

At Millersville University, scientists and education experts in the Earth Sciences and Computer Science departments used a Unidata analysis and visualization tool to create a 3-D virtual immersion experience known as GEOpod. This allows the user to navigate a virtual probe within a computer simulation of the atmosphere, capturing temperature, humidity, and other parameters while using navigational aids and tracking capabilities.
"With the help of Unidata, we can essentially bring students into a numerical weather model, helping them better understand the actual atmosphere as well as the modeling process,” said Richard Clark, chairman of the Earth Sciences Department at Millersville University. Unidata is a community data and software facility for the atmospheric and related sciences, established in 1984 by U.S. universities with sponsorship from NSF.

Extending the life of field projects

November 5, 2013 | After the excitement and exhaustion of a months-long field project, the last thing any scientist or funder wants is for the resulting data to be lost or locked away forever. NCAR has a handy antidote for that concern. Not only does the center's Earth Observing Laboratory (EOL) provide observing platforms, experienced technicians, logistics, and data capture, but it also assures the field work will pay off for years to come by maintaining one of the world's largest archives of data from international field campaigns involving atmospheric and multidisciplinary science. The archives include plenty of online tools for access and research, maintained by a deeply experienced staff.

The vast data holdings of NCAR's Earth Observing Laboratory are managed by a team that includes Bob Rilling and Steve Williams. (©UCAR. Photo by Carlye Calvin. This image is freely available for media & nonprofit use.)

The EOL data trove currently includes more than 400 field projects and nearly 6,000 data sets, which comprise some 17 million files and more than 100 terabytes of data, all available at no charge to the research community. The oldest project is the Line Island Experiment (1967), which studied equatorial circulations in the central Pacific Ocean. Among the latest additions are data from two major projects this past spring, the Mesoscale Predictability Experiment (MPEX) and the Southeast Atmosphere Study (SAS).

"When most people have moved on to the next field project, we're just getting our hands dirty," says Steve Williams, who heads the Data Management Group within EOL's Computing, Data, and Software Facility. This group of 11, which includes scientists, software engineers, and students, is a full-service unit, shepherding data from the point of collection to long-term stewardship. The group's archives can be accessed through the EOL Data Archives, which lists all projects since 1967, or via the NCAR-based Community Data Portal. Some archives are produced and maintained in collaboration with other entities, such as the Research Data Archive at NCAR's Computational & Information Systems Laboratory.

Mining past success

NCAR's S-Pol radar is silhouetted against brightly lit tropical cumulus clouds. This photo was taken during S-Pol's deployment in the Maldives for the DYNAMO (Dynamics of the Madden-Julian Oscillation) field campaign of 2011–12. The project was designed to help improve long-range weather forecasts and seasonal outlooks and to help scientists further refine computer models of global climate. DYNAMO is among more than 400 field projects with data archived at NCAR. (©UCAR. Photo by Michael Dixon, NCAR. This image is freely available for media & nonprofit use.)

Even if a field project took place decades ago, its data can be invaluable to a current area of study. For example, the EOL archives include extensive collections on both the original Verification of the Origins of Rotation in Tornadoes Experiment (VORTEX), conducted in 1994–95, and its follow-up, VORTEX2, carried out in 2009–10. NCAR has noticed an increase in such data requests for legacy projects such as these, which Williams believes may be related to growing interest in climate change research. Another example involves the Dynamics of the Madden-Julian Oscillation campaign (DYNAMO), which studied processes in and near the Indian Ocean in late 2011 and early 2012.
At a workshop this spring, DYNAMO investigators explored how the project built on findings from the mammoth international TOGA COARE study (1992–93) and the even larger GATE project in 1974. The EOL archives and other data sources are helping to keep the earlier projects valuable, according to scientists.

The NCAR GATE Group, represented in this file photo by Edward Zipser, William Lanterman, and Henry van de Boogaard, spent years planning the massive 1974 field experiment. Data analysis took more years and more people, including Margaret LeMone, Rebecca Meitin, Al Miller, William Pennell, Katsuyuki "Vic" Ooyama, and Herbert Riehl. (©UCAR. This image is freely available for media & nonprofit use.)

"None of these projects are dead—they're still a live concern," says Robert Houze (University of Washington). Houze was a PI in DYNAMO and TOGA COARE as well as the 72-nation GATE project, which he terms "a tremendous, unprecedented effort." EOL's page on GATE brings together several resources and distributed archives, including a CISL website with a few key datasets and a GATE page from the NCAR Archives that includes project newsletters, photographs, and correspondence. Unfortunately, much of GATE's digital data was not preserved, underscoring the importance of EOL's efforts to archive datasets from current field campaigns.

Large, complex field projects can produce eye-popping volumes of data. The complex DYNAMO campaign resulted in at least 400 disparate datasets, including many from several ships and aircraft, with five terabytes of data—almost 5% of EOL's entire digital archive. Another recent project includes measurements taken during a two-year series of five one-month missions spread through the annual cycle. HIPPO—the HIAPER Pole-to-Pole Observations study of greenhouse gases and aerosols—included the first measurements at fine vertical resolution of over 90 atmospheric species, collected at latitudes extending nearly from pole to pole over the Pacific Ocean. The HIPPO data are spread between two portals, one maintained by the U.S. Department of Energy's Carbon Dioxide Information Analysis Center (CDIAC) and the other by NCAR/EOL.

"For many years our team had dreamed of a dataset of this kind," says HIPPO PI Steven Wofsy (Harvard University). "We are delighted that the measurements have now been made and are available to the whole scientific community."

What to do with a "bag of bytes"

One thing that makes data management easier today than in years past is that most new pieces of data from even the most sprawling project are already in digital form when they reach the EOL team. Before the 1990s, investigators stored their data using an eclectic variety of formats and media, from fax printouts and hard-copy satellite photos to nine-track magnetic tapes. For historical campaigns, tracking down these pieces and putting them together is a challenge in itself.

As a product of the pre-Internet age, the 1974 GARP Atlantic Tropical Experiment (GATE) lives on largely in the form of printouts, binders, photos, and magnetic tape. For more about efforts to preserve data from atmospheric soundings collected during field projects, see the article "Legacy Atmospheric Sounding Data Project" (PDF), which appeared in the January 2012 issue of the Bulletin of the American Meteorological Society. (Photo by Paul Ciesielski, Colorado State University.)

"We found one tape in a garage in Edmonton, Alberta," says Williams.
It held data from the Severe Environmental Storm and Mesoscale Experiment (SESAME) project, a landmark study of severe storms across the Great Plains in 1979. "The tape had been stored there for many years in less-than-optimal conditions, with large temperature variations. Amazingly, we were able to get data from it."

Once the legacy media are in hand, it's not always a simple matter to determine what format the datasets are in and figure out how to extract them. "We may end up with files and not have the software to read them," says Williams. That's where accessible metadata—information about how the data are structured and stored—can make the difference between life and death for a particular thread of observations within a field project. To revive archaic formats when needed, the EOL group draws on an array of older tape drives, some of them decades old. The group also works often with NCAR's Computational & Information Systems Laboratory, whose decades of experience in archiving data generated by supercomputer experiments includes support in retrieving data from older storage media and formats.

Planning for archiving before there's any data

It's never too soon for a researcher to approach NCAR before the start of a field project and start pondering how their hard-won data will be preserved. Most field projects now allow users and others to track progress and upload some forms of data in real time. "We can help scientists think about how to structure their data policy and their website," says Williams. Often his group's work starts about a year before a field campaign kicks off. In the case of projects funded by the National Science Foundation, for example, the group can help researchers coordinate data meetings, set up mailing lists, and meet the NSF's requirement for developing a data management strategy and plan.

Digital data have been stored in a wide variety of formats over the years. (©UCAR. Photo by Carlye Calvin. This image is freely available for media & nonprofit use.)

Among the keys to success in data archiving: making sure that all of the investigators in a project get their collected data promptly to a lead investigator or another designee from the project. "Strong leadership makes a difference," says Williams.

Although data stewardship might seem like a thankless job in the moment, no one knows how such data will be used in the future. Scientific discovery can arise from data that were originally collected for an entirely different purpose. Williams says it's imperative to diligently archive the data, metadata, detailed documentation, and any related software that will be required, while it's all fresh in a data provider's mind. "In 20 years or so, when the PI or other project staff may no longer be around, we need the capability for the next generation of scientists to access these datasets and have enough information to intelligently understand and make use of them. That's what data stewardship is all about."
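One way those details stay fresh is to write them directly into a self-describing format such as netCDF, developed at Unidata, so the file itself carries its units, provenance, and contact information. The sketch below is a minimal illustration using the netCDF4 Python library; the variable names and attribute values are generic examples, not a specific EOL convention.

```python
# Sketch: write a small self-describing netCDF file whose metadata travels
# with the data. Variable names and attribute values are illustrative.
import numpy as np
from netCDF4 import Dataset

with Dataset("sounding_example.nc", "w") as nc:
    # Global (file-level) metadata: project, instrument, contact, conventions.
    nc.title = "Example sounding temperature profile"
    nc.project = "Hypothetical field campaign"
    nc.instrument = "Dropsonde (illustrative)"
    nc.creator_name = "Field project data manager"
    nc.Conventions = "CF-1.8"

    nc.createDimension("level", 10)

    pres = nc.createVariable("pressure", "f4", ("level",))
    pres.units = "hPa"
    pres.long_name = "air pressure"
    pres[:] = np.linspace(1000.0, 100.0, 10)

    temp = nc.createVariable("temperature", "f4", ("level",))
    temp.units = "K"
    temp.long_name = "air temperature"
    temp[:] = np.linspace(288.0, 210.0, 10)
```

Decades later, anyone opening such a file gets the units, the instrument, and the project context along with the numbers.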
