Statistics and Analysis with High-Performance Computation
Climate simulation output and Earth-system observation datasets continue to grow in size. Although computational performance also continues to increase, that increase is now driven by the shift to multi-core parallel systems, from the desktop to high-performance clusters. This talk will begin with background on how data analysis for scientific datasets must confront this paradigm shift in order to keep pace with growing analysis tasks. Examples of statistical analysis on a high-performance computing (HPC) system, NCAR’s Yellowstone, illustrate some paths toward parallel analysis, including how the statistical computing language R can be adapted to an explicitly parallel environment and how this can be used to quickly process large datasets with decomposable tasks. Examples include interpolating spatial fields to create gridded data products and searching space/time fields for temperature events. More than just keeping pace, HPC opens new ways to think about what is possible in exploratory data analysis and in data analysis products. The talk will also discuss some fundamental challenges facing existing statistical methods in spatial data analysis, making explicit the assumptions of spatial statistical models and the ways in which their computation scales well and the ways in which it does not.
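The "decomposable tasks" pattern the abstract refers to can be sketched as an embarrassingly parallel map-reduce over independent spatial chunks. The sketch below is illustrative only, uses Python rather than R, and all names (`chunk_max`, `field_max`, the toy field) are hypothetical; the talk's actual examples are not reproduced here.

```python
# Hypothetical sketch: parallel summary of a gridded field, where each
# spatial chunk is processed independently and the results are reduced.
from concurrent.futures import ProcessPoolExecutor

def chunk_max(chunk):
    """Return the maximum value in one spatial chunk (a list of rows)."""
    return max(max(row) for row in chunk)

def field_max(field, n_chunks=4):
    """Split the field into row blocks and reduce each block in parallel."""
    size = max(1, len(field) // n_chunks)
    chunks = [field[i:i + size] for i in range(0, len(field), size)]
    with ProcessPoolExecutor() as pool:
        # Each chunk is summarized on its own worker; the final reduction
        # combines the per-chunk results.
        return max(pool.map(chunk_max, chunks))

if __name__ == "__main__":
    # Toy stand-in for a temperature field: value grows with row and column.
    field = [[r + c for c in range(8)] for r in range(8)]
    print(field_max(field))  # prints 14
```

The same shape (split a spatial domain, apply a statistic per block, combine) underlies tasks like per-cell interpolation or scanning a space/time field for threshold-exceeding events, since each block can be handled without communication.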