Speaker: Dr. Jessica Lin
George Mason University
Date: July 29, 2013
Place: FL 2 – Room 1022
Massive amounts of data are generated every day, and the world faces unprecedented challenges and opportunities in managing this ever-growing volume. Much of the world's data is in the form of time series. One obvious problem in handling time series databases is their typically massive size: gigabytes or even terabytes are common, and more and more databases are reaching the petabyte scale. Most classic data mining algorithms do not perform or scale well on time series data because of its unique structure. In particular, the high dimensionality, very high feature correlation, and typically large amount of noise that characterize time series data present a difficult challenge. As a result, time series data mining has attracted an enormous amount of attention in the past two decades.
This presentation gives an overview of my contributions to the field of time series data mining. The first part discusses time series data mining fundamentals, namely the two aspects that largely determine the efficiency and effectiveness of most time series data mining algorithms: data representation and similarity measure. In particular, I will discuss Symbolic Aggregate approXimation (SAX), a symbolic representation that has become the gold standard of time series representation and a building block for many time series data mining tasks in the past decade. The second part focuses on the discovery of novel and non-trivial patterns in time series data, including frequently encountered (or repeated) patterns, rare (or anomalous) patterns, and latent structure.
Dr. Jessica Lin is an Associate Professor in the Department of Computer Science at George Mason University. She received her PhD from the University of California, Riverside, in June 2005. Her research interests span broad areas of data mining, especially data mining for large temporal and spatiotemporal databases, text, and images. Her research is partially funded by NSF and the U.S. Army.