The 21st IASTED International Conference on
Modelling and Simulation
MS 2010

July 15 – 17, 2010
Banff, Alberta, Canada

KEYNOTE SPEAKER

Data Mining and Environmental Sciences

Prof. Shashi Shekhar
University of Minnesota, USA

Abstract

It is critical to monitor and predict where and when large contaminant fluxes will occur so that actions may be taken to protect the environment and limit the exposure of human and aquatic life. Current water quality monitoring is based on infrequent (e.g., weekly) sampling and time-consuming (e.g., hours to days) testing methods, making it difficult to make timely decisions to protect watersheds, a crucial part of our environment. Recent advances have led to the use of sensor-based monitoring networks that provide increased sampling frequency, as well as digital watershed data warehouses that manage the sensor data. However, key challenges remain. Researchers need new models to take full advantage of these new types of data sets, as current models do not adequately account for the huge quantities of data collected or the new patterns that become observable as a result. The goal of this project is to advance new scalable spatio-temporal data mining tools and protocols for monitoring, detecting, and predicting contamination of the environment. Classical and spatial data mining ideas are generalized to represent and analyze data sets related to physical processes such as water flow and contaminant flow, using novel methods such as flow anomaly detection and teleconnection detection.
Given a percentage threshold and readings from a pair of consecutive upstream and downstream sensors, flow anomaly discovery identifies dominant time intervals where the fraction of time instants with significantly mismatched sensor readings exceeds the given threshold. Discovering flow anomalies (FAs) is an important problem in environmental flow monitoring networks and early warning detection systems for water quality problems. However, mining FAs is computationally expensive because of the large (potentially infinite) number of time instants of measurement and the potentially long delays caused by stagnant (e.g., lakes) or slow-moving (e.g., wetlands) water bodies between consecutive sensors. Traditional outlier detection methods (e.g., the t-test) are suited to detecting transient FAs (i.e., time instants of significant mismatch across consecutive sensors) but cannot detect persistent FAs (i.e., long, variable-length time windows containing a high fraction of transient FAs) because no window size is defined in advance. In contrast, we propose a Smart Window Enumeration and Evaluation of persistence-Thresholds (SWEET) method to efficiently explore the search space of all possible window lengths. Computation overhead is reduced significantly by restricting the start and end points of a window to coincide with transient FAs, and by using a smart counter and efficient pruning techniques. Experimental evaluation using a real dataset shows that our proposed approach outperforms naive alternatives.
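The problem statement above can be illustrated with a minimal sketch of a naive baseline. This is not the SWEET method; it only enumerates candidate windows whose start and end points coincide with transient FAs and keeps the dominant ones. Several details are assumptions made for illustration and do not come from the abstract: the function names, the fixed travel-time `delay` between the two sensors, the absolute-difference `tolerance` used to flag a mismatch, and the reading of "dominant" as "not contained in any other qualifying window".

```python
from typing import List, Sequence, Tuple


def transient_flags(upstream: Sequence[float], downstream: Sequence[float],
                    delay: int, tolerance: float) -> List[bool]:
    """Flag instant t as a transient FA when the downstream reading at
    t + delay differs from the upstream reading at t by more than tolerance.
    A fixed delay and an absolute tolerance are illustrative assumptions."""
    n = min(len(upstream), len(downstream) - delay)
    return [abs(downstream[t + delay] - upstream[t]) > tolerance
            for t in range(n)]


def persistent_windows(flags: Sequence[bool],
                       threshold: float) -> List[Tuple[int, int]]:
    """Return dominant windows (start, end), inclusive, whose fraction of
    transient FAs exceeds threshold. Naive enumeration, not SWEET."""
    # Prefix sums give the number of flagged instants in any window in O(1).
    prefix = [0]
    for flag in flags:
        prefix.append(prefix[-1] + int(flag))

    # Only windows that start and end on a transient FA are considered.
    hits = [i for i, flag in enumerate(flags) if flag]

    qualifying = []
    for start in hits:
        for end in hits:
            if end < start:
                continue
            count = prefix[end + 1] - prefix[start]
            if count / (end - start + 1) > threshold:
                qualifying.append((start, end))

    # Assumed meaning of "dominant": not contained in another qualifying window.
    return [w for w in qualifying
            if not any(o != w and o[0] <= w[0] and w[1] <= o[1]
                       for o in qualifying)]


if __name__ == "__main__":
    flags = [False, True, True, False, True, False, False, False, True]
    print(persistent_windows(flags, threshold=0.7))
```

The enumeration is quadratic in the number of transient FAs and the dominance filter is quadratic in the number of qualifying windows, which hints at why counters and pruning matter on long sensor streams.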

Biography of the Keynote Speaker


Shashi Shekhar is a McKnight Distinguished University Professor at the University of Minnesota, Minneapolis, MN, USA. For contributions to spatial databases, spatial data mining, and geographic information systems (GIS), he received the IEEE Technical Achievement Award and was elected a Fellow of the IEEE as well as the American Assoc. for Advancement of Science. He co-edited an Encyclopedia of GIS (Springer, 2008, ISBN 978-0-387-30858-6) and co-authored a textbook on Spatial Databases (Prentice Hall, 2003, ISBN 0-13-017480-7), which has been translated into multiple foreign languages. He has co-authored over 200 research papers in peer-reviewed journals, books, conferences, and workshops. He is serving as a co-Editor-in-Chief of Geo-Informatica: An Intl. Journal on Advances in Computer Sc. for GIS (ISSN 1384-6175) and as a member of the steering committee of the ACM Intl. Conference on GIS as well as the IEEE Workshop on Spatial and Spatio-temporal Data Mining. He served on two committees of the National Research Council National Academy of Sciences, namely, the committee on mapping sciences (2004-2009) and the committee to review the basic and applied research at the National Geospatial-Intelligence Agency (2005). He also served as a member of the Board of Directors of the University Consortium on GIS (2003-2004), a member of the editorial board of IEEE Transactions on Knowledge and Data Eng., a member of the IEEE-CS Computer Science & Engineering Practice Board, a program co-chair of the ACM Intl. Workshop on Advances in GIS (1996), and a technical advisor to the United Nations Development Program (UNDP), Environmental Systems Research Institute (ESRI), and other organizations. His research projects have been sponsored by the NSF, NASA, USDOD, USDOT, MN/DoT, etc. He received a Ph.D. degree in Computer Science from the University of California (Berkeley, CA). More details are available from http://www.cs.umn.edu/~shekhar.