![]() |
![]() |
Graham Cormode
Bell Laboratories
Friday, Nov. 5, 11:00am
LC 102, Brooklyn Campus, Polytechnic University
Many fundamental data mining problems gain additional complexity when applied to massive amounts data that is being generated at high speed. Networks are a driving force in massive data analysis, and require new approaches to produce the analyses network managers require. This motivates the data stream approach: using a small amount of space to process data very quickly in one pass. This has resulted in many strong and deep algorithmic results in this model of computation. But there is frequently a gap between theory and practice, since existing algorithms are too slow for typical high data rates by several orders of magnitude.
I will describe some recent work aimed at bridging the gap. Firstly, I will describe the Count-Min sketch, which is a simple and hence fast data structure that can be used to approximate entries of a high dimensional vector in very small space. It can be used as a "black box" to solve several problems, including finding frequent items, and quantiles of the data distribution, and gives provable guarantees about their results. Secondly, I will describe and analyze a solution to the change detection problem, where the aim is to identify items (eg IP addresses) whose behavior has changed between two periods. Both these methods have been implemented and are running on network data at network line speeds with high accuracy.
Joint work with S. Muthukrishnan
Bio:
Graham Cormode is a researcher in Network Management at Lucent Bell Laboratories. He received his bachelors degree in Computer Science from the University of Cambridge in 1998 and his PhD, titled "Sequence Distance Embeddings", from the University of Warwick in 2002. His postdoctoral studies were at DIMACS, Rutgers. He has worked on small space algorithms for massive data, sequence comparison and compression, and efficient data mining and analysis techniques.
For further information please contact Torsten Suel