Polytechnic University

Mining Deviants in a Time Series Database

Nick Koudas

AT&T Shannon Laboratory

Monday November 15th, 1999, 1:00pm-2:00pm
Room LC102, Brooklyn Campus


Identifying outliers is an important data analysis function. Statisticians have long studied techniques to identify outliers in a data set in the context of fitting the data to some model. In the case of time series data, the situation is murkier. For instance, the "typical" value could "drift" up or down over time, so the extrema may not necessarily be interesting. We wish to identify data points that are somehow anomalous or "surprising".

We formally define the notion of a deviant in a time series, based on a representation sparsity metric. We develop an efficient algorithm to identify deviants in a time series. We demonstrate how this technique can be used to locate interesting artifacts in time series data, and present experimental evidence of the value of our technique.

As a side benefit, our algorithms are able to produce histogram representations of data that have substantially lower error than "optimal histograms" for the same total storage, including both histogram buckets and the deviants stored separately. This is of independent interest for selectivity estimation.
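As a rough illustration of the idea, and not the algorithm presented in the talk, the sketch below treats a point as a deviant if removing it most reduces the error of the best piecewise-constant histogram over the remaining points. The abstract does not spell out the sparsity metric, so the squared-error measure, the greedy brute-force search, and the names histogram_error and find_deviants are all assumptions made for this example.

```python
from itertools import accumulate


def histogram_error(values, buckets):
    """Minimum total squared error of approximating `values` with at
    most `buckets` constant-valued buckets (textbook O(n^2 * B) DP)."""
    n = len(values)
    if n == 0:
        return 0.0
    # Prefix sums of values and squared values for O(1) bucket error.
    s = [0.0] + list(accumulate(values))
    q = [0.0] + list(accumulate(v * v for v in values))

    def sse(i, j):
        # Squared error of replacing values[i:j] by its mean.
        if j <= i:
            return 0.0
        total = s[j] - s[i]
        return max(0.0, (q[j] - q[i]) - total * total / (j - i))

    prev = [sse(0, j) for j in range(n + 1)]  # one bucket
    for _ in range(1, buckets):
        cur = [0.0] * (n + 1)
        for j in range(1, n + 1):
            cur[j] = min(prev[i] + sse(i, j) for i in range(j))
        prev = cur
    return prev[n]


def find_deviants(values, buckets, k):
    """Greedily pick k indices whose removal most reduces the histogram
    error of the remaining points (assumed definition, brute force)."""
    remaining = list(range(len(values)))
    deviants = []
    for _ in range(k):
        best_idx, best_err = None, float("inf")
        for pos, idx in enumerate(remaining):
            rest = [values[t] for t in remaining[:pos] + remaining[pos + 1:]]
            err = histogram_error(rest, buckets)
            if err < best_err:
                best_idx, best_err = idx, err
        remaining.remove(best_idx)
        deviants.append(best_idx)
    return deviants


# The level shifts from 1 to 9, so the extrema are unremarkable;
# the value 4 at index 2 is the point that fits neither level.
series = [1, 1, 4, 1, 1, 9, 9, 9, 9]
print(find_deviants(series, buckets=2, k=1))
```

In this sketch, storing the one deviant separately lets two buckets represent the rest of the series exactly, which is the kind of storage trade-off the paragraph above describes. The brute-force search is far too slow for real data; it only makes the definition concrete.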

This talk will also review state-of-the-art histogramming techniques and present a couple of new results on histograms.

This is joint work with H. V. Jagadish and S. Muthukrishnan.

For further information, please contact Alex Delis.