Monday, January 12, 2009

[Research] Incremental Clustering of Time Series and Motif Discovery

Iterative Incremental Clustering of Time Series. Jessica Lin, Michail Vlachos, Eamonn J. Keogh, Dimitrios Gunopulos. Advances in Database Technology. 2004

In this paper, the authors present an interesting way to clustering time series. They mention some of the problems I had when dealing with clustering of human motion time series: low convergence, long times, variable clusters. As the authors said: Kmeans has a complexity of O(kNrD), k=number of clusters, N=number of abjetcs, r=number of iterations, and D=number of dimensions. However, all the algorithms (and even this paper) assume that time series have the same length. Reading this paper I understand that several comparisons of length-variable time series makes difficult the convergence and the obtaining of centroids. The approach introduced in this paper is based on the multiresolutions properties de Haar Wavelets and I think they can be more studied to preserve the convergence when clustering time series of different lengths.

Scaling and time warping in time series querying. Fu, A. W., Keogh, E., Lau, L. Y., and Ratanamahatana, C. A. 2005. In Proceedings of the 31st international Conference on Very Large Data Bases (Trondheim, Norway, August 30 - September 02, 2005). Very Large Data Bases. VLDB Endowment, 649-660.

This work studies the properties of DTW with respect to Euclidean distance in the context of time series queries. Since response time is the primary goal, the authors present some lower bound techniques (based on triangular inequality) such as: Sakoe-Chiba band and LB-Keaogh, to reduce the number of time the DTW distance is evaluared. The reduction in the dimensionality of the time series by component analysis is emphasized.

Detecting time series motifs under uniform scaling. Yankov, D., Keogh, E., Medina, J., Chiu, B., and Zordan, V. 2007. In Proceedings of the 13th ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (San Jose, California, USA, August 12 - 15, 2007). KDD '07. ACM, New York, NY, 844-853. 

This paper was recently published and especifically deal with detecting time series motifs with the technique Uniform Scaling. The basic idea is to resample the time series both in amplitude and also in frequency to be able to compare time series of different lengths. The authors do not use Fourier techniques for that purpose but only a sampling of the time series upto obtain a desired length. In this paper they also used random projection to detec motifs. I noticed that the random projection approach is basically a clustering approach that organize range of time series in different buckets by using a hashing function that is based on the random election of elements in the time series segmenet. The main assumption uis that the time series are uniformly distributed and then a random distribution of elements in the time series is obtained. I want to know if this assumption is valid in case of human motion time series from video data since we noticed that we obtain sort of regular shapes over time, but not random time series.

Chiu, B., Keogh, E., and Lonardi, S. 2003. Probabilistic discovery of time series motifs. In Proceedings of the Ninth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (Washington, D.C., August 24 - 27, 2003). KDD '03. ACM, New York, NY, 493-498. 

In this paper the authors introduce the importance of a random distribution in time series. By assuming that, they use statistical parameters (mean, standart deviation, correlation) to interpret missing or noisy values in the time series. Random projection is also used here, I wonder if other algoritms can be used since this method only works well in [10-20] dimensions since above that interval, a random distribution is not assumed.

No comments: