Since I am taking Applied Spatial Statistics, I have started to consider a statistical approach to modeling the presence of events in video. Rather than describing the method of Laptev, this approach will serve as a mechanism for formally discussing, in the paper, the existence of gradients in human movement.
Gradients will be treated as random variables that could appear at any pixel of the video. I plan to model those events with a probability function, with the goal of finding its maximum likelihood estimates. I am reading about the expectation-maximization algorithm and the book "Interactive Spatial Data Analysis" in order to classify types of movement based on the probability distribution of events.
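To make the idea concrete for myself, here is a minimal sketch of expectation-maximization on event positions. Everything in it is an assumption for illustration: I reduce each event to a single coordinate, assume the events come from a mixture of two Gaussians, and use the hypothetical function name em_gmm_1d. The real model for gradients in video may look quite different.

```python
import numpy as np


def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative assumption).

    x holds one coordinate per detected event. Returns the mixture
    weights, means, and variances that EM converges to.
    """
    x = np.asarray(x, dtype=float)
    # Simple initialization: place the two means at the data extremes.
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    w = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each event.
        diff = x[:, None] - mu
        dens = w * np.exp(-0.5 * diff ** 2 / var) / np.sqrt(2.0 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: weighted maximum likelihood updates.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        var = np.maximum(var, 1e-6)  # guard against a collapsing component

    return w, mu, var


# Synthetic events: two clusters of positions, standing in for two
# types of movement (made-up data, not taken from the videos).
rng = np.random.default_rng(0)
events = np.concatenate([rng.normal(20.0, 5.0, 500), rng.normal(80.0, 5.0, 500)])
weights, means, variances = em_gmm_1d(events)
print(weights, means, variances)
```

Once such a mixture is fitted, each event could be assigned to the component with the highest responsibility, which is one possible way to classify types of movement from the distribution of events.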
The idea of using probability distributions in videos can also be found in [1]. The authors use this approach to synchronize video recordings of the same scene taken from different viewpoints. A weak point of this approach is that the authors compare only two distributions (histograms), and they do so by directly subtracting the histograms position by position.
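A bin-by-bin comparison of this kind can be sketched as follows. This is my own reading and not the code from [1]: the function name histogram_difference, the normalization of the histograms, and the use of the sum of absolute differences as the final score are all assumptions.

```python
import numpy as np


def histogram_difference(events_a, events_b, bins=10, value_range=(0.0, 1.0)):
    """Compare two event distributions by subtracting their histograms bin by bin.

    Assumption: each histogram is normalized to sum to 1 and the score is the
    sum of absolute bin differences (0 = identical, 2 = no overlap at all).
    """
    h_a, _ = np.histogram(events_a, bins=bins, range=value_range)
    h_b, _ = np.histogram(events_b, bins=bins, range=value_range)
    # Normalize counts to probabilities; max(..., 1) avoids dividing by zero.
    p_a = h_a / max(h_a.sum(), 1)
    p_b = h_b / max(h_b.sum(), 1)
    return float(np.abs(p_a - p_b).sum())


# Made-up event values in [0, 1], only to exercise the function.
same = histogram_difference([0.05, 0.15, 0.15], [0.05, 0.15, 0.15])
near = histogram_difference([0.05, 0.05], [0.15])  # mass moved by one bin
far = histogram_difference([0.05, 0.05], [0.95])   # mass moved by nine bins
print(same, near, far)
```

In this sketch the last two comparisons get the same score, because subtracting position by position only looks at matching bins and ignores how far apart the occupied bins are.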
I have also defined the "conflictive" spatiotemporal gradients as the ones that fall within [middle line ± 2 × spatial variance]. These events are removed, and I am now generating the filtered events for all the videos.
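The removal step can be sketched like this. The details are assumptions for illustration: events are stored as (x, y, t) rows, the middle line is taken to be a line at a fixed coordinate along one spatial axis, and remove_conflictive is a hypothetical helper name.

```python
import numpy as np


def remove_conflictive(events, middle, spatial_variance, axis=0):
    """Split events into kept and conflictive ones.

    An event is conflictive when its coordinate along `axis` lies within
    [middle - 2 * spatial_variance, middle + 2 * spatial_variance].
    Assumption: `events` is an (n, 3) array of (x, y, t) rows.
    """
    events = np.asarray(events, dtype=float)
    half_width = 2.0 * spatial_variance
    conflictive = np.abs(events[:, axis] - middle) <= half_width
    return events[~conflictive], events[conflictive]


# Made-up events: two far from the middle line at x = 50, two close to it.
events = np.array([
    [10.0, 5.0, 0.0],
    [50.0, 5.0, 1.0],
    [53.0, 7.0, 2.0],
    [90.0, 1.0, 3.0],
])
kept, removed = remove_conflictive(events, middle=50.0, spatial_variance=2.0)
print(len(kept), "kept,", len(removed), "removed")
```

Returning the removed events as well makes it easy to check, per video, how many gradients the band discards.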
[1] J. Yan, M. Pollefeys, Video Synchronization via Space-Time Interest Point Distribution, Advanced Concepts for Intelligent Vision Systems, 2004.