Clustering
Clustering is one of the fundamental tasks in unsupervised learning, but there is a huge diversity of approaches. There are k-means, mean-shift, hierarchical clustering, mixture models, and more. The reason for so many varied approaches is that clustering by itself is an ill-defined problem. In this post I will focus on two methods, mean-shift and Gaussian mixture modeling, because they have a more “statistical” flavor, in that both can be related to modeling a probability distribution over the data. Despite their similarities, they are based on sometimes contradictory principles.
1. Mean-shift
Mean-shift clusters points by the modes of a nonparametric kernel density estimate. The kernel density estimator for data $x_1, \dots, x_N$ is
$$\hat{p}(x) = \frac{1}{N h^d} \sum_{i=1}^N K\left( \frac{\Vert x - x_i \Vert^2}{h} \right),$$
where $K$ is a kernel function. You can imagine the kernel as some density with unit variance, such as a Gaussian density, whose spread is controlled by the bandwidth $h$. In effect, a kernel density is a smoothed histogram: a mixture of local densities, one centered at each datum. $\hat{p}$ has some number of modes, let's say $p$.
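To make the estimator concrete, here is a minimal sketch of a kernel density estimate with a Gaussian kernel in plain NumPy. The function name and arguments are illustrative, not from any particular library, and I use the standard Gaussian parameterization of the bandwidth:

```python
import numpy as np

def gaussian_kde(x, data, h):
    """Evaluate a Gaussian kernel density estimate at a query point x.

    x    : query point, shape (d,)
    data : samples x_1, ..., x_N, shape (N, d)
    h    : bandwidth
    """
    N, d = data.shape
    sq_dists = np.sum((data - x) ** 2, axis=1)   # ||x - x_i||^2 for every sample
    kernels = np.exp(-sq_dists / (2 * h ** 2))   # unnormalized Gaussian kernel
    norm = (2 * np.pi * h ** 2) ** (d / 2)       # Gaussian normalizing constant
    return kernels.sum() / (N * norm)            # average of the N local densities
```

Evaluating this on a grid and plotting it shows the smoothed-histogram picture: a small $h$ produces many spiky modes, while a large $h$ merges them together.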
The gradient of the density at a point $x$ looks like
$$\nabla \hat{p}(x) = L(x)\cdot\left( \frac{ \sum_{i=1}^N x_i\, k\left(\frac{\Vert x-x_i\Vert^2}{h}\right)}{\sum_{i=1}^N k\left(\frac{\Vert x-x_i\Vert^2}{h}\right)} - x \right),$$