Since Lloyd's k-means iterative algorithm for hard clustering that stepwisely assign points to their nearest cluster and relocate these centers as centroids, there have been many generalizations, including the Bregman k-means.
This k-means is a batched k-means as points are assigned to their nearest center in a single stage.
The optimization is monotone but only converge to a local minimum.
To get off this potential local minimum, we can use a single swap that tries to move a point from a cluster to another one.
Of course, we better choose the point and target cluster as the one that best decreases the loss function.
This incremental k-means is known and explained as the first variation by Teboulle et al.:
Since Lloyd's k-means iterative algorithm for hard clustering that stepwisely assign points to their nearest cluster and relocate these centers as centroids, there have been many generalizations, including the Bregman k-means.
This k-means is a batched k-means as points are assigned to their nearest center in a single stage. The optimization is monotone but only converge to a local minimum. To get off this potential local minimum, we can use a single swap that tries to move a point from a cluster to another one. Of course, we better choose the point and target cluster as the one that best decreases the loss function.
This incremental k-means is known and explained as the first variation by Teboulle et al.:
paper
I invite you to look at the minimization of the loss function on a gene array expression dataset by combining batched and online k-means.
Frank.