Tags : Kullback-Leibler

Entries in this Tags : 13logs Showing : 1 - 13 / 13

Apr 26, 2013

Symmetrical Kullback-Leibler centroid and W-Lambert function

Post @ 10:17:31 | Kullback-Leibler

The Jeffreys divergence of J-divergence (or symmetrical Kullback-Leibler divergence) is defined by:
jdiv.png

The Jeffreys centroid for positive histograms is available in closed-form:
jcentroid-theo1.png

The Jeffreys frequency centroid is restricted to belong to the probability simplex:
jfcentroid.png


jtheo2.png

Here are the slides related to the Jeffreys centroids: Slides

Feb 25, 2013

Kullback-Leibler projection for best mixture simplification

Post @ 17:05:16 | Kullback-Leibler

Suppose we are given a mixture of n gaussians (p tilde) and we ask for the best gaussian (single component) approximating that mixture with respect to the Kullback-Leibler divergence:
centroid-mixturesimplification.png
Then this best Gaussian is found as the KL projection of the gaussian mixture onto the exponential family manifold of gaussians. It is equivalent to compute a Bregman barycenter.
KLprojection.png
Thus we can also learn a mixture by simplifying a kernel density estimator as explained in this paper and illustrated in this figure:
SimplifyKDE.png

Feb 08, 2013

Which side of Kullback-Leibler divergence to optimize?

Post @ 16:07:21 | Kullback-Leibler

A common question is which side of Kullback-Leibler (KL) should I consider for optimization. Since KL is the relative entropy: it is the difference of cross-entropy minus the entropy. As such the ideal model should be on right hand side, and the estimated model on left-hand side. Now suppose you want to find the center of two distributions for mixture simplification. You can interpret the results of left-sided and right-sided KL centroids as follows:

  • The Kullback-Leibler right-sided centroid is zero-avoiding so that its corresponding density function tries to cover the support of all input normals,
  • The Kullback-Leibler left-sided centroid is zero-forcing so that it focuses on the highest mass mode normal.

Here, there is not one model but two ideal models so we need a trade-off (=centroid). The zero-forcing/avoiding property is depicted in the following figure:
KLsidedandsymmetrized.png
Here are the details (paper).

Dec 09, 2009

Checking the information monotonicity of the Kullback-Leibler divergence

Post @ 22:03:02 | Kullback-Leibler

I wrote a small program to illustrate the information monotonicity property of f-divergences (including Kullback-Leibler). Here is a numerical example for histograms of 8 bins that we reduce to histograms in 4 and 2 bins. The KL measure is less at coarser resolution than higher resolution.

Check the information monotonicity of Kullback-Leibler divergence:
by merging bins into a coarser histogram, the Kullback-Leibler divergence is less than the higher resolution:
0.08522624581487719 0.1320022228947157 0.13591019441965485 0.06674528382980667 0.05029196864402132 0.19946790829780184 0.20516964900877682 0.1251865270903457
0.02648151081895093 0.17500830227728026 0.2779122146863664 0.06203415984687348 0.2535969883830692 0.04424529112439828 0.1320533102395284 0.028668222623533027
                KL(p,q)=0.46400208957858724     KL(q,p)=0.4558758820302893
4 bins  KL(p,q)=0.10555546240269079     KL(q,p)=0.09733318810652078
2 bins  KL(p,q)=0.029647768170771346    KL(q,p)=0.029836929625659273
Information monotonicity holds only for Csiszar's f-divergences

Aug 07, 2008

3D Bregman balls printed

More than a year ago, I printed 3D Bregman balls using a lithography process. I took may hours to print these shapes...

Generalized Kullback-Leibler (positive measures):
kl.jpg

Itakura-Saito (Burg entropy):
is.jpg

Logistic loss:
ll.jpg
To create the STL files, you need to discretize the surface of theses balls by walking on their geodesics passing through the centers. A bisection search does this to stop a more or less the radius.

Simplifying mixtures of Gaussians (MoG, GMM)

Since most pdfs can be equivalently expressed as a Gaussian mixture model, it is tempting to simplify them. The measure taken between two GMMs (one with many components, and the other with a reduced number) is the Kullback-Leibler relative entropy (=cross entropy-entropy). The problem is to decide whether we should minimize KL(GMMsource||GMM simplified) or the converse KL(GMM simplified||GMMsource) or the symmetrized KL (Jensen-Shannon, in green). Here are a few experiments carried out when simplidying a GMM to a single normal:
resultSylvain.png
result07-08-2008-05-09-23.png
result07-08-2008-10-00-14.png I let you guess whether the blue/red is the left/right or the converse... -:)

Aug 06, 2008

Relative entropy 3D ball

Post @ 11:49:03 | KullKullback-Leibler

RelativeEntropyBall.jpg Kullback-Leibler ball for unnormalized positive distributions.

Nov 14, 2007

Left and right centroid of bivariate normals

Post @ 18:40:53 | Kullback-Leibler

I am programming kmeans for multivariate normals with respect to the relative entropy distance . Here is a snapshot for the left (blue) and right (red) entropic centroid of 2 bivariate normals

klcentroids.png

Tomorrow, I'll attend the COE workshop on math aspects of image processing and computer vision. I'm looking forward to hearing the talks.

Nov 05, 2007

Clustering multivariate normals wrt. to KL

Post @ 16:40:22 | Kullback-Leibler

Today, I would like to introduce the NIPS 2006 paper:

Differential Entropic Clustering of Multivariate Gaussians.

The paper shows how to clustering a set of multivariate normals with respect to the Kullback-Leibler divergence. The key ingredient that makes it a non-trivial application of the former clustering with Bregman divergence paper (JMLR'05) is to show how to compute the centroid of a set of multivariate normals by decomposing the Bregman divergence into a convex sum of two Bregman divergences: a Mahalanobis distance on the means followed by a Burg matrix divergence on the variance-covariance matrix.

The authors also report on experimental results by using the normalized mutual information.

Nov 01, 2007

Exponential information-theoretic center

Post @ 16:07:09 | Kullback-Leibler

Prof. Vemuri recently sent me his paper:

Efficient Shape Indexing Using an Information Theoretic Representation

Spellman and Vemuri presented a left-side KL minimax center for probability density functions belonging to the exponential families (using log-partition statistical physics notations).

They prove that the minimizer is unique and defines the e-radius


Finally, they show that the center allows more efficiency for nearest neighbor queries. There is definitively "something" information geometry to be done on barycenters and centroids, like Bruno Pelletier first initiated in his 2005 paper.

Oct 30, 2007

Centroid of multivariate normals

Post @ 12:31:17 | Kullback-Leibler

Centroids are just defined the minimum distance average. For example, the center of mass is the minimizer of the squared Euclidean distance, etc. So what about multivariate normal distributions under the Kullback-Leibler distortion measure?

This is precisely what this paper investigated for the symmetrized KL. They could not come up with a closed form solution, and gave a rather complex algorithm based on solving Ricatti matrix equations. Nevertheless, centroid of normals have numerous applications, let it be in sound or image processing

On divergence based clustering of normal distributions and its application to normal distributions. Proceedings EUROSPEECH2003)

See also www.cs.utexas.edu/users/jdavis/papers/jdavis_gauss_cluster.pdf

Recently, I designed and characterized geometrically the sided and symmetrized centroids for arbitary Bregman divergences. Stay tune...

Oct 29, 2007

Kullback-Leibler divergence betweens two GMMs

Post @ 10:14:33 | Kullback-Leibler

Kullback-Leibler is the most fundamenta measure in information theory. It is difficult to compute analytically closed-form solutions for mixture models, like the Gaussian mixture models (GMM or MoG-mixture of Gaussians).

The paper

An efficient image similarity measure based on approximations of KL-divergence between two gaussian mixtures

presents an approximation of the KL of two GMMs (obtained from soft clustering in image), and give experiments showing that the measure behaves better than the Mahalanobis metric.

Eurospeech'05 further extends the approach by presenting the unscented transform that is an intelligent Monte-Carlo method for the 2d extreme points found on the axes of the ellipsoids.

Oct 17, 2007

The language of 3D shapes and shape topics

Post @ 10:51:35 | Kullback-Leibler

3D partial shape retrieval is becoming more and more important for letting novice users sketching or having professionals selecting and copying/pasting 3D object parts. I'd like to introduce a very well written and simple paper that takes the original approach of text analysis to the 3D shape problem:
Shape Topics: A Compact Representation and New Algorithms for 3D Partial Shape Retrieval (CVPR'06)>
http://www.citeulike.org/user/ciga/article/1777250

First, let me mention that one of the successful 3d search engine is from Princeton university:
http://shape.cs.princeton.edu/search.html
They also provide a benchmark for evaluating features and search accuracy.

The shape topic paper takes the following stance: extract translation/rotation invariant features using the image spin ``trick'', cluster these spin images into clusters and process them efficiently for nearest-neighbor queries. Each spin image can be interpreted as a word (of bag-of-words if clusterised), and a 3D shape as an unstructured word document. Therefore, one can potentially use the vector space model (with weighted cosine transform) or L2 to evaluate the search method. However, this is not best since that we want an asymmetric distance. That is, we want that the score of a part in the model is max, while the score of a model in a part is less. That is a kind of containment assymetric distance that is fulfilled by the Kullback-Leibler (relative entropy) distance. The authors implemented the KL distance and showed experimentally that it yields better results thant L2 or cosine symmetrice distances.

<

p> But that it not all! They propose to use the latent Dirichlet method for modelling documents, to cluster the spin images into a small set of topics called the "shape topics" (in their exp., they use only 40 such kind of semantic clustering). Then a shape (document) is sampled as follows: choose its length using a Poisson distribution and a topic length using a Dirichlet distribution. Then get the set of topics using a multinomial distribution, and for each topic a set of words falling within the scope of a topic using another multinomial distribution.

This drastically reduce the 3D shape representation and yet does not deteriorate the performance of the system. This Dirichlet modeling seems confusing at first, and you'll find all gritty-nitty details in M. Jordan ML's expert team. For example,
www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf


Wikipedia has an entry too: http://en.wikipedia.org/wiki/Latent_Dirichlet_Allocation

In this sceenshot below, the green-windowed hand is a query, the blue windows match are the best matches falling within the same shape topic. (The scores are ordered from left to right and top to bottom).

shapetopic.jpg