Tags : divergence

Entries in this Tags : 10logs Showing : 1 - 10 / 10

Jan 25, 2013

Loosely bounding a Bregman divergence (without the Hessian)

Post @ 17:05:34 | Bregman divergence

Traditionally, the supremum of the norm of the Hessian of the Bregman generator over a closed domain is used in conjuction with the Lagrange remainder of the Taylor expansion to bound a Bregman divergence since we have:
BregmanRemainder.png

But we can also bound Bregman divergence easily without the Hessian, by using the gradient, as follows:

BregmanBoundGradient.png

I will report how to use this bound when defining and computing the Chernoff distance for exponential families.
Frank.

An update list of publications

Nov 26, 2012

Smooth transitions between left-sided and right-sided Bregman centroids (with java code):

One of the questions that often arises when using a Kullback-Leibler divergence is "how do we choose the orientation of KL?". In information retrieval, another related argument is "how to symmetrize Kullback-Leibler divergence?" (3 solutions: Jeffreys divergence, Jensen-Shannon divergence, Chernoff divergence). However, sometimes we would like to experiment more precise choices rather than left/right or symmetric orientation. One way to do so is to use skew Burbea-Rao divergences (also called skew Jensen divergences) that tend in limit cases to Bregman divergences:

skewBurbea-Rao-divergence.png
Skewness factor may depend on applications although a principle way is yet to be analyzed. This sample program Bregman_BurbeaRao_Jensen_SkewDivergence.java demonstrates the skewness property in limit cases. Note that you do not have to compute gradient for approximating Bregman divergences. Here is a run of the program:

Skew Burbea-Rao-Jensen divergences: Bregman divergences obtained in limit case alpha=0 and Bregman reverse for limit case alpha=1
alpha=1.0E-4    0.06831750610769681
alpha=0.5       0.06073382594234067
alpha=0.9999    0.05521937447971147
Bregman direct :                0.0683193293834281       alpha->0
Reverse Bregman:                0.05521841361948579      alpha->1
...

alpha=1.0E-4    0.06831750610769681
alpha=0.5       0.06073382594234067
alpha=0.9999    0.05521937447971147
Bregman direct :                0.0683193293834281       alpha->0
Reverse Bregman:                0.05521841361948579      alpha->1

More details can be found in 1004.5049v3.pdf and 1009.4004v2.pdf
Frank.

May 18, 2012

Squared Euclidean distance is not a metric

Post @ 13:43:54 | divergence

Jan 18, 2012

Total Bregman divergence and Soft Clustering

Post @ 10:28:27 | Bregman divergence

Since the work of Banerjee et al. that showed that the expectation-maximization of mixtures of exponential families is a Bregman soft clustering in disguise, there has been a strong interest in further using the bijection between exp fam and Bregman divergences.

In
Shape Retrieval Using Hierarchical Total Bregman Soft Clustering
similarly it is proven that for total Bregman divergences (tBD), there exists a distribution which belongs to the lifted exponential family of statistical distributions This leads to a new clustering technique namely, the total Bregman soft clustering algorithm.

See paper.
Frank.

Aug 23, 2011

Expected Kullback-Leibler divergence between a multinomial and an empirical distribution

Post @ 11:01:29 | Kullback-Leibler divergence

Thanks for your valuable feedback and references.
KLempiricalmultinomial.png
Frank.

Jun 09, 2010

An asymptotic property of centroids with respect to f-divergences

Post @ 4:39:49 | f-divergence

f-centroids can be defined as the unique minimizer of the average Csiszar/Ali-Silvey f-divergence. All f-centroids are homogeneous (scale free, homogeneous degree 1).

f-centroids include power means (also reachable as Bregman centroids), extreme means, Lehmer mean, Gini mean, etc

An interesting property is by shifting the origin and letting the origin tend to infinity, all f-means become arithmetic mean.

fmeanlimit.jpg
The proof requires f to be C3, three times continously differentiable.

Frank.

Jun 04, 2010

Joyce and danger of Internet

Post @ 17:48:15 | f-divergence

We all daily use Internet and are biased by its non-controlled source of information. I was looking for the full name of the Ali and Silvey f-divergence paper, also independently studied by Csiszar. In those days, it was not rare to only mention the last name and give only initials, so it is hard to know for sure what those capital letters mean in

Ali, S. M.; Silvey, S. D. (1966). "A general class of coefficients of divergence of one distribution from another". Journal of the Royal Statistical Society, Series B 28 (1): 131?140.

After using several search tools (google x, x in plain, books, scholar, etc.), and tracing back previous papers, eventually writing some emails to Aligarh Muslim University, I figured out the full names of the celebrated paper, and finally completed by bibtex entry as follows:

% Computational information geometry bibtex file
@Article{AliSilvey-1966,
  author =       {Syed Mumtaz Ali and Samuel David Silvey},
  title =        {A general class of coefficients of divergence of one distribution from another},
  journal =      {Journal of the Royal Statistical Society, Series B},
  year =         {1966},
  volume =   {28},
  pages =    {131--142}
}

S. M. Ali was a PhD student of S. D. Silvey.

On a different side of Internet, I searched for some Kullback-Leibler properties, and figured out a weird french article mentionning

Kilolitre de divergence et mise à jour bayésienne

In fact, it was the automated translation for

KL divergence and Bayesian updating

KL has been suggested by the automatic translator to be kiloliter... Then the translated article becomes really funny to read. Have a look http://www.worldlingo.com/ma/enwiki/fr/Kullback%E2%80%93Leibler_divergence (french) and http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence (english)

Frank.

Jan 07, 2010

Variational distance is the only metric f-divergence

Post @ 22:40:41 | f-divergence

A quite interesting property is that the variational divergence (a f-divergence) and its multiples are the only metric f-divergences. This was recently shown
Confliction of the Convexity and Metric Properties in f-Divergences (IEICE 2007).

If instead, we assume f strictly positive concave (and self-dual f(x)=x f(1/x)) then the formula of f-divergences yields metrics.

Frank.

Feb 27, 2009

A unique characterization of alpha-divergences

Post @ 15:11:45 | alpha-divergence

Professor Shun-ichi Amari recently gave a talk at Ecole Polytechnique on information geometry. The latter point of his talk characterizes interestingly the alpha-divergences on positive measures (non-normalized distributions) as the intersection of f-divergences and Bregman divergences.


http://videolectures.net/etvc08_paris/

Of course, on probability measures, only the Kullback-Leibler divergence lies in the common intersection (and its dual).

Feb 26, 2009

Information monotonicity

Post @ 18:10:05 | f-divergence

Csiszar f-divergences have the property of information monotonicity. So take a positive array p with n bins and partition it into m bins P with the probability of falling in a bin being the sum of the probability of the atoms of that bin. Then the f-divergence of (p,q) should always be greater or equal to the f-divergence of the corresponding partitionned arrays. In other words:

f-div(p,q) >= f-div(P,Q)

That means that we can only loose information by aggregating atoms. That is one fairly reasonnable behavior of information.

More axiomatic characterization is detailed in:

Csiszár, Imre. 2008. "Axiomatic Characterizations of Information Measures." Entropy 10, no. 3: 261-273.


Frank.