Tags : divergence
Entries in this Tags : 10logs Showing : 1 - 10 / 10
Jan 25, 2013
Loosely bounding a Bregman divergence (without the Hessian)
Nov 26, 2012
Smooth transitions between left-sided and right-sided Bregman centroids (with java code):
One of the questions that often arises when using a Kullback-Leibler divergence is "how do we choose the orientation of KL?".
In information retrieval, another related argument is "how to symmetrize Kullback-Leibler divergence?" (3 solutions: Jeffreys divergence, Jensen-Shannon divergence, Chernoff divergence).
However, sometimes we would like to experiment more precise choices rather than left/right or symmetric orientation.
One way to do so is to use skew Burbea-Rao divergences (also called skew Jensen divergences) that tend in limit cases to Bregman divergences:
Skewness factor may depend on applications although a principle way is yet to be analyzed.
This sample program Bregman_BurbeaRao_Jensen_SkewDivergence.java demonstrates the skewness property in limit cases.
Note that you do not have to compute gradient for approximating Bregman divergences.
Here is a run of the program:
Skew Burbea-Rao-Jensen divergences: Bregman divergences obtained in limit case alpha=0 and Bregman reverse for limit case alpha=1 alpha=1.0E-4 0.06831750610769681 alpha=0.5 0.06073382594234067 alpha=0.9999 0.05521937447971147 Bregman direct : 0.0683193293834281 alpha->0 Reverse Bregman: 0.05521841361948579 alpha->1 ... alpha=1.0E-4 0.06831750610769681 alpha=0.5 0.06073382594234067 alpha=0.9999 0.05521937447971147 Bregman direct : 0.0683193293834281 alpha->0 Reverse Bregman: 0.05521841361948579 alpha->1
More details can be found in 1004.5049v3.pdf and 1009.4004v2.pdf
Frank.
May 18, 2012
Jan 18, 2012
Total Bregman divergence and Soft Clustering
Since the work of Banerjee et al. that showed that the expectation-maximization of mixtures of exponential families is a Bregman soft clustering in disguise, there has been a strong interest in further using the bijection between exp fam and Bregman divergences.
In
Shape Retrieval Using Hierarchical Total
Bregman Soft Clustering
similarly it is proven that
for total Bregman divergences (tBD), there exists a distribution which belongs to the lifted exponential family of statistical distributions
This leads to a new clustering technique namely, the total Bregman soft clustering algorithm.
See paper.
Frank.
Aug 23, 2011
Expected Kullback-Leibler divergence between a multinomial and an empirical distribution
Thanks for your valuable feedback and references.
Frank.
Jun 09, 2010
An asymptotic property of centroids with respect to f-divergences
f-centroids can be defined as the unique minimizer of the average Csiszar/Ali-Silvey f-divergence. All f-centroids are homogeneous (scale free, homogeneous degree 1).
f-centroids include power means (also reachable as Bregman centroids), extreme means, Lehmer mean, Gini mean, etc
An interesting property is by shifting the origin and letting the origin tend to infinity, all f-means become arithmetic mean.
The proof requires f to be C3, three times continously differentiable.
Frank.
Jun 04, 2010
Joyce and danger of Internet
We all daily use Internet and are biased by its non-controlled source of information.
I was looking for the full name of the Ali and Silvey f-divergence paper, also independently studied by Csiszar.
In those days, it was not rare to only mention the last name and give only initials, so it is hard to know for sure what those capital letters mean in
Ali, S. M.; Silvey, S. D. (1966). "A general class of coefficients of divergence of one distribution from another". Journal of the Royal Statistical Society, Series B 28 (1): 131?140.
After using several search tools (google x, x in plain, books, scholar, etc.), and tracing back previous papers, eventually writing some emails to Aligarh Muslim University, I figured out the full names of the celebrated paper, and finally completed by bibtex entry as follows:
% Computational information geometry bibtex file
@Article{AliSilvey-1966,
author = {Syed Mumtaz Ali and Samuel David Silvey},
title = {A general class of coefficients of divergence of one distribution from another},
journal = {Journal of the Royal Statistical Society, Series B},
year = {1966},
volume = {28},
pages = {131--142}
}
S. M. Ali was a PhD student of S. D. Silvey.
On a different side of Internet, I searched for some Kullback-Leibler properties, and figured out a weird french article mentionning
Kilolitre de divergence et mise à jour bayésienne
In fact, it was the automated translation for
KL divergence and Bayesian updating
KL has been suggested by the automatic translator to be kiloliter... Then the translated article becomes really funny to read.
Have a look http://www.worldlingo.com/ma/enwiki/fr/Kullback%E2%80%93Leibler_divergence (french) and http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence (english)
Frank.
Jan 07, 2010
Variational distance is the only metric f-divergence
A quite interesting property is that the variational divergence (a f-divergence) and its multiples are the only metric f-divergences. This was recently shown
Confliction of the Convexity and Metric Properties in f-Divergences (IEICE 2007).
If instead, we assume f strictly positive concave (and self-dual f(x)=x f(1/x)) then the formula of f-divergences yields metrics.
Frank.
Feb 27, 2009
A unique characterization of alpha-divergences
Professor Shun-ichi Amari recently gave a talk at Ecole Polytechnique on information geometry. The latter point of his talk characterizes interestingly the alpha-divergences on positive measures (non-normalized distributions) as the intersection of f-divergences and Bregman divergences.
http://videolectures.net/etvc08_paris/
Of course, on probability measures, only the Kullback-Leibler divergence lies in the common intersection (and its dual).
Feb 26, 2009
Information monotonicity
Csiszar f-divergences have the property of information monotonicity. So take a positive array p with n bins and partition it into m bins P with the probability of falling in a bin being the sum of the probability of the atoms of that bin. Then the f-divergence of (p,q) should always be greater or equal to the f-divergence of the corresponding partitionned arrays. In other words:
f-div(p,q) >= f-div(P,Q)
That means that we can only loose information by aggregating atoms. That is one fairly reasonnable behavior of information.
More axiomatic characterization is detailed in:
Csiszár, Imre. 2008. "Axiomatic Characterizations of Information Measures." Entropy 10, no. 3: 261-273.
Frank.
Traditionally, the supremum of the norm of the Hessian of the Bregman generator over a closed domain is used in conjuction with the Lagrange remainder of the Taylor expansion to bound a Bregman divergence since we have:
But we can also bound Bregman divergence easily without the Hessian, by using the gradient, as follows:
I will report how to use this bound when defining and computing the Chernoff distance for exponential families.
Frank.
An update list of publications