May 18, 2012
Squared Euclidean distance is not a metric
May 17, 2012
Videos of the Leon Brillouin information geometry seminar
The Leon Brillouin seminar hosts invited lectures on topics concerning distances, information geometry, and their applications. It is held downtown Paris at IRCAM center close to the Pompidou center. Recently, we have uploaded the videos/slides of speakers.
For example, the video on
The Burbea-Rao and Bhattacharyya centroids
is available from the past events tab.
There is a much more to listen to.
Also, there is the international mailing list on information geometry.
Apr 27, 2012
A problem clearly stated is a problem...
"A problem clearly stated is a problem half solved." - Dorothea Brande (1893 - 1948), American Writer and Editor.
Thinking about this when writing papers...
Apr 26, 2012
Exponentially many extrema!
When defining a centroid with respect to a divergence as the minimizer of the average distortion, one asks whether the minimizer is unique or not:
For f-divergences, convexity in both arguments ensure that it is the case, and setting the gradient to zero does the job.
But when it is not convex, we may still hope (and prove) that the centroid is unique.
This was the case for the Burbea-Rao centroid (or Jensen centroid induced by the Jensen convexity gap of a function):
The Burbea-Rao and Bhattacharyya centroids using a fixed-point equation and property of interness of quasi-arithmetic means.
However, in general the centroid defined as a minimizer may not be unique if the average distortion has many local minima.
Such simple functions may indeed yield exponentially many maxima as shown by Auer and his colleagues:
Exponentially many local minima for single neurons
They proved that a single neuron with the logistic transfer function and square loss to optimze can yield exponential many extrema for learning the weight vector.
@inproceedings{ManyLocalMinima-1995,
author = {Peter Auer and Mark Herbster and Manfred K. Warmuth},
title = {Exponentially many local minima for single neurons},
booktitle = {Neural Information Processing Society (NIPS)},
year = {1995},
pages = {316-322}
}
Apr 11, 2012
The notion of sufficiency in statistics
In 1921 (published 1922), Sir R. A. Fisher came up with a (the?) mathematical theory of statistics.
At the core of it is the notion of sufficiency:
... the statistic chosen should summarize the whole of the relevant information supplied by the sample.
Nowadays, with Big Data hunt for Better Data, model selection and sufficiency is crucial.
The question remains on how much information is there in the data.
Frank.
Apr 10, 2012
Model centroids for the simplification of kernel density estimators
When learning statistical mixtures, there are two basic questions: (1) how good can we learn a model, and (2) how fast can we do it?
At ICASSP, we have presented k-MLE for learning mixtures with k components by maximizing the complete likelihood function.
Another strategy, is to start from a kernel density estimator, and then simplify it.
For 1D normal mixtures, the Fisher-Rao geometry amounts to hyperbolic geometry, but the centroid has not a closed-form.
We have proposed to use another definition of center of mass in closed form in hyperbolic geometry to simplify a KDE.
Thus we learn a mixture by simplifying a KDE.
The details of the paper are here.
Frank.
Apr 03, 2012
The Digital Chameleon Principle: Computing Invisibility by Rendering Transparency
In 2007, I was thinking of rendering transparency by building a prototype with a light field acquisition (camera array) piped to a light field renderer (projector array). I wrote a column in CGA.
How crazy or realistic is this idea? Well, I suggest you to look at that video for convincing you that it is not so SciFI:
Frank.
Mar 31, 2012
What is the best way to learn finite statistical mixture models?
Finite mixture models occur ubiquituously in many signal processing applications.
The standard base solution is to use the Expectation-Maximization algorithm (EM) that is proven to converge monotonically by improving the incomplete likelihood. One problem is proper initialization, another problem is the stopping criterion to force EM to stop. What if we rather maximizes the complete likelihood. Then for components of exponential families (like Gaussian mixture models), we can design a simple algorithm that performs a dual additive Bregman clustering for updating expectation parameters followed by a cross-entropy minimization for updating weights, and reiterate finitely until it reaches a local optimum: This is the essence of k-MLE. MLE stands for maximum likelihood estimation.
$k$-MLE: A fast algorithm for learning statistical mixture models
The two major questions:
- How good can you learn a mixture model?,
- and how fast can you do it?
Many interesting trade-offs to unravel!
Frank.
Feb 27, 2012
Feb 22, 2012
Beamer with dvips!
Well, I am preparing slides with latex Beamer and using Ipe figures. I like to include graphics with png and eps, and often discard .pdf figures. So I am rather dvips than pdflatex...To compile my Beamer slides using dvips, here is the trick (with color figures and proper sizes):
dvips -T 128mm,96mm -Ic slides.dvi
ps2pdf slides.ps
Frank.
Feb 09, 2012
How good Human is at distinguishing distributions?
Rayleigh mixture models (RMMs) are mixtures of exponential families (as Gaussian mixture models, GMMs). Here is a simple image where the foreground text has been synthesized with Rayleigh variates (lamda1) while the background is synthesized using Rayleigh variates (lamda2). I am curious to know how good human is at distinguishing distributions, compare to usual measures, say Kullback-Leibler divergence. Of course, it depends on the viewing distance/precision, otherwise we get expectation colors, and we have some work on this. Does stochasticity help in discriminating objects. I guess so.
Jan 31, 2012
Jan 26, 2012
Bibliography
Long time since I last updated my publication list. FN-Journal-Jan2012.pdf FN-Journal-Jan2012.bib
Jan 18, 2012
Total Bregman divergence and Soft Clustering
Since the work of Banerjee et al. that showed that the expectation-maximization of mixtures of exponential families is a Bregman soft clustering in disguise, there has been a strong interest in further using the bijection between exp fam and Bregman divergences.
In
Shape Retrieval Using Hierarchical Total
Bregman Soft Clustering
similarly it is proven that
for total Bregman divergences (tBD), there exists a distribution which belongs to the lifted exponential family of statistical distributions
This leads to a new clustering technique namely, the total Bregman soft clustering algorithm.
See paper.
Frank.
Jan 17, 2012
Paper aggregators
List of papers and citations automatically aggregated...
sci.ans
Frank.
Jan 11, 2012
Legendre transformation and information geometry
Legendre-Fenchel duality is at the heart of dually flat spaces in information geometry. Convex functions come in pairs, called convex conjugates. The basic principle is that if you plot the epigraph of the function F and reinterpret it at the intersection of support halfspaces, you get the dual geometric representation of the epigraph. You can parameterize this dual representation using the convex conjugate function. Thus the Legendre-Fenchel transformation is sometimes called the slope transform.
More details in the memo:
NoteLegendreTransformation.pdf
++x-x,
Frank.
Dec 29, 2011
The 1-Center in the hyperbolic Klein disk
I have implemented a visual interface that generalizes de Badoiu-Clarkson algorithm to arbitrary Riemmanian geometry
(see On Approximating the Riemannian 1-Center).
The javascript demo that should run on any platform is available here.
Frank.
Dec 28, 2011
Some applications of computational information geometry
Please send me your favorite application of information geometry.
Happy new year to all of you. Frank
Digital cameras are quickly merging with smart phones, and visual computing applications [1] that support computational photography and augmented reality applications are flourishing at a fast pace. By 2013, the annual worldwide IP traffic is predicted to be a zettabyte: 90% of consumer IP traffic and 60% of mobile IP traffic will be video.
How do we extract and use rich information from those massive data sets? As visual data abound, computer vision and computer graphics are increasingly relying on machine learning and information-theoretic methods. Computational Information Geometry is a novel paradigm to perform high-fidelity data analysis using the language and thinking of geometry.
Geometry allows us to map the data in space for efficient processing and retrieval of intrinsic information. Geometry is in essence coordinate-free and allows one to extract the very information from data.
Geometrization of statistics has provided novel algorithms for manipulating statistical mixture models such as Gaussian mixture models [2] that are commonly used in image processing: An image pixel at position (x,y) with color attributes (red, green, blue) is embedded into a 5D space so that a 2D color image is interpreted as a 5D spatial point cloud. We then seek for a compact generative statistical representation of the image point set. Such statistical methods are useful for explaning human cognitive and learning skills [3], and analyzing emerging phenomena of complex systems using hierarchical Bayesian models.
Geometry is well alive and continue to play a crucial role in natural sciences. For example, the propagation of seismic waves from an epicenter follows Fermat's principle of shortest paths (minimum arrival time). Since the Earth is made of anisotropic media such as the peridotite, shortest paths are not line segments: The geometry is not Euclidean. Seismic wave propagation is currently best modeled using Finsler geometry that extends Riemmanian geometry by taking into account the anisotropic direction of materials. In [4], we recently show how to aggregate and cluster information in such Finslerian spaces. Finsler geometry is also considered for advanced medical imaging of DT-MRI data-sets.
Last but not least, the theory of portfolio allocation has been traditionally carried out using the mean-variance method of Markowitz. Considering universal statistical distributions (exponential families) allows one to bypass the Gaussian assumption, and to derive the exact expression of the risk premium (a Bregman divergence) and certainty equivalent [5]. Moreover, we design an on-line learning algorithm with guaranteed lower bounds on its cumulated certainty equivalents [5]. It is interesting to note that portfolio theory has also been considered to explain robustness trade-offs of cells in biology [6] (bioeconomics).
REFERENCES:
- [1] Frank Nielsen: Visual Computing: Geometry, Graphics, and Vision; Charles River Media, ISBN: 1-58450-427-7, 2005.
- [2] Frank Nielsen, Sylvain Boltz: The Burbea-Rao and Bhattacharyya Centroids. IEEE Transactions on Information Theory 57(8): 5455-5466, 2011.
-
[3] Joshua B. Tenenbaum, Charles Kemp, Thomas L. Griffiths, and Noah D.
Goodman: How to Grow a Mind: Statistics, Structure, and Abstraction
Science 331(6022):1279-1285, 2011.
-
[4] Marc Arnaudon, Frank Nielsen: Medians and means in Finsler geometry,
Cambridge LMS Journal of Computation and Mathematics, 2011.
- [5] Richard Nock, Brice Magdalou, Eric Briys and Frank Nielsen: On tracking portfolios with certainty equivalents on a generalization of Markowitz model: the Fool, the Wise and the Adaptive International Conference on Machine Learning, pp. 73-80, 2011.
-
[6] Hiroaki Kitano: Violations of robustness trade-offs Mol Syst Biol. 2010; 6: 384. 10.1038/msb.2010.40
Dec 20, 2011
A closed-form expression for the Sharma-Mittal entropy of exponential families
The Sharma-Mittal entropies generalize the celebrated Shannon, Renyi and Tsallis entropies. We report a closed-form formula for the Sharma?Mittal entropies and relative entropies for arbitrary exponential family distributions. We explicitly instantiate the formula for the case of the multivariate Gaussian distributions and discuss its estimation.
@article{1751-8121-45-3-032003,
author={Frank Nielsen and Richard Nock},
title={A closed-form expression for the Sharma-Mittal entropy of exponential families},
journal={Journal of Physics A: Mathematical and Theoretical},
volume={45},
number={3},
pages={032003},
url={http://stacks.iop.org/1751-8121/45/i=3/a=032003},
year={2012}
}
Nov 18, 2011
Skew Jensen-Bregman Voronoi Diagrams
The article
Skew Jensen-Bregman Voronoi Diagrams
is out here
"
Abstract A Jensen-Bregman divergence is a distortion measure defined by a Jensen convexity gap induced by a strictly convex functional generator. Jensen-Bregman divergences unify the squared Euclidean and Mahalanobis distances with the celebrated information-theoretic Jensen-Shannon divergence, and can further be skewed to include Bregman divergences in limit cases. We study the geometric properties and combinatorial complexities of both the Voronoi diagrams and the centroidal Voronoi diagrams induced by such as class of divergences. We show that Jensen-Bregman divergences occur in two contexts: (1) when symmetrizing Bregman divergences, and (2) when computing the Bhattacharyya distances of statistical distributions. Since the Bhattacharyya distance of popular parametric exponential family distributions in statistics can be computed equivalently as Jensen-Bregman divergences, these skew Jensen-Bregman Voronoi diagrams allow one to define a novel family of statistical Voronoi diagrams.
Keywords Jensen?s inequality ? Bregman divergences ? Jensen-Shannon divergence ? Jensen-von Neumann divergence ? Bhattacharyya distance ? information geometry
Oct 31, 2011
Darpa Challenge
DARPA is running a competition for deciphering shredded documents. They have 5 data-sets and questions associated to the image contents. Worth looking at!
Here is a toy reconstruction (that I made by hand in a few minutes)
Best, Frank.
Oct 25, 2011
Dictionary of computational information geometry...
updated with 410+ terms now.
Sep 20, 2011
Aug 26, 2011
MLE of exponential families as a minimizer of the average right-sided dual Bregman divergence
Frank.
Aug 23, 2011
Expected Kullback-Leibler divergence between a multinomial and an empirical distribution
Thanks for your valuable feedback and references.
Frank.
Aug 10, 2011
Aug 08, 2011
The Burbea-Rao and Bhattacharyya Centroids
We study the centroid with respect to the class of information-theoretic Burbea-Rao divergences that generalize the celebrated Jensen-Shannon divergence by measuring the non-negative Jensen difference induced by a strictly convex and differentiable function. Although those Burbea-Rao divergences are symmetric by construction, they are not metric since they fail to satisfy the triangle inequality. We first explain how a particular symmetrization of Bregman divergences called Jensen-Bregman distances yields exactly those Burbea-Rao divergences. We then proceed by defining skew Burbea-Rao divergences, and show that skew Burbea-Rao divergences amount in limit cases to compute Bregman divergences. We then prove that Burbea-Rao centroids can be arbitrarily finely approximated by a generic iterative concave-convex optimization algorithm with guaranteed convergence property. In the second part of the paper, we consider the Bhattacharyya distance that is commonly used to measure overlapping degree of probability distributions. We show that Bhattacharyya distances on members of the same statistical exponential family amount to calculate a Burbea-Rao divergence in disguise. Thus we get an efficient algorithm for computing the Bhattacharyya centroid of a set of parametric distributions belonging to the same exponential families, improving over former specialized methods found in the literature that were limited to univariate or ?diagonal? multivariate Gaussians. To illustrate the performance of our Bhattacharyya/Burbea-Rao centroid algorithm, we present experimental performance results for $k$-means and hierarchical clustering methods of Gaussian mixture models.
Jul 20, 2011
Distance notations
I am looking for Finslerian quasi-metric distances with applications in computer vision/medical imaging. Any recommendation?
Comments welcome!
Frank.
Jul 14, 2011
Clickremoval
You can try the applet with your own pictures here:.
Details in the paper:
Nielsen, F., Nock, R., 2005. ClickRemoval: Interactive pinpoint image object removal, ACM multimedia, Hilton, Singapore, pp. 315?318.
Jul 12, 2011
Jul 11, 2011
A taste of ICCV
The list of accepted work at International Conference on Computer Vision(ICCV) is available for a couple of days. As usual, there are many exciting titles. I made a rough selection considering my interests.
Here it is:
- A Nonparametric Riemannian Framework on Tensor Field with Apllication to Foreground Segmentation
- A New Distance for Scale-Invariant 3D Shape Recognition and Registration
- Learning Nonlinear Distance Functions using Neural Network for Regression with Application to Robust Human Age Estimation
- Fisher Discrimination Dictionary Learning for Sparse Representation
- Means in spaces of tree-like shapes
- Learning Mixtures of Sparse Distance Metrics for Classification and Dimensionality Reduction
- Positive Definite Dictionary Learning for Region Covariances
- StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs
- Panoramic Stereo Video Textures
- Fisher vectors to model spatial layout for image categorization
- Unsupervised Metric Learning for Face Identification in TV Video
- Complementary Hashing for Approximate Nearest Neighbor Search
- A Dimensionality Result for Multiple Homography Matrices
- Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence
Frank.
@FrnkNlsn