Jan 31, 2012

Log-normalizer of an exponential family is convex

Post @ 16:08:15 | exponential families

lognormalizercvs.png

Jan 26, 2012

Bibliography

Post @ 16:22:34 | BibTeX

Long time since I last updated my publication list. FN-Journal-Jan2012.pdf FN-Journal-Jan2012.bib

Jan 18, 2012

Total Bregman divergence and Soft Clustering

Post @ 10:28:27 | Bregman divergence

Since Banerjee et al. showed that expectation-maximization for mixtures of exponential families is a Bregman soft clustering in disguise, there has been strong interest in further exploiting the bijection between exponential families and Bregman divergences.

In
Shape Retrieval Using Hierarchical Total Bregman Soft Clustering
it is similarly proven that for total Bregman divergences (tBD) there exists a distribution that belongs to the lifted exponential family of statistical distributions. This leads to a new clustering technique, namely the total Bregman soft clustering algorithm.

See paper.
Frank.

Jan 17, 2012

Paper aggregators

Post @ 15:03:16 | web tools

List of papers and citations automatically aggregated... (Smile)

sci.ans
Frank.

Jan 11, 2012

Legendre transformation and information geometry

Post @ 13:54:31 | Legendre-Fenchel duality

Legendre-Fenchel duality is at the heart of dually flat spaces in information geometry. Convex functions come in pairs, called convex conjugates. The basic principle is that if you plot the epigraph of the function F and reinterpret it as the intersection of supporting halfspaces, you get the dual geometric representation of the epigraph. You can parameterize this dual representation using the convex conjugate function. This is why the Legendre-Fenchel transformation is sometimes called the slope transform.

More details in the memo:
NoteLegendreTransformation.pdf
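As a rough numerical illustration of the slope-transform reading (a sketch, not taken from the memo), the conjugate F*(eta) = sup_theta { theta * eta - F(theta) } can be approximated on a grid. The helper name `conjugate_on_grid`, the grid, and the choice F(theta) = exp(theta) are assumptions made for this example; for that F the conjugate is known in closed form, F*(eta) = eta log(eta) - eta for eta > 0.

```python
import math

def conjugate_on_grid(F, eta, thetas):
    """Approximate F*(eta) = sup_theta (theta * eta - F(theta)) over a finite grid."""
    return max(theta * eta - F(theta) for theta in thetas)

# Grid of candidate theta values in [-5, 5] with step 0.001 (illustrative choice).
thetas = [-5.0 + 0.001 * i for i in range(10001)]

F = math.exp                        # convex generator F(theta) = exp(theta)
eta = 2.0                           # slope; the supremum is attained at theta = log(eta)
approx = conjugate_on_grid(F, eta, thetas)
exact = eta * math.log(eta) - eta   # closed-form conjugate of exp, valid for eta > 0
```

The grid value can only underestimate the supremum, so `approx` approaches `exact` from below as the grid is refined.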

++x-x,
Frank.

Dec 29, 2011

The 1-Center in the hyperbolic Klein disk

Post @ 13:48:18 | demo

I have implemented a visual interface that generalizes the Badoiu-Clarkson algorithm to arbitrary Riemannian geometries (see On Approximating the Riemannian 1-Center). The JavaScript demo, which should run on any platform, is available here.
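For readers who have not seen the underlying iteration, here is a minimal sketch of the Euclidean version only: start at any input point and repeatedly move toward the current farthest point by a shrinking fraction 1/(i+1). The function name `approx_one_center`, the iteration count, and the toy point set are assumptions for illustration; this is not the code of the demo, and the Riemannian variant would replace the straight-line step by a step along a geodesic.

```python
import math

def approx_one_center(points, iterations=1000):
    """Badoiu-Clarkson style iteration for an approximate Euclidean 1-center."""
    center = list(points[0])
    for i in range(1, iterations + 1):
        # Find the point currently farthest from the center estimate.
        farthest = max(points, key=lambda p: math.dist(p, center))
        # Move toward it by a fraction that shrinks with the iteration index.
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    return center

points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.5)]
center = approx_one_center(points)
radius = max(math.dist(p, center) for p in points)
```

On this toy set the exact smallest enclosing ball is centered at (1, 0) with radius 1, so `radius` should end up slightly above 1.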

Frank.

Dec 28, 2011

Some applications of computational information geometry

Post @ 15:02:23 | applications

Please send me your favorite application of information geometry. Happy new year to all of you. Frank

Digital cameras are quickly merging with smart phones, and visual computing applications [1] that support computational photography and augmented reality applications are flourishing at a fast pace. By 2013, the annual worldwide IP traffic is predicted to be a zettabyte: 90% of consumer IP traffic and 60% of mobile IP traffic will be video.

How do we extract and use rich information from those massive data sets? As visual data abound, computer vision and computer graphics are increasingly relying on machine learning and information-theoretic methods. Computational Information Geometry is a novel paradigm to perform high-fidelity data analysis using the language and thinking of geometry.

Geometry allows us to map the data in space for efficient processing and retrieval of intrinsic information. Geometry is in essence coordinate-free and allows one to extract the very information from data.

Geometrization of statistics has provided novel algorithms for manipulating statistical mixture models such as Gaussian mixture models [2] that are commonly used in image processing: An image pixel at position (x,y) with color attributes (red, green, blue) is embedded into a 5D space so that a 2D color image is interpreted as a 5D spatial point cloud. We then seek a compact generative statistical representation of the image point set. Such statistical methods are useful for explaining human cognitive and learning skills [3], and for analyzing emerging phenomena of complex systems using hierarchical Bayesian models.
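The pixel embedding mentioned above can be sketched in a few lines. This is only an illustration under simple assumptions: the image is given as rows of (r, g, b) tuples, and the names `image_to_point_cloud` and `tiny_image` are hypothetical. Fitting a mixture model to the resulting cloud is a separate step that is not shown.

```python
def image_to_point_cloud(image):
    """Map an image, given as rows of (r, g, b) tuples, to a list of 5D points (x, y, r, g, b)."""
    return [
        (x, y, r, g, b)
        for y, row in enumerate(image)
        for x, (r, g, b) in enumerate(row)
    ]

# A 2x2 toy image: red, green on the first row; blue, white on the second.
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
cloud = image_to_point_cloud(tiny_image)
```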

Geometry is alive and well and continues to play a crucial role in the natural sciences. For example, the propagation of seismic waves from an epicenter follows Fermat's principle of shortest paths (minimum arrival time). Since the Earth is made of anisotropic media such as peridotite, shortest paths are not line segments: The geometry is not Euclidean. Seismic wave propagation is currently best modeled using Finsler geometry, which extends Riemannian geometry by taking into account the anisotropic direction of materials. In [4], we recently showed how to aggregate and cluster information in such Finslerian spaces. Finsler geometry is also considered for advanced medical imaging of DT-MRI data-sets.

Last but not least, the theory of portfolio allocation has been traditionally carried out using the mean-variance method of Markowitz. Considering universal statistical distributions (exponential families) allows one to bypass the Gaussian assumption, and to derive the exact expression of the risk premium (a Bregman divergence) and certainty equivalent [5]. Moreover, we design an on-line learning algorithm with guaranteed lower bounds on its cumulated certainty equivalents [5]. It is interesting to note that portfolio theory has also been considered to explain robustness trade-offs of cells in biology [6] (bioeconomics).


REFERENCES:

  • [1] Frank Nielsen: Visual Computing: Geometry, Graphics, and Vision; Charles River Media, ISBN: 1-58450-427-7, 2005.
  • [2] Frank Nielsen, Sylvain Boltz: The Burbea-Rao and Bhattacharyya Centroids. IEEE Transactions on Information Theory 57(8): 5455-5466, 2011.
  • [3] Joshua B. Tenenbaum, Charles Kemp, Thomas L. Griffiths, and Noah D. Goodman: How to Grow a Mind: Statistics, Structure, and Abstraction. Science 331(6022): 1279-1285, 2011.
  • [4] Marc Arnaudon, Frank Nielsen: Medians and means in Finsler geometry. Cambridge LMS Journal of Computation and Mathematics, 2011.
  • [5] Richard Nock, Brice Magdalou, Eric Briys and Frank Nielsen: On tracking portfolios with certainty equivalents on a generalization of Markowitz model: the Fool, the Wise and the Adaptive. International Conference on Machine Learning, pp. 73-80, 2011.
  • [6] Hiroaki Kitano: Violations of robustness trade-offs. Mol Syst Biol. 2010; 6: 384. doi:10.1038/msb.2010.40

Dec 20, 2011

A closed-form expression for the Sharma-Mittal entropy of exponential families

Post @ 11:52:26 | Sharma-Mittal entropy

The Sharma-Mittal entropies generalize the celebrated Shannon, Renyi and Tsallis entropies. We report a closed-form formula for the Sharma-Mittal entropies and relative entropies for arbitrary exponential family distributions. We explicitly instantiate the formula for the case of the multivariate Gaussian distributions and discuss its estimation.

paper.
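To see how the two-parameter family contains the others, here is a small sketch for finite discrete distributions only (it is not the exponential-family closed form reported in the paper). It assumes the usual definition H(alpha, beta) = [ (sum_i p_i^alpha)^((1-beta)/(1-alpha)) - 1 ] / (1 - beta) with alpha > 0, alpha != 1, beta != 1; the function names are chosen for this example. Setting beta = alpha gives the Tsallis entropy, and letting beta tend to 1 gives the Renyi entropy.

```python
import math

def power_sum(p, alpha):
    """Sum of p_i ** alpha over the support of p."""
    return sum(pi ** alpha for pi in p if pi > 0)

def sharma_mittal(p, alpha, beta):
    """Sharma-Mittal entropy of a discrete distribution (alpha > 0, alpha != 1, beta != 1)."""
    return (power_sum(p, alpha) ** ((1 - beta) / (1 - alpha)) - 1) / (1 - beta)

def tsallis(p, alpha):
    """Tsallis entropy, recovered from Sharma-Mittal when beta = alpha."""
    return (power_sum(p, alpha) - 1) / (1 - alpha)

def renyi(p, alpha):
    """Renyi entropy (in nats), recovered from Sharma-Mittal as beta tends to 1."""
    return math.log(power_sum(p, alpha)) / (1 - alpha)

p = [0.5, 0.3, 0.2]
same_as_tsallis = sharma_mittal(p, 2.0, 2.0)
close_to_renyi = sharma_mittal(p, 2.0, 1.0 + 1e-7)
```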

   
@article{1751-8121-45-3-032003,
  author={Frank Nielsen and Richard Nock},
  title={A closed-form expression for the Sharma-Mittal entropy of exponential families},
  journal={Journal of Physics A: Mathematical and Theoretical},
  volume={45},
  number={3},
  pages={032003},
  url={http://stacks.iop.org/1751-8121/45/i=3/a=032003},
  year={2012}
}

Nov 18, 2011

Skew Jensen-Bregman Voronoi Diagrams

Post @ 14:25:29 | Voronoi diagrams

The article
Skew Jensen-Bregman Voronoi Diagrams
is out here
SkewJensenCoverpage.jpg

Abstract: A Jensen-Bregman divergence is a distortion measure defined by a Jensen convexity gap induced by a strictly convex functional generator. Jensen-Bregman divergences unify the squared Euclidean and Mahalanobis distances with the celebrated information-theoretic Jensen-Shannon divergence, and can further be skewed to include Bregman divergences in limit cases. We study the geometric properties and combinatorial complexities of both the Voronoi diagrams and the centroidal Voronoi diagrams induced by such a class of divergences. We show that Jensen-Bregman divergences occur in two contexts: (1) when symmetrizing Bregman divergences, and (2) when computing the Bhattacharyya distances of statistical distributions. Since the Bhattacharyya distance of popular parametric exponential family distributions in statistics can be computed equivalently as Jensen-Bregman divergences, these skew Jensen-Bregman Voronoi diagrams allow one to define a novel family of statistical Voronoi diagrams.

Keywords: Jensen's inequality · Bregman divergences · Jensen-Shannon divergence · Jensen-von Neumann divergence · Bhattacharyya distance · information geometry
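A minimal sketch of the unskewed Jensen convexity gap from the abstract, JB_F(p, q) = (F(p) + F(q)) / 2 - F((p + q) / 2), may help make the unification concrete. The function names and the two test vectors are assumptions for this example. With F the negative Shannon entropy the gap is the Jensen-Shannon divergence, and with F the squared Euclidean norm it is a quarter of the squared Euclidean distance.

```python
import math

def jensen_bregman(F, p, q):
    """Jensen gap (F(p) + F(q)) / 2 - F((p + q) / 2) for a strictly convex generator F."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return (F(p) + F(q)) / 2 - F(mid)

def neg_shannon_entropy(x):
    """Generator F(x) = sum_i x_i log x_i (natural logarithm)."""
    return sum(xi * math.log(xi) for xi in x if xi > 0)

def squared_norm(x):
    """Generator F(x) = sum_i x_i ** 2."""
    return sum(xi * xi for xi in x)

def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions, used only as a cross-check."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
mid = [(a + b) / 2 for a, b in zip(p, q)]

js = jensen_bregman(neg_shannon_entropy, p, q)   # Jensen-Shannon divergence in nats
js_check = 0.5 * kl(p, mid) + 0.5 * kl(q, mid)   # the same quantity written with KL terms
sq = jensen_bregman(squared_norm, p, q)          # equals ||p - q||^2 / 4
```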

Oct 31, 2011

DARPA Challenge

Post @ 17:58:04 | puzzleAnnouncement

DARPA is running a competition for deciphering shredded documents. They have 5 data-sets and questions associated with the image contents. Worth looking at!


Here is a toy reconstruction (that I made by hand in a few minutes)

DarpaChallenge.png


Best, Frank.

Sep 20, 2011

Invariance, total variation, and information geometry

Post @ 14:26:33 | invariance

TVinvariance.png
Frank.

Aug 26, 2011

MLE of exponential families as a minimizer of the average right-sided dual Bregman divergence

Post @ 11:23:11 | exponential families

MLE-BregmanMedian.png
Frank.

Aug 23, 2011

Expected Kullback-Leibler divergence between a multinomial and an empirical distribution

Post @ 11:01:29 | Kullback-Leibler divergence

Thanks for your valuable feedback and references.
KLempiricalmultinomial.png
Frank.

Aug 08, 2011

The Burbea-Rao and Bhattacharyya Centroids

Post @ 15:15:36 | Burbea-Rao

We study the centroid with respect to the class of information-theoretic Burbea-Rao divergences that generalize the celebrated Jensen-Shannon divergence by measuring the non-negative Jensen difference induced by a strictly convex and differentiable function. Although those Burbea-Rao divergences are symmetric by construction, they are not metrics since they fail to satisfy the triangle inequality. We first explain how a particular symmetrization of Bregman divergences called Jensen-Bregman distances yields exactly those Burbea-Rao divergences. We then proceed by defining skew Burbea-Rao divergences, and show that skew Burbea-Rao divergences amount in limit cases to computing Bregman divergences. We then prove that Burbea-Rao centroids can be arbitrarily finely approximated by a generic iterative concave-convex optimization algorithm with guaranteed convergence property. In the second part of the paper, we consider the Bhattacharyya distance that is commonly used to measure the degree of overlap of probability distributions. We show that Bhattacharyya distances on members of the same statistical exponential family amount to calculating a Burbea-Rao divergence in disguise. Thus we get an efficient algorithm for computing the Bhattacharyya centroid of a set of parametric distributions belonging to the same exponential family, improving over former specialized methods found in the literature that were limited to univariate or "diagonal" multivariate Gaussians. To illustrate the performance of our Bhattacharyya/Burbea-Rao centroid algorithm, we present experimental performance results for k-means and hierarchical clustering methods of Gaussian mixture models.

paper
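The "in disguise" statement can be checked numerically on a simple family. The sketch below uses the Poisson family, an assumption made for this example: its natural parameter is theta = log(lambda) and its log-normalizer is F(theta) = exp(theta). The Bhattacharyya distance is computed by brute-force summation and compared with the Burbea-Rao divergence of F on the natural parameters. Function names and the truncation at 200 terms are illustrative choices.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability mass, computed in log space for numerical stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def bhattacharyya_distance_poisson(lam1, lam2, terms=200):
    """Minus the log of the Bhattacharyya coefficient, by truncated summation over k."""
    coeff = sum(
        math.sqrt(poisson_pmf(k, lam1) * poisson_pmf(k, lam2)) for k in range(terms)
    )
    return -math.log(coeff)

def burbea_rao(F, theta1, theta2):
    """Burbea-Rao divergence (F(theta1) + F(theta2)) / 2 - F((theta1 + theta2) / 2)."""
    return (F(theta1) + F(theta2)) / 2 - F((theta1 + theta2) / 2)

lam1, lam2 = 3.0, 7.0
direct = bhattacharyya_distance_poisson(lam1, lam2)
via_generator = burbea_rao(math.exp, math.log(lam1), math.log(lam2))
```

Both values should agree with (lam1 + lam2) / 2 - sqrt(lam1 * lam2) up to truncation and rounding error.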

Jul 20, 2011

Distance notations

Post @ 12:39:13 | distance

distnot.png

I am looking for Finslerian quasi-metric distances with applications in computer vision/medical imaging. Any recommendation?

Comments welcome!
Frank.

Jul 14, 2011

ClickRemoval

Post @ 16:09:24 | image processing

You can try the applet with your own pictures here.

Details in the paper: Nielsen, F., Nock, R., 2005. ClickRemoval: Interactive pinpoint image object removal, ACM Multimedia, Hilton, Singapore, pp. 315-318.

Jul 11, 2011

A taste of ICCV

Post @ 17:58:42 | computer vision

The list of accepted papers at the International Conference on Computer Vision (ICCV) has been available for a couple of days. As usual, there are many exciting titles. I made a rough selection based on my interests.

Here it is:

  • A Nonparametric Riemannian Framework on Tensor Field with Application to Foreground Segmentation
  • A New Distance for Scale-Invariant 3D Shape Recognition and Registration
  • Learning Nonlinear Distance Functions using Neural Network for Regression with Application to Robust Human Age Estimation
  • Fisher Discrimination Dictionary Learning for Sparse Representation
  • Means in spaces of tree-like shapes
  • Learning Mixtures of Sparse Distance Metrics for Classification and Dimensionality Reduction
  • Positive Definite Dictionary Learning for Region Covariances
  • StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs
  • Panoramic Stereo Video Textures
  • Fisher vectors to model spatial layout for image categorization
  • Unsupervised Metric Learning for Face Identification in TV Video
  • Complementary Hashing for Approximate Nearest Neighbor Search
  • A Dimensionality Result for Multiple Homography Matrices
  • Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence


Frank.

Jun 17, 2011

C-square vs D-square: Pearson vs Mahalanobis

Post @ 16:35:21 | Mahalanobis distance

The Mahalanobis distance is one of the most famous distances in statistics, even nowadays. It is good to look back at the original papers, and to see the opinions expressed 20 years later by Mahalanobis himself. Some scientists had to fight to get their ideas published. There was no social scientific web at that time!
Read the C2 vs D2 story here.

Frank.

May 30, 2011

Nearest neighbour queries wrt. Bregman divergences

Post @ 11:37:50 | Bregman divergences

Paolo released the code for efficiently computing Bregman nearest neighbour queries. It is a generalization of the traditional vantage point tree algorithm. Paper and source code are available here.
BVP-SKL-256pts.png

May 26, 2011

Translations of technical terms

Post @ 16:28:53 | information geometry

I have recently been reading a few introductory papers in Japanese and took the opportunity to collect terms related to information geometry.
Here is a very first dictionary of the English-Japanese-French terms I encountered.

Frank.

May 25, 2011

Non-flat clustering with alpha-divergences

Post @ 11:48:01 | alpha-divergences

ICASSP [web] is currently being held in Prague. Olivier is presenting a poster on our experiments on clustering with alpha-divergences. It is well known that f-divergences are the invariant statistical divergences [under reparameterization by sufficient statistics or monotone embedding]. The Kullback-Leibler divergence yields a flat geometry [it is equivalent to a Bregman divergence for exponential families], and alpha-divergences (although flat on positive measure spaces) are the canonical information-geometric divergences for curved spaces of constant curvature. We carried out a series of experiments to investigate the choice of alpha in practice. See paper. This was a first investigation, and we are still working on it...

Frank.

May 23, 2011

On tracking portfolios with certainty equivalents on a generalization of Markowitz model: the Fool,

Post @ 20:55:16 | Economy

Portfolio allocation theory has been heavily influenced by a major contribution of Harry Markowitz in the early fifties: the mean-variance approach. While there has been a continuous line of works in on-line learning portfolios over the past decades, very few works have really tried to cope with Markowitz model. A major drawback of the mean-variance approach is that it is approximation-free only when stock returns obey a Gaussian distribution, an assumption known not to hold in real data. In this paper, we first alleviate this assumption, and rigorously lift the mean-variance model to a more general mean-divergence model in which stock returns are allowed to obey any exponential family of distributions. We then devise a general on-line learning algorithm in this setting. We prove for this algorithm the first lower bounds on the most relevant quantity to be optimized in the framework of Markowitz model: the certainty equivalents. Experiments on four real-world stock markets display its ability to track portfolios whose cumulated returns exceed those of the best stock by orders of magnitude.
paper

May 19, 2011

Continuity and discontinuity of Shannon entropy

Post @ 16:24:20 | Shannon information

It is well known that for finite discrete distributions [that is, multinomials], Shannon entropy is continuous inside the open probability simplex. However, when the alphabet becomes countably infinite, Shannon entropy is, surprisingly, discontinuous everywhere, as are other quantities such as the mutual information. All details can be found in the paper [pdf]:

S.W. Ho and R. W. Yeung,
On the Discontinuity of the Shannon Information Measures, IEEE Transactions on Information Theory, 2009
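A standard textbook-style construction gives some intuition for the discontinuity (this sketch is not taken from the paper, and the function names are chosen for this example). Put mass 1 - eps on one atom and spread eps uniformly over n other atoms, with eps = 1 / log(n). As n grows, the total variation distance to the point mass is eps and tends to 0, yet the entropy equals 1 + h(eps) nats, where h is the binary entropy, so it never drops below 1 while the point mass itself has entropy 0.

```python
import math

def spread_distribution_entropy(n):
    """Entropy in nats of: mass 1 - eps on one atom, eps uniform over n others, eps = 1 / log(n)."""
    eps = 1.0 / math.log(n)
    heavy_atom_term = -(1 - eps) * math.log(1 - eps)
    light_atoms_term = -eps * math.log(eps / n)   # n atoms, each of mass eps / n
    return heavy_atom_term + light_atoms_term

def tv_distance_to_point_mass(n):
    """Total variation distance between that distribution and the point mass on the heavy atom."""
    return 1.0 / math.log(n)

for n in (10**3, 10**6, 10**12, 10**100):
    print(tv_distance_to_point_mass(n), spread_distribution_entropy(n))
```

The construction needs n > e so that eps < 1; an ever larger support is required to keep the entropy up, which is why this cannot happen on a fixed finite alphabet.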

Frank.

May 18, 2011

Renyi and Tsallis entropies and divergences for exponential families

It is well-known that the Kullback-Leibler divergence of two densities belonging to the same exponential family can be equivalently computed as the Bregman divergence on natural parameters [for the log-normalizer].
So what happens if we consider generalizations of the Kullback-Leibler divergence? The KL divergence is based on Shannon entropy, so let us look at two generalizations of Shannon entropy: Renyi [which preserves additivity] and Tsallis [which is non-extensive]. It turns out that we have simple closed-form expressions for the relative Renyi/Tsallis entropies of densities belonging to the same exponential family. Moreover, when we consider only the Tsallis/Renyi entropies, we end up with a simple formula that is in closed form for families with a standard carrier measure. We illustrate these results by computing the Tsallis/Renyi entropies/divergences for multivariate normals.

The technical details can be found here.
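The starting point of this post, Kullback-Leibler as a Bregman divergence on natural parameters, can be verified on a one-dimensional family. The sketch below assumes the Poisson family (natural parameter theta = log(lambda), log-normalizer F(theta) = exp(theta)); the function names are chosen for this example. Note the swapped argument order: KL(p1 || p2) equals B_F(theta2 : theta1).

```python
import math

def bregman(F, grad_F, theta_a, theta_b):
    """Bregman divergence B_F(theta_a : theta_b) for a scalar convex generator F."""
    return F(theta_a) - F(theta_b) - (theta_a - theta_b) * grad_F(theta_b)

def kl_poisson(lam1, lam2):
    """Closed-form KL(Poisson(lam1) || Poisson(lam2))."""
    return lam1 * math.log(lam1 / lam2) - lam1 + lam2

lam1, lam2 = 3.0, 7.0
theta1, theta2 = math.log(lam1), math.log(lam2)

# exp is its own derivative, so it serves as both F and grad_F here.
kl_via_bregman = bregman(math.exp, math.exp, theta2, theta1)
kl_closed_form = kl_poisson(lam1, lam2)
```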

Frank.

Apr 23, 2011

Smallest enclosing ball in AutoCAD

Post @ 17:54:51 | computational geometry

Code in .NET for computing a fast approximation of the smallest enclosing ball.
See also Riemannian extension

Apr 15, 2011

Jensen-Shannon Voronoi diagram

Post @ 11:26:26 | Voronoi diagram

I am revising the paper:
Jensen-Bregman Voronoi Diagrams and Centroidal Tessellations.

So it is time to clean up the code, make adjustments, add new insights, etc. Below is a video of the Voronoi diagram with respect to the Jensen-Shannon divergence.



Frank.

Apr 11, 2011

New book: Generalized thermostatistics

Post @ 12:58:40 | generalized IT

A new book is available for people with interests in computational information geometry: GenThermoStat.png

The book is also available freely in PDF

Frank.

Apr 07, 2011

Video stippling

Post @ 16:45:20 | Computer graphics

Voronoi diagrams and centroidal Voronoi tessellations play key roles in computer graphics. For example, in pointillism (or stippling) one needs to represent a shape or model by a set of points (possibly with disks of varying radius, colors, etc.). We propose a simple algorithm for stippling video sequences:


Video Stippling
