## Sep 09, 2013

### New blog address! Moving to Wordpress

Post @ 13:21:49 | moving

I will no longer use this blog system; I am moving to Wordpress, which supports many goodies like LaTeX. Please update your bookmarks to:
Wordpress: Computational Information Geometry Wonderland
Frank

## Jul 30, 2013

### Hyperbolic centers: closed-form minimax and centroids

Post @ 23:34:29 | hyperbolic geometry

It may be surprising to some of you, but the centroid in hyperbolic space is not available in closed form except in very particular cases (points lying on the same geodesic). Since we like center-based clustering, we may consider instead the minimax center, which is easy to calculate (either in the conformal Poincaré or the non-conformal Klein model). Here is an illustration:

More details here.

## Jul 18, 2013

### Geometry of binary and multiple hypothesis testing

Post @ 12:58:51 | exponential family manifold

Here are the slides of my forthcoming talk at Geometric Science of Information 2013 in Paris. In short, they explain how to compute the Chernoff information of a set of distributions (belonging to the same exponential family) from a closest Bregman pair problem (using an affine Bregman Voronoi diagram).

## Jul 13, 2013

### Riemannian matrix centers

We may think of centers as unique minimizers of the average powered distances from the center to the matrices. It is important to consider the manifold of structured matrices (say, positive definite, Toeplitz, etc.) when computing this center, in order to preserve the structure of the center matrix. There has been considerable effort on matrix geometric means, with recent progress (July 2013 School and Workshop) and the minimax center. See also how those notions extend to Finsler geometries, which have applications in diffusion tensor imaging. There is yet more to come, as the field has many remaining challenges to solve!

## Jul 10, 2013

### Bregman matrix divergences and matrix geometry

Post @ 5:00:43 | Bregman divergence

One can define matrix Bregman divergences either by vectorizing matrices and using the standard (vector) Bregman divergences, or by defining the generator on a matrix from the Taylor expansion of a scalar generator applied to the power series of the matrix. This yields the following divergences:

More here
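
As a concrete instance (my choice for illustration, not necessarily the post's example), the generator F(X) = −log det X on positive-definite matrices yields the log-det (Burg) matrix divergence; a minimal Python sketch:

```python
import numpy as np

def logdet_divergence(P, Q):
    """Matrix Bregman divergence for the generator F(X) = -log det X
    on symmetric positive-definite matrices:
        D(P : Q) = tr(P Q^{-1}) - log det(P Q^{-1}) - n
    """
    n = P.shape[0]
    M = P @ np.linalg.inv(Q)
    sign, logdet = np.linalg.slogdet(M)   # det(M) > 0 for SPD P, Q
    return float(np.trace(M) - logdet - n)

# Two random SPD matrices for a quick check:
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)); P = A @ A.T + 3.0 * np.eye(3)
B = rng.normal(size=(3, 3)); Q = B @ B.T + 3.0 * np.eye(3)
print(logdet_divergence(P, P))            # 0 up to rounding
print(logdet_divergence(P, Q) > 0)        # divergences are non-negative
```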

## Jul 05, 2013

### Pattern Learning and Recognition on Statistical Manifolds: An Information-Geometric Review

Post @ 16:08:01 | information geometry

I am attending SIMBAD in the beautiful historical city of York, UK. Here is the abstract of my talk.

We review the information-geometric framework for statistical pattern recognition: First, we explain the role of statistical similarity measures and distances in fundamental statistical pattern recognition problems. We then concisely review the main statistical distances and report a novel versatile family of divergences. Depending on their intrinsic complexity, the statistical patterns are learned by either atomic parametric distributions, semi-parametric finite mixtures, or non-parametric kernel density distributions. Those statistical patterns are interpreted and handled geometrically in statistical manifolds either as single points, weighted sparse point sets or non-weighted dense point sets. We explain the construction of the two prominent families of statistical manifolds: The Rao Riemannian manifolds with geodesic metric distances, and the Amari-Chentsov manifolds with dual asymmetric non-metric divergences. For the latter manifolds, when considering atomic distributions from the same exponential families (including the ubiquitous Gaussian and multinomial families), we end up with dually flat exponential family manifolds that play a crucial role in many applications. We compare the advantages and disadvantages of these two approaches from the algorithmic point of view. Finally, we conclude with further perspectives on how "geometric thinking" may spur novel pattern modeling and processing paradigms.

## Jun 21, 2013

### 1930: Prof. Harold Hotelling, a precursor of information geometry

Post @ 15:18:19 | Fisher-Rao geometry

Back in 1930, we can find notes from Prof. Harold Hotelling using the Fisher information matrix to define a Riemannian distance (known today as the Fisher-Rao or Rao distance). The note has recently been released in an arxiv report, in Appendix 3 of The Epic Story of Maximum Likelihood. Looking at back issues of the AMS newsletter, I found this text:

Frank.

## Jun 19, 2013

### From the IEEE information theory society newsletter

I just received the IEEE IT newsletter by snail mail. There is a nice introduction to the lautum information, which is the reverse mutual information: a reverse Kullback-Leibler divergence, belonging to the f-divergences. Here is the newsletter home page.

## Jun 04, 2013

### Geometric Science of Information (GSI): programme is out!

Post @ 9:44:31 | conference

The 100 papers to be presented at Geometric Science of Information (GSI) have been organized into 22 oral sessions. It is time to register if you are available in Paris at the end of August!

## May 31, 2013

### Random uniform distribution inside the probability simplex

Post @ 17:38:21 | statistics

Often when we consider algorithms working on discrete distributions, we need to draw uniform random distributions on the probability simplex. Say, we are interested in drawing random trinomials: histograms with 3 bins whose values sum up to one. If we simply draw the bin values at random and normalize, we obtain this kind of non-uniform sampling:

(file RandomNonUniformSimplex.pdf).
To get uniform random distributions, draw the bin values as −log U, where U is a uniform random variable in (0,1], and then normalize. Doing so, you get:

(file RandomUniformSimplex.pdf )
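
The two sampling schemes can be sketched in a few lines of Python (the seed, sample size, and corner-probability check are my choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def naive_random_simplex(n, bins=3):
    """Biased sampling: draw uniform bin values, then normalize (NOT uniform)."""
    x = rng.random((n, bins))
    return x / x.sum(axis=1, keepdims=True)

def uniform_random_simplex(n, bins=3):
    """Uniform sampling: normalize exponential variates -log(U), U in (0,1]."""
    x = -np.log(1.0 - rng.random((n, bins)))
    return x / x.sum(axis=1, keepdims=True)

# The corner regions of the triangle are under-sampled by the naive scheme:
S_naive = naive_random_simplex(20000)
S_unif = uniform_random_simplex(20000)
print((S_naive[:, 0] > 0.5).mean())   # about 1/6 (biased away from corners)
print((S_unif[:, 0] > 0.5).mean())    # about 1/4 (uniform over the triangle)
```
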
See this blog post too! Frank.

## May 20, 2013

### Perspective click-and-drag area selections in pictures

Post @ 11:18:51 | machine vision

Tomorrow I will attend MVA in Kyoto, a major conference on machine vision and applications.

Here are the slides.

## May 16, 2013

### Hyperbolic Voronoi diagrams

Post @ 12:12:53 | Hyperbolic geometry

I am programming figures for updating the arxiv report 1210.8234.
The program computes hyperbolic Voronoi diagrams from affine diagrams.
Here is an example with 1000 points in the Poincaré conformal ball model:

The way it works is by clipping a power diagram with the bounding sphere. Then we can export this affine diagram to other common hyperbolic models.

The Klein diagram:

has been derived from the power diagram:
and then transformed into the Poincare ball model:

Here are some PDFs of figures:
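
The model conversions used above boil down to simple radial maps; here is a minimal Python sketch of the standard Klein/Poincaré ball conversions (the function names are mine):

```python
import numpy as np

def klein_to_poincare(k):
    """Klein ball -> Poincare ball: p = k / (1 + sqrt(1 - |k|^2))."""
    k = np.asarray(k, dtype=float)
    return k / (1.0 + np.sqrt(1.0 - k @ k))

def poincare_to_klein(p):
    """Poincare ball -> Klein ball (inverse map): k = 2p / (1 + |p|^2)."""
    p = np.asarray(p, dtype=float)
    return 2.0 * p / (1.0 + p @ p)

k = np.array([0.3, -0.5])
p = klein_to_poincare(k)
print(p, poincare_to_klein(p))  # the round trip recovers k
```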

## May 05, 2013

### Power diagram, Laguerre geometry, and statistical Voronoi diagrams

Post @ 12:43:17 | Voronoi

I am programming 2D power diagrams. Power diagrams are Voronoi diagrams with respect to the power distance: the power distance of a point x to a weighted site (p, w) is ||x − p||² − w. For each 2D point we add a weight, and we compute the power diagram by reducing it to a 3D convex hull. Here is an example of a power diagram:

The powers of the sites are indicated by the blue circles. Observe that some cells may be empty, and some cells may contain several sites. Here is the PDF of this figure so that you can zoom in this vectorial file. The list of generators is given as x,y,weight triples so that you can play with it.
I am implementing 2D power diagrams because they are a universal method for computing affine diagrams, that is, diagrams with hyperplane bisectors. For power diagrams, the bisectors are the radical axes of the spheres (which can have imaginary radii). The application I am targeting is to compute statistical Voronoi diagrams:

• Statistical Voronoi diagrams with respect to the Rao geodesic distance for univariate normals amount to computing a hyperbolic Voronoi diagram, which is affine in the Klein ball model.
• Statistical Voronoi diagrams with respect to the Kullback-Leibler divergence for members of the same exponential family amount to computing Bregman Voronoi diagrams; these are affine diagrams too.

There shall be a follow-up on applications of those statistical Voronoi diagrams.
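
A tiny Python sketch of the power distance and cell membership (illustrative only; the actual diagram construction goes through the 3D convex hull reduction):

```python
import numpy as np

def power_dist(x, site, w):
    """Power distance of point x to the weighted site (p, w): ||x - p||^2 - w."""
    d = np.asarray(x, float) - np.asarray(site, float)
    return float(d @ d) - w

def power_cell_owner(x, sites, weights):
    """Index of the site whose power cell contains x (minimum power distance)."""
    return min(range(len(sites)), key=lambda i: power_dist(x, sites[i], weights[i]))

# A heavily weighted site can dominate its neighbor:
sites = [(0.0, 0.0), (1.0, 0.0)]
weights = [10.0, 0.0]
print(power_cell_owner((1.0, 0.0), sites, weights))  # -> 0: site 1's cell does not contain site 1
print(power_cell_owner((6.0, 0.0), sites, weights))  # -> 1: the bisector is the line x = 5.5
```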

## Apr 26, 2013

### Symmetrical Kullback-Leibler centroid and W-Lambert function

Post @ 10:17:31 | Kullback-Leibler

The Jeffreys divergence or J-divergence (the symmetrized Kullback-Leibler divergence) is defined by:

J(p, q) = KL(p : q) + KL(q : p) = Σ_i (p_i − q_i) log(p_i / q_i)

The Jeffreys centroid of positive histograms is available in closed form:

c_i = a_i / W(e a_i / g_i)

where a and g denote the coordinate-wise arithmetic and geometric means and W is the Lambert W function. The Jeffreys frequency centroid is further constrained to belong to the probability simplex.

Here are the slides related to the Jeffreys centroids: Slides

## Apr 23, 2013

### Non-Linear Book Manifolds: Learning from Associations the Dynamic Geometry of Digital Libraries

Post @ 21:57:05 | manifolds

Non-Linear Book Manifolds: Learning from Associations the Dynamic Geometry of Digital Libraries
Richard Nock (CEREGMIA-UAG, France); Frank Nielsen (Sony CS Labs Tokyo, Japan); Eric Briys (CEREGMIA-UAG-Cyberlibris, Belgium)

Abstract: Mainstream approaches in the design of virtual libraries basically exploit the same ambient space as their physical twins. Our paper is an attempt to rather capture automatically the actual space on which the books live, and *learn* the virtual library as a non-linear book manifold. This tackles tantalizing questions, chief among which whether modeling should be static and book focused (e.g. using bag-of-words encoding) or dynamic and user focused (e.g. relying on what we define as a *bag of readers* encoding). Experiments on a real-world digital library display that the latter encoding is a serious challenger to the former. Our results also show that the geometric layers of the manifold learned bring sizeable advantages for retrieval and visualization purposes. For example, the topological layer of the manifold allows to craft *manifold* association rules; experiments display that they bring dramatic improvements over conventional association rules built from the discrete topology of book sets. Improvements embrace *each* of the following major standpoints on association rule mining: computational, support, confidence, lift, and leverage.

## Apr 15, 2013

### Hypermask, Furhat, and the manifold of multiple homographies!

Post @ 22:18:33 | SIGGRAPH, CG

At Siggraph 1999, we presented hypermask, and demonstrated its use in a theatrical play where a single actor played several characters. The technology is based on projector-camera calibration, 3D pose estimation, and lip sync. Here is a video of what the system looked like back in 1998-1999:

Here is the paper. Hyper mask: projecting a talking head onto a real object

Recently, I came across a nice project called Furhat

historical page
So what is the link with information geometry? Well, it deals with homography estimation, and the manifold of multiple homographies.

## Apr 12, 2013

### 50 source code snippets for visual computing

Post @ 11:26:24 | book

In 2005, I released a book on visual computing:

The book has been used for teaching in several universities.

It contains a set of 50 source code snippets in C++.
Nowadays, I am more Java(TM) and Processing oriented, and I am refreshing many of the demos. Stay tuned...

## Apr 08, 2013

### Linear SVM, Ball SVM, and minimum enclosing ball

Post @ 17:20:20 | SVM

Linear SVMs and the kernel trick are certainly cornerstones of supervised machine learning. In fact, they can be generalized to containment machines (geometric machines): instead of using halfspace containment as in SVM, one can choose ball containment and build the Ball SVM, and so on.
In Approximating smallest enclosing balls with applications to machine learning, we investigate several algorithms for computing enclosing balls. There is a special emphasis on the nice covering/piercing duality, illustrated by this figure:
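
One simple scheme for approximating the smallest enclosing ball is the classical Badoiu-Clarkson core-set iteration; here is a minimal Python sketch (this is one illustrative algorithm, not the paper's full set of methods):

```python
import numpy as np

def minimax_center(points, iters=1000):
    """Badoiu-Clarkson iteration for the approximate minimax center
    (minimum enclosing ball): repeatedly step toward the current
    farthest point with a decaying step 1/(t+1)."""
    pts = np.asarray(points, dtype=float)
    c = pts[0].copy()
    for t in range(1, iters + 1):
        far = pts[np.argmax(np.linalg.norm(pts - c, axis=1))]
        c += (far - c) / (t + 1)
    return c

pts = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.5]]
c = minimax_center(pts)
print(c)  # close to (1, 0), the exact minimax center (radius 1)
```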

## Apr 01, 2013

### On the symmetrical Kullback-Leibler Jeffreys centroids, and Lambert W-function

Post @ 13:41:30 | centroids, k-means

There are many ways to symmetrize the Kullback-Leibler divergence, like Jeffreys' one-half symmetrized KL, or the Jensen-Shannon divergence obtained by taking the divergences to the mid-distribution. There is also the Chernoff min-max divergence, which has a nice geometric interpretation. One question that arises once we get an appropriate divergence is how to perform hard clustering à la k-means. This requires computing a centroid. We address the problem of computing the Jeffreys centroid, and end up with a closed form for positive histograms using the Lambert W function:

Furthermore, we prove that renormalizing this centroid gives a bound on the Jeffreys frequency centroid. We also report a simple bisection algorithm for approximating the exact Jeffreys frequency centroid. Note that we can use a variational k-means approach as described in the previous post. Further details are provided here.
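
Assuming the closed form for positive histograms, c_i = a_i / W(e·a_i/g_i) with a and g the coordinate-wise arithmetic and geometric means, here is a minimal Python sketch (with a hand-rolled Newton solver for Lambert W to stay dependency-free):

```python
import numpy as np

def lambert_w(z, iters=50):
    """Principal branch of Lambert W (solves w * exp(w) = z, for z > 0)
    via Newton's iterations; works elementwise on arrays."""
    w = np.log1p(z)                       # rough upper initial guess for z > 0
    for _ in range(iters):
        e = np.exp(w)
        w = w - (w * e - z) / (e * (w + 1.0))
    return w

def jeffreys_centroid(hists):
    """Closed-form Jeffreys centroid of positive histograms:
    c_i = a_i / W(e * a_i / g_i)."""
    H = np.asarray(hists, dtype=float)
    a = H.mean(axis=0)                    # arithmetic mean
    g = np.exp(np.log(H).mean(axis=0))    # geometric mean
    return a / lambert_w(np.e * a / g)    # a >= g (AM-GM), so z >= e here

h1 = np.array([1.0, 2.0, 3.0])
h2 = np.array([2.0, 1.0, 3.0])
print(jeffreys_centroid([h1, h2]))
```

A quick sanity check: the centroid of identical histograms is the histogram itself, since a = g gives W(e) = 1.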

## Mar 29, 2013

### Variational k-means, k-MLE and Jensen-Bregman clustering (Burbea-Rao clustering)

Post @ 17:42:16 | Clustering

k-means clustering is one of the oldest hard clustering algorithms, with many refinements since its inception. One of the famous recent results is k-means++ (with, interestingly, a recent paper showing A Bad Instance for k-Means++).

To prove that k-means converges, we need to

• prove that at the assignment stage, the loss function decreases
• prove that at the relocation stage, the loss function of each cluster (the information like Bregman information) decreases

(Squared) Euclidean k-means does it by assigning points to their closest center and relocating to the center of mass of each cluster. Now, interestingly, we can have a different loss function for each cluster and k-means still converges. This is done for learning statistical mixtures using the k-MLE algorithm (which stands for k-maximum likelihood estimator). See this paper for a proof.

Now, for the first condition, although we would like to have the minimizer of the cluster information for each cluster, it is enough just to decrease it a little bit, hence the name variational k-means. For example, Jensen-Bregman clustering needs to compute Jensen-Bregman centroids (aka Burbea-Rao centroids) that are not available in closed form but can be arbitrarily approximated. We can decide to approximate those centroids up to machine precision, as done in this paper (The Burbea-Rao and Bhattacharyya centroids), or by doing just a single iteration, as recently advocated in this paper (Bregman divergences and triangle inequality). The first approach tends to minimize the computational effort.
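
A small Python sketch of Bregman hard clustering (using the generalized I-divergence as an assumed example generator) shows the monotonically non-increasing loss guaranteed by the two conditions above:

```python
import numpy as np

def kl_div(x, c):
    """Generalized I-divergence: the Bregman divergence of F(x) = sum x log x."""
    return float(np.sum(x * np.log(x / c) - x + c))

def bregman_kmeans(X, k, iters=20, seed=0):
    """Bregman hard clustering: the assignment step can only decrease the loss,
    and relocating each cluster to its arithmetic mean (the minimizer of
    sum_i B_F(x_i : c) for ANY generator F) decreases it again."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    losses = []
    for _ in range(iters):
        labels = np.array([min(range(k), key=lambda j: kl_div(x, centers[j]))
                           for x in X])
        losses.append(sum(kl_div(x, centers[l]) for x, l in zip(X, labels)))
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers, losses

X = np.random.default_rng(3).random((60, 4)) + 0.1   # positive data
centers, losses = bregman_kmeans(X, 3)
print(losses[0], "->", losses[-1])   # non-increasing sequence
```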

## Mar 15, 2013

### Connected at Infinity II is out!

Post @ 14:55:11 | book

Recently, I have been engrossed in coding and postponed blogging. However, today I received my complimentary copy of the book entitled Connected at Infinity II. Here are pictures of the front and back cover pages.

The chapter on Cramer-Rao Lower Bound and Information Geometry is also available here.

@incollection{CRIG-2013,
author      = {Frank Nielsen},
title       = {Cram\'er-Rao Lower Bound and Information Geometry},
pages       = {18--37},
editor      = {Rajendra Bhatia and C. S. Rajan and Ajit Iqbal Singh},
booktitle   = {Connected at Infinity {II}: A selection of mathematics by {I}ndians},
publisher   = {Hindustan Book Agency (Texts and Readings in Mathematics, TRIM)},
note        = {arxiv 1301.3578},
year        = {2013},
url                   = {http://www.hindbook.com/}
}


## Feb 25, 2013

### Kullback-Leibler projection for best mixture simplification

Post @ 17:05:16 | Kullback-Leibler

Suppose we are given a mixture p̃ of n Gaussians, and we ask for the best Gaussian (a single component) q approximating that mixture with respect to the Kullback-Leibler divergence, min_q KL(p̃ : q):

This best Gaussian is found as the KL projection of the Gaussian mixture onto the exponential family manifold of Gaussians. It is equivalent to computing a Bregman barycenter.

Thus we can also learn a mixture by simplifying a kernel density estimator as explained in this paper and illustrated in this figure:
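
For univariate Gaussian mixtures, this KL projection boils down to moment matching (the m-projection matches the mixture's mean and variance); a minimal Python sketch (the helper name is mine):

```python
import numpy as np

def best_gaussian(weights, mus, sigmas):
    """KL projection of a 1D Gaussian mixture onto the Gaussian family:
    min_q KL(mixture : q) is attained by moment matching, i.e. by
    matching the mixture's mean and variance."""
    w = np.asarray(weights, float)
    mu = np.asarray(mus, float)
    var = np.asarray(sigmas, float) ** 2
    m = float(np.sum(w * mu))                       # mixture mean
    v = float(np.sum(w * (var + mu ** 2))) - m ** 2 # mixture variance
    return m, np.sqrt(v)

m, s = best_gaussian([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print(m, s)   # -> 0.0 and sqrt(2): a single wide Gaussian covering both modes
```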

## Feb 20, 2013

### Hypothesis testing: Chernoff information, orthogonality and dual geometry

Post @ 16:50:41 | Chernoff information

One way to detect a target from noise in signals is binary hypothesis testing. You can model this statistical decision problem and, assuming prior probabilities on noise/target, seek to bound the probability of error. The Chernoff information is the best error exponent you can get from the optimal maximum a posteriori (MAP) rule. On the dually flat space of exponential families, it is characterized geometrically and exactly, although you may not have closed-form solutions.

Here is the paper.
We plan to release the code in the next version of jMEF.

## Feb 08, 2013

### Which side of Kullback-Leibler divergence to optimize?

Post @ 16:07:21 | Kullback-Leibler

A common question is which side of the Kullback-Leibler (KL) divergence to consider for optimization. KL is the relative entropy: the cross-entropy minus the entropy, KL(p : q) = H×(p : q) − H(p) = Σ_i p_i log(p_i/q_i). As such, the ideal model should be on the right-hand side, and the estimated model on the left-hand side. Now suppose you want to find the center of two distributions for mixture simplification. You can interpret the left-sided and right-sided KL centroids as follows:

• The Kullback-Leibler right-sided centroid is zero-avoiding so that its corresponding density function tries to cover the support of all input normals,
• The Kullback-Leibler left-sided centroid is zero-forcing so that it focuses on the highest mass mode normal.

Here, there is not one ideal model but two, so we need a trade-off (= the centroid). The zero-forcing/avoiding property is depicted in the following figure:

Here are the details (paper).
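
To make the contrast concrete, here is a Python sketch for univariate Gaussians, using the standard exponential-family facts that the right-sided KL centroid averages the expectation parameters (moment matching) while the left-sided one averages the natural parameters (a sketch for illustration; function names are mine):

```python
import numpy as np

def right_kl_centroid(mus, sigmas):
    """argmin_c sum_i KL(p_i : c): average the expectation parameters
    (mu, mu^2 + sigma^2), i.e. moment matching -- zero-avoiding."""
    mus = np.asarray(mus, float); var = np.asarray(sigmas, float) ** 2
    m = mus.mean()
    v = (var + mus ** 2).mean() - m ** 2
    return m, np.sqrt(v)

def left_kl_centroid(mus, sigmas):
    """argmin_c sum_i KL(c : p_i): average the natural parameters
    (mu/sigma^2, -1/(2 sigma^2)) -- zero-forcing."""
    mus = np.asarray(mus, float); var = np.asarray(sigmas, float) ** 2
    t1 = (mus / var).mean()
    t2 = (-0.5 / var).mean()
    v = -0.5 / t2
    return t1 * v, np.sqrt(v)

# Two well-separated unit Gaussians:
mr, sr = right_kl_centroid([0.0, 10.0], [1.0, 1.0])
ml, sl = left_kl_centroid([0.0, 10.0], [1.0, 1.0])
print(mr, sr)   # mean 5, large sigma: covers the support of both inputs
print(ml, sl)   # mean 5, sigma 1: stays narrow
```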

## Jan 31, 2013

### An information-geometric characterization of Chernoff information

Post @ 17:10:59 | paper

The Chernoff information was originally introduced for bounding the probability of error of the Bayesian decision rule in binary hypothesis testing. Nowadays, it is often used as a notion of symmetric distance in statistical signal processing or as a way to define a middle distribution in information fusion. Computing the Chernoff information requires solving an optimization problem that is numerically approximated in practice. We consider the Chernoff distance for distributions belonging to the same exponential family, including the Gaussian and multinomial families. By considering the geometry of the underlying statistical manifold, we define exactly the solution of the optimization problem as the unique intersection of a geodesic with a dual hyperplane. Furthermore, we prove analytically that the Chernoff distance amounts to calculating an equivalent but simpler Bregman divergence defined on the distribution parameters. This yields a closed-form formula for singly-parametric distributions, or an efficient geodesic bisection search for multi-parametric distributions. Finally, based on this information-geometric characterization, we propose three novel information-theoretic symmetric distances and middle distributions, two of which always admit closed-form expressions.

@ARTICLE{ChernoffDistance-2013,
author={Nielsen, F.},
journal={Signal Processing Letters, IEEE}, title={An information-geometric characterization of Chernoff information},
year={2013},
volume={PP},
number={99},
pages={1},
keywords={Bregman divergence;Chernoff information;exponential families;information fusion;information geometry;},
doi={10.1109/LSP.2013.2243726},
ISSN={1070-9908},}
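
As an illustration of the geodesic bisection search mentioned in the abstract, here is a minimal Python sketch for univariate Gaussians (the function names are mine; this is not the paper's code):

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    """KL(N(m1,v1) : N(m2,v2)) between univariate normals."""
    return 0.5 * (v1 / v2 + (m1 - m2) ** 2 / v2 - 1.0 + np.log(v2 / v1))

def natural(m, v):
    """Natural parameters of N(m, v)."""
    return np.array([m / v, -0.5 / v])

def from_natural(th):
    v = -0.5 / th[1]
    return th[0] * v, v

def chernoff_information(m1, v1, m2, v2, iters=60):
    """Bisection along the exponential geodesic between the natural parameters:
    find alpha with KL(p_alpha : p1) = KL(p_alpha : p2); the common value is
    the Chernoff information."""
    t1, t2 = natural(m1, v1), natural(m2, v2)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a = 0.5 * (lo + hi)
        m, v = from_natural(a * t1 + (1.0 - a) * t2)
        if kl_gauss(m, v, m1, v1) > kl_gauss(m, v, m2, v2):
            lo = a          # p_alpha still too far from p1: increase alpha
        else:
            hi = a
    a = 0.5 * (lo + hi)
    m, v = from_natural(a * t1 + (1.0 - a) * t2)
    return kl_gauss(m, v, m1, v1)

# Equal variances admit the closed form (m1 - m2)^2 / (8 v):
print(chernoff_information(0.0, 1.0, 2.0, 1.0))  # -> 0.5 (matches the closed form)
```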


## Jan 25, 2013

### Loosely bounding a Bregman divergence (without the Hessian)

Post @ 17:05:34 | Bregman divergence

Traditionally, the supremum of the norm of the Hessian of the Bregman generator over a closed domain is used in conjunction with the Lagrange remainder of the Taylor expansion to bound a Bregman divergence, since we have:

B_F(p : q) = (1/2) (p − q)ᵀ ∇²F(ξ) (p − q) ≤ (1/2) sup_ξ ||∇²F(ξ)|| ||p − q||², for some ξ on the segment [p, q].

But we can also bound a Bregman divergence easily without the Hessian, by using the gradient, as follows:

B_F(p : q) ≤ ⟨∇F(p) − ∇F(q), p − q⟩,

since the right-hand side equals B_F(p : q) + B_F(q : p) and both terms are non-negative. I will report how to use this bound when defining and computing the Chernoff distance for exponential families.
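
As a quick numerical sanity check of the gradient bound, here is a Python sketch using the negative-entropy generator (my choice for illustration):

```python
import numpy as np

def F(x):
    """Convex generator F(x) = sum_i x_i log x_i (negative Shannon entropy)."""
    return float(np.sum(x * np.log(x)))

def gradF(x):
    return np.log(x) + 1.0

def bregman(p, q):
    """Bregman divergence B_F(p : q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - float(gradF(q) @ (p - q))

def gradient_bound(p, q):
    """Hessian-free upper bound: <grad F(p) - grad F(q), p - q>,
    which equals B_F(p:q) + B_F(q:p), both terms being >= 0."""
    return float((gradF(p) - gradF(q)) @ (p - q))

rng = np.random.default_rng(1)
p, q = rng.random(5) + 0.1, rng.random(5) + 0.1
print(bregman(p, q), "<=", gradient_bound(p, q))
```
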
Frank.

An updated list of publications

## Jan 23, 2013

### Dictionary of computational information geometry

Post @ 17:31:51 | dictionary

I updated the dictionary of terms (580+) used in computational information geometry. Need some serious cleaning soon.
Frank.

## Jan 17, 2013

### Cramer-Rao Lower Bound and Information Geometry

Post @ 20:48:19 | Rao

Happy new year to all of you!
An introductory article: Cramer-Rao Lower Bound and Information Geometry

## Nov 26, 2012

### Smooth transitions between left-sided and right-sided Bregman centroids (with Java code)

One of the questions that often arises when using a Kullback-Leibler divergence is "how do we choose the orientation of KL?". In information retrieval, another related question is "how do we symmetrize the Kullback-Leibler divergence?" (three solutions: the Jeffreys divergence, the Jensen-Shannon divergence, and the Chernoff divergence). However, sometimes we would like to experiment with finer choices than the left/right or symmetric orientations. One way to do so is to use skew Burbea-Rao divergences (also called skew Jensen divergences) that tend in the limit cases to Bregman divergences:

J_α^F(p : q) = [α F(p) + (1 − α) F(q) − F(α p + (1 − α) q)] / (α (1 − α)),

which tends to the Bregman divergence B_F(p : q) as α → 0 and to the reverse divergence B_F(q : p) as α → 1.

The skewness factor may depend on the application, although a principled way to set it is yet to be analyzed. This sample program, Bregman_BurbeaRao_Jensen_SkewDivergence.java, demonstrates the skewness property in the limit cases. Note that you do not have to compute the gradient for approximating Bregman divergences. Here is a run of the program:

```
Skew Burbea-Rao-Jensen divergences: Bregman divergences obtained in limit case alpha=0 and Bregman reverse for limit case alpha=1
alpha=1.0E-4    0.06831750610769681
alpha=0.5       0.06073382594234067
alpha=0.9999    0.05521937447971147
Bregman direct :                0.0683193293834281       alpha->0
Reverse Bregman:                0.05521841361948579      alpha->1
```
...

More details can be found in 1004.5049v3.pdf and 1009.4004v2.pdf
Frank.
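
A minimal Python analogue of the Java demo above (using the negative-entropy generator as an assumed example) illustrates the same limit behavior:

```python
import numpy as np

def F(x):
    """Convex generator (here: F(x) = sum x log x)."""
    return float(np.sum(x * np.log(x)))

def gradF(x):
    return np.log(x) + 1.0

def bregman(p, q):
    return F(p) - F(q) - float(gradF(q) @ (p - q))

def skew_jensen(p, q, alpha):
    """Scaled skew Burbea-Rao (Jensen) divergence; note: no gradient needed."""
    m = alpha * p + (1.0 - alpha) * q
    return (alpha * F(p) + (1.0 - alpha) * F(q) - F(m)) / (alpha * (1.0 - alpha))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
print(skew_jensen(p, q, 1e-4), bregman(p, q))        # alpha -> 0: direct Bregman
print(skew_jensen(p, q, 1.0 - 1e-4), bregman(q, p))  # alpha -> 1: reverse Bregman
```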

## Nov 19, 2012

### k-Maximum Likelihood Estimator for mixtures of generalized Gaussians

Post @ 15:46:54 | learning mixtures

The slides of Olivier's talk on learning generalized Gaussian mixtures using an extension of the k-MLE framework are now available online here.

## Nov 17, 2012

### CFP: Geometric Science of Information (GSI'13)

Post @ 19:23:19 | conference

GSI2013 - Geometric Science of Information will be held in Paris, August 28-30, 2013.