## May 20, 2013

### Perspective click-and-drag area selections in pictures

Post @ 11:18:51 | machine vision

Tomorrow I will attend MVA in Kyoto, a major conference on machine vision and applications.

Here are the slides.

## May 16, 2013

### Hyperbolic Voronoi diagrams

Post @ 12:12:53 | Hyperbolic geometry

I am producing figures for an update of the arXiv report 1210.8234, which computes hyperbolic Voronoi diagrams from affine diagrams.
Here is an example with 1000 points in the Poincaré conformal ball model:

It works by clipping a power diagram with the bounding sphere. This affine diagram can then be exported to the other common hyperbolic models.
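
The conversions between models are simple closed-form maps. Here is a minimal sketch (my own illustration with hypothetical names, not the program that generated the figures):

```java
public class ModelConversions {
    // Klein ball -> Poincare ball: p = k / (1 + sqrt(1 - |k|^2))
    static double[] kleinToPoincare(double[] k) {
        double n2 = 0;
        for (double c : k) n2 += c * c;
        double s = 1.0 / (1.0 + Math.sqrt(1.0 - n2));
        double[] p = new double[k.length];
        for (int i = 0; i < k.length; i++) p[i] = s * k[i];
        return p;
    }
    // Poincare ball -> Klein ball: k = 2p / (1 + |p|^2)
    static double[] poincareToKlein(double[] p) {
        double n2 = 0;
        for (double c : p) n2 += c * c;
        double[] k = new double[p.length];
        for (int i = 0; i < p.length; i++) k[i] = 2.0 * p[i] / (1.0 + n2);
        return k;
    }
    public static void main(String[] args) {
        double[] p = kleinToPoincare(new double[]{0.5, 0.0});
        System.out.println(p[0]);  // 2 - sqrt(3), about 0.2679
    }
}
```

The round trip Klein → Poincaré → Klein returns the input point, which is a handy sanity check.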

The Klein diagram:

has been derived from the power diagram:
and then transformed into the Poincare ball model:

Here are some PDFs of figures:

## May 05, 2013

### Power diagram, Laguerre geometry, and statistical Voronoi diagrams

Post @ 12:43:17 | Voronoi

I am programming 2D power diagrams. Power diagrams are Voronoi diagrams with respect to the power distance of weighted points: pow(x; (p, w)) = ||x − p||² − w. For each 2D point, we add a weight and compute the power diagram by reducing it to a 3D convex hull. Here is an example of a power diagram:

The powers of the sites are indicated by the blue circles. Observe that some cells may be empty, and some cells may contain several sites. Here is the PDF of this figure so that you can zoom into this vector file, and the list of generators as x, y, weight triples so that you can play with it.
I am implementing the 2D power diagrams because they are a universal method for computing affine diagrams, that is, diagrams with hyperplane bisectors. For power diagrams, the bisectors are the radical axes of the spheres (which may have imaginary radii). The applications I am targeting are statistical Voronoi diagrams:

• Statistical Voronoi diagrams with respect to the Rao geodesic distance for univariate normals amount to computing a hyperbolic Voronoi diagram, which is affine in the Klein ball model.
• Statistical Voronoi diagrams with respect to the Kullback-Leibler divergence for members of the same exponential family amount to computing Bregman Voronoi diagrams, which are affine diagrams too.
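
To make the power distance concrete, here is a toy sketch (hypothetical names; the actual computation goes through the 3D convex hull, not brute force):

```java
public class PowerDistance {
    // power distance of point x to a weighted site (p, w): ||x - p||^2 - w
    static double power(double[] x, double[] p, double w) {
        double d2 = 0;
        for (int i = 0; i < x.length; i++) { double d = x[i] - p[i]; d2 += d * d; }
        return d2 - w;
    }
    // index of the site whose power cell contains x
    static int nearestSite(double[] x, double[][] sites, double[] weights) {
        int best = 0;
        for (int i = 1; i < sites.length; i++)
            if (power(x, sites[i], weights[i]) < power(x, sites[best], weights[best])) best = i;
        return best;
    }
    public static void main(String[] args) {
        double[][] sites = {{0, 0}, {2, 0}};
        double[] weights = {0.0, 3.0};  // a large weight enlarges that site's cell
        System.out.println(nearestSite(new double[]{0.6, 0.0}, sites, weights));
    }
}
```

With these weights, the point (0.6, 0) falls in the cell of the Euclidean-farther but heavily weighted site, which is exactly how some cells can end up empty.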

There will be a follow-up post on applications of those statistical Voronoi diagrams.

## Apr 26, 2013

### Symmetrical Kullback-Leibler centroid and W-Lambert function

Post @ 10:17:31 | Kullback-Leibler

The Jeffreys divergence or J-divergence (the symmetrized Kullback-Leibler divergence) is defined by:

The Jeffreys centroid for positive histograms is available in closed-form:

The Jeffreys frequency centroid is restricted to belong to the probability simplex:
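
Since the equation images are not reproduced here, the formulas in LaTeX (reconstructed; check the slides for the exact statement of the centroid):

```latex
J(p,q) = \mathrm{KL}(p:q) + \mathrm{KL}(q:p) = \sum_{i=1}^{d} (p^i - q^i)\,\log\frac{p^i}{q^i},
\qquad
c^i = \frac{a^i}{W\!\left(e\,\frac{a^i}{g^i}\right)}
```

where $a^i$ and $g^i$ denote the arithmetic and geometric means of the $i$-th bins of the input histograms, and $W$ is the principal branch of the Lambert W function. The frequency centroid adds the normalization constraint $\sum_i c^i = 1$.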

Here are the slides related to the Jeffreys centroids: Slides

## Apr 23, 2013

### Non-Linear Book Manifolds: Learning from Associations the Dynamic Geometry of Digital Libraries

Post @ 21:57:05 | manifolds

Non-Linear Book Manifolds: Learning from Associations the Dynamic Geometry of Digital Libraries
Richard Nock (CEREGMIA-UAG, France); Frank Nielsen (Sony CS Labs Tokyo, Japan); Eric Briys (CEREGMIA-UAG-Cyberlibris, Belgium)

Abstract: Mainstream approaches in the design of virtual libraries basically exploit the same ambient space as their physical twins. Our paper is an attempt to rather capture automatically the actual space on which the books live, and *learn* the virtual library as a non-linear book manifold. This tackles tantalizing questions, chief among which whether modeling should be static and book-focused (*e.g.* using bag-of-words encoding) or dynamic and user-focused (*e.g.* relying on what we define as a *bag of readers* encoding). Experiments on a real-world digital library display that the latter encoding is a serious challenger to the former. Our results also show that the geometric layers of the manifold learned bring sizeable advantages for retrieval and visualization purposes. For example, the topological layer of the manifold allows to craft *Manifold* association rules; experiments display that they bring dramatic improvements over conventional association rules built from the discrete topology of book sets. Improvements embrace *each* of the following major standpoints on association rule mining: the computational, support, confidence, lift, and leverage standpoints.

## Apr 15, 2013

### Hypermask, Furhat, and the manifold of multiple homographies!

Post @ 22:18:33 | SIGGRAPH, CG

At SIGGRAPH 1999, we presented Hypermask and demonstrated its use in a theatrical play where a single actor played several characters. The technology is based on projector-camera calibration, 3D pose estimation, and lip sync. Here is a video of what the system looked like back in 1998-1999:

Here is the paper. Hyper mask: projecting a talking head onto a real object

Recently, I came across a nice project called Furhat

historical page
So what is the link with information geometry? Well, it deals with homography estimation and the manifold of multiple homographies.

## Apr 12, 2013

### 50 source code snippets for visual computing

Post @ 11:26:24 | book

In 2005, I released a book on visual computing:

The book has been used for teaching in several universities. For example

It contains a set of 50 source codes in C++: source codes in C++.
Nowadays, I am more Java(TM) and Processing oriented, and I am refreshing many of the demos. Stay tuned...

## Apr 08, 2013

### Linear SVM, Ball SVM, and minimum enclosing ball

Post @ 17:20:20 | SVM

Linear SVMs and the kernel trick are certainly among the cornerstones of supervised machine learning. In fact, they can be generalized to containment machines (geometric machines): instead of using halfspace containment as in SVMs, one can choose ball containment and build a Ball SVM, and so on.
In Approximating smallest enclosing balls with applications to machine learning, we investigate several algorithms for computing enclosing balls. There is a special emphasis on the nice covering/piercing duality, illustrated by this figure:
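
The simplest of those algorithms is the Bădoiu-Clarkson core-set iteration: start anywhere and repeatedly step a 1/(t+1) fraction toward the current farthest point. A minimal sketch (hypothetical class name, not the paper's code):

```java
public class MiniBall {
    // Badoiu-Clarkson iterative approximation of the smallest enclosing ball center
    static double[] approxCenter(double[][] pts, int iters) {
        double[] c = pts[0].clone();
        for (int t = 1; t <= iters; t++) {
            // find the farthest point from the current center
            int far = 0;
            double best = -1;
            for (int i = 0; i < pts.length; i++) {
                double d2 = 0;
                for (int j = 0; j < c.length; j++) { double d = pts[i][j] - c[j]; d2 += d * d; }
                if (d2 > best) { best = d2; far = i; }
            }
            // move the center a 1/(t+1) fraction toward that farthest point
            for (int j = 0; j < c.length; j++) c[j] += (pts[far][j] - c[j]) / (t + 1);
        }
        return c;
    }
    public static void main(String[] args) {
        double[][] pts = {{-1, 0}, {1, 0}, {0, 0.5}};
        double[] c = approxCenter(pts, 1000);
        System.out.printf("center ~ (%.3f, %.3f)%n", c[0], c[1]);
    }
}
```

After t iterations the center is within R/sqrt(t+1) of the optimal center (R the optimal radius), so a few hundred iterations already give a good approximation.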

## Apr 01, 2013

### On the symmetrical Kullback-Leibler Jeffreys centroids, and Lambert W-function

Post @ 13:41:30 | centroids, k-means

There are many ways to symmetrize the Kullback-Leibler divergence: Jeffreys' one-half symmetrized KL, or the Jensen-Shannon divergence obtained by taking the divergences to the mid-distribution. There is also the Chernoff min-max divergence, which has a nice geometric interpretation. One question that arises once we have an appropriate divergence is how to perform hard clustering à la k-means. This requires computing a centroid. We address the problem of computing the Jeffreys centroid and end up with a closed form for positive histograms using the Lambert W function:

Furthermore, we prove that renormalizing this centroid gives a bound on the Jeffreys frequency centroid. We also report a simple bisection algorithm for approximating the exact Jeffreys frequency centroid. Note that we can use a variational k-means approach as described in the previous post. Further details are provided here.
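
A numerical sketch of that closed form (my own toy code, not the paper's implementation; Newton's iterations suffice for the Lambert W since its argument e·a/g is at least e here):

```java
public class JeffreysCentroid {
    // principal branch of the Lambert W function via Newton's iterations (valid for z >= e here)
    static double lambertW(double z) {
        double w = Math.log(z);  // good initial guess for z >= e
        for (int i = 0; i < 50; i++) {
            double ew = Math.exp(w);
            w -= (w * ew - z) / (ew * (w + 1));
        }
        return w;
    }
    // Jeffreys centroid of positive histograms: c_i = a_i / W(e * a_i / g_i),
    // with a_i and g_i the arithmetic and geometric means of bin i
    static double[] centroid(double[][] hists) {
        int d = hists[0].length, n = hists.length;
        double[] c = new double[d];
        for (int i = 0; i < d; i++) {
            double a = 0, logg = 0;
            for (double[] h : hists) { a += h[i]; logg += Math.log(h[i]); }
            a /= n;
            double g = Math.exp(logg / n);
            c[i] = a / lambertW(Math.E * a / g);
        }
        return c;
    }
    public static void main(String[] args) {
        double[] c = centroid(new double[][]{{1, 2, 3}, {3, 2, 1}});
        System.out.println(c[0] + " " + c[1] + " " + c[2]);
    }
}
```

When all input histograms coincide, a = g, W(e) = 1, and the centroid is the common histogram, as expected.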

## Mar 29, 2013

### Variational k-means, k-MLE and Jensen-Bregman clustering (Burbea-Rao clustering)

Post @ 17:42:16 | Clustering

k-means clustering is one of the oldest hard clustering algorithms, with many refinements since its inception. One famous recent result is k-means++ (with, interestingly, a recent paper exhibiting A Bad Instance for k-Means++).

To prove that k-means converges, we need to

• prove that the loss function decreases at the assignment stage,
• prove that the loss function of each cluster (the cluster information, like the Bregman information) decreases at the relocation stage.

(Squared) Euclidean k-means does so by assigning points to their closest center and relocating the center of each cluster to its center of mass. Now, interestingly, we can have a different loss function for each cluster and k-means still converges. This is done for learning statistical mixtures with the k-MLE algorithm (short for k-maximum likelihood estimator). See this paper for a proof.
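
A minimal 1D sketch of these two steps (my own toy code, not from the papers), checking that the loss never increases:

```java
import java.util.Arrays;

public class KMeansSketch {
    // total squared-error loss of the current clustering
    static double loss(double[] pts, double[] centers, int[] assign) {
        double s = 0;
        for (int i = 0; i < pts.length; i++) { double d = pts[i] - centers[assign[i]]; s += d * d; }
        return s;
    }
    // Lloyd's iterations with two centers; throws if the loss ever increases
    static double[] run(double[] pts, double[] centers, int iters) {
        int[] assign = new int[pts.length];
        double prev = Double.POSITIVE_INFINITY;
        for (int it = 0; it < iters; it++) {
            // assignment step: nearest center, so the loss cannot increase
            for (int i = 0; i < pts.length; i++)
                assign[i] = Math.abs(pts[i] - centers[0]) <= Math.abs(pts[i] - centers[1]) ? 0 : 1;
            // relocation step: the center of mass minimizes each cluster's loss
            for (int k = 0; k < 2; k++) {
                double sum = 0;
                int cnt = 0;
                for (int i = 0; i < pts.length; i++) if (assign[i] == k) { sum += pts[i]; cnt++; }
                if (cnt > 0) centers[k] = sum / cnt;
            }
            double cur = loss(pts, centers, assign);
            if (cur > prev + 1e-12) throw new IllegalStateException("loss increased");
            prev = cur;
        }
        return centers;
    }
    public static void main(String[] args) {
        double[] centers = run(new double[]{0, 1, 2, 10, 11, 12}, new double[]{0, 1}, 10);
        System.out.println(Arrays.toString(centers));  // [1.0, 11.0]
    }
}
```

On the toy data {0, 1, 2, 10, 11, 12} with the poor initialization {0, 1}, the centers converge to {1, 11} after two iterations.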

Now, for the relocation step, although we would like to find the minimizer of the cluster information for each cluster, it is enough to merely decrease it a little, hence the name variational k-means. For example, Jensen-Bregman clustering needs to compute Jensen-Bregman centroids (aka Burbea-Rao centroids) that are not available in closed form but can be arbitrarily well approximated. We can decide to approximate those centroids up to machine precision, as done in this paper (The Burbea-Rao and Bhattacharyya centroids), or to perform just a single iteration, as recently advocated in this paper (Bregman divergences and triangle inequality). The first approach tends to minimize the computational effort.

## Mar 15, 2013

### Connected at Infinity II is out!

Post @ 14:55:11 | book

Recently, I have been engrossed in coding and postponed blogging. However, today I received my complimentary copy of the book entitled Connected at Infinity II. Here are pictures of the front and back cover pages.

The chapter on Cramer-Rao Lower Bound and Information Geometry is also available here.

@incollection{CRIG-2013,
  author    = {Frank Nielsen},
  title     = {Cram\'er-Rao Lower Bound and Information Geometry},
  pages     = {18--37},
  editor    = {Rajendra Bhatia and C. S. Rajan and Ajit Iqbal Singh},
  booktitle = {Connected at Infinity {II}: A selection of mathematics by {I}ndians},
  publisher = {Hindustan Book Agency (Texts and Readings in Mathematics, TRIM)},
  note      = {arXiv 1301.3578},
  year      = {2013},
  url       = {http://www.hindbook.com/}
}


## Feb 25, 2013

### Kullback-Leibler projection for best mixture simplification

Post @ 17:05:16 | Kullback-Leibler

Suppose we are given a mixture of n Gaussians (denoted by p̃) and ask for the best single-component Gaussian approximating that mixture with respect to the Kullback-Leibler divergence:

Then this best Gaussian is found as the KL projection of the Gaussian mixture onto the exponential family manifold of Gaussians. It is equivalent to computing a Bregman barycenter.
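
For the Gaussian family, this KL projection boils down to moment matching: the best single Gaussian carries the mean and variance of the mixture. A 1D sketch (hypothetical names):

```java
public class MixtureSimplification {
    // best single Gaussian (in the KL(mixture : Gaussian) sense) via moment matching;
    // inputs: component weights w, means mu, variances var
    static double[] bestGaussian(double[] w, double[] mu, double[] var) {
        double m = 0, second = 0;
        for (int i = 0; i < w.length; i++) {
            m += w[i] * mu[i];
            second += w[i] * (var[i] + mu[i] * mu[i]);  // E[X^2] of the mixture
        }
        return new double[]{m, second - m * m};  // {mean, variance}
    }
    public static void main(String[] args) {
        double[] g = bestGaussian(new double[]{0.5, 0.5}, new double[]{-1, 1}, new double[]{1, 1});
        System.out.println("mean=" + g[0] + " var=" + g[1]);
    }
}
```

For the symmetric two-component example above, the projected Gaussian is centered at 0 with variance 2: the between-component spread is absorbed into the variance.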

Thus we can also learn a mixture by simplifying a kernel density estimator as explained in this paper and illustrated in this figure:

## Feb 20, 2013

### Hypothesis testing: Chernoff information, orthogonality and dual geometry

Post @ 16:50:41 | Chernoff information

One way to detect a target from noise in signals is binary hypothesis testing. You can model this statistical decision problem and, assuming prior probabilities on noise/target, seek to bound the probability of error. The Chernoff information is the best error exponent you can get from the optimal maximum a posteriori (MAP) rule. On the dually flat space of exponential families, it is characterized geometrically and exactly, although closed-form solutions may not be available.

Here is the paper.
We plan to release the code in the next version of jMEF.

## Feb 08, 2013

### Which side of Kullback-Leibler divergence to optimize?

Post @ 16:07:21 | Kullback-Leibler

A common question is which side of the Kullback-Leibler (KL) divergence to consider for optimization. Since KL is the relative entropy (the cross-entropy minus the entropy), the ideal model should be on the right-hand side and the estimated model on the left-hand side. Now suppose you want to find the center of two distributions for mixture simplification. You can interpret the left-sided and right-sided KL centroids as follows:

• The right-sided Kullback-Leibler centroid is zero-avoiding, so that its corresponding density function tries to cover the support of all input normals,
• The left-sided Kullback-Leibler centroid is zero-forcing, so that it focuses on the normal with the highest-mass mode.

Here, there is not one ideal model but two, so we need a trade-off (= a centroid). The zero-forcing/zero-avoiding property is depicted in the following figure:

Here are the details (paper).
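
To see the asymmetry numerically, recall the closed-form KL between univariate normals (a toy check of my own, not the paper's code):

```java
public class GaussianKL {
    // KL(N(m0, v0) : N(m1, v1)) in closed form, v denoting the variance
    static double kl(double m0, double v0, double m1, double v1) {
        return 0.5 * (Math.log(v1 / v0) + (v0 + (m0 - m1) * (m0 - m1)) / v1 - 1.0);
    }
    public static void main(String[] args) {
        double fwd = kl(0, 1, 0, 4);   // wide model covering a narrow one
        double rev = kl(0, 4, 0, 1);   // narrow model under a wide one
        System.out.println(fwd + " vs " + rev);  // the two sides disagree
    }
}
```

Swapping the roles of the wide and narrow normals changes the divergence value; this asymmetry is the source of the two distinct sided centroids above.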

## Jan 31, 2013

### An information-geometric characterization of Chernoff information

Post @ 17:10:59 | paper

The Chernoff information was originally introduced for bounding the probability of error of the Bayesian decision rule in binary hypothesis testing. Nowadays, it is often used as a notion of symmetric distance in statistical signal processing or as a way to define a middle distribution in information fusion. Computing the Chernoff information requires solving an optimization problem that is numerically approximated in practice. We consider the Chernoff distance for distributions belonging to the same exponential family, including the Gaussian and multinomial families. By considering the geometry of the underlying statistical manifold, we define exactly the solution of the optimization problem as the unique intersection of a geodesic with a dual hyperplane. Furthermore, we prove analytically that the Chernoff distance amounts to calculating an equivalent but simpler Bregman divergence defined on the distribution parameters. This yields a closed-form formula for singly-parametric distributions, and an efficient geodesic bisection search for multi-parametric distributions. Finally, based on this information-geometric characterization, we propose three novel information-theoretic symmetric distances and middle distributions, two of which always admit closed-form expressions.

@article{ChernoffDistance-2013,
  author   = {Nielsen, F.},
  title    = {An information-geometric characterization of Chernoff information},
  journal  = {Signal Processing Letters, IEEE},
  year     = {2013},
  volume   = {PP},
  number   = {99},
  pages    = {1},
  keywords = {Bregman divergence; Chernoff information; exponential families; information fusion; information geometry},
  doi      = {10.1109/LSP.2013.2243726},
  issn     = {1070-9908}
}


## Jan 25, 2013

### Loosely bounding a Bregman divergence (without the Hessian)

Post @ 17:05:34 | Bregman divergence

Traditionally, the supremum of the norm of the Hessian of the Bregman generator over a closed domain is used in conjunction with the Lagrange remainder of the Taylor expansion to bound a Bregman divergence, since we have:

But we can also bound a Bregman divergence easily without the Hessian, by using the gradient, as follows:
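
In LaTeX (reconstructed, since the equation images are missing here): with $B_F(p:q) = F(p) - F(q) - \langle p - q, \nabla F(q)\rangle$,

```latex
B_F(p:q) \le \frac{1}{2} \sup_{x \in [p,q]} \|\nabla^2 F(x)\| \, \|p - q\|^2,
\qquad
0 \le B_F(p:q) \le \langle \nabla F(p) - \nabla F(q),\; p - q \rangle
```

The gradient bound follows from the identity $B_F(p:q) + B_F(q:p) = \langle \nabla F(p) - \nabla F(q), p - q \rangle$, a sum of two nonnegative terms.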

I will report how to use this bound when defining and computing the Chernoff distance for exponential families.
Frank.

An updated list of publications

## Jan 23, 2013

### Dictionary of computational information geometry

Post @ 17:31:51 | dictionary

I updated the dictionary of terms (580+) used in computational information geometry. Need some serious cleaning soon.
Frank.

## Jan 17, 2013

### Cramer-Rao Lower Bound and Information Geometry

Post @ 20:48:19 | Rao

Happy new year to all of you!
An introductory article: Cramer-Rao Lower Bound and Information Geometry

## Nov 26, 2012

### Smooth transitions between left-sided and right-sided Bregman centroids (with Java code)

One of the questions that often arises when using a Kullback-Leibler divergence is: "how do we choose the orientation of KL?" In information retrieval, a related question is: "how do we symmetrize the Kullback-Leibler divergence?" (three solutions: Jeffreys divergence, Jensen-Shannon divergence, Chernoff divergence). However, sometimes we would like to experiment with more precise choices than just left/right or symmetric orientations. One way to do so is to use skew Burbea-Rao divergences (also called skew Jensen divergences), which tend in the limit cases to the Bregman divergences:

The skewness factor may depend on the application, although a principled way to set it remains to be analyzed. This sample program Bregman_BurbeaRao_Jensen_SkewDivergence.java demonstrates the skewness property in the limit cases. Note that you do not have to compute the gradient to approximate Bregman divergences. Here is a run of the program:

Skew Burbea-Rao-Jensen divergences: Bregman divergences obtained in limit case alpha=0 and Bregman reverse for limit case alpha=1
alpha=1.0E-4    0.06831750610769681
alpha=0.5       0.06073382594234067
alpha=0.9999    0.05521937447971147
Bregman direct :                0.0683193293834281       alpha->0
Reverse Bregman:                0.05521841361948579      alpha->1
...

More details can be found in 1004.5049v3.pdf and 1009.4004v2.pdf
Frank.
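
The limit behavior shown by the program run can be replicated in a few lines (my own sketch with the 1D generator F(x) = x log x, not the actual Java sample above):

```java
public class SkewJensen {
    // convex generator F(x) = x log x (a 1D Shannon-type generator), for x > 0
    static double F(double x) { return x * Math.log(x); }
    static double gradF(double x) { return Math.log(x) + 1.0; }
    // skew Jensen (Burbea-Rao) divergence, defined for 0 < alpha < 1
    static double skewJensen(double alpha, double p, double q) {
        return (alpha * F(p) + (1 - alpha) * F(q) - F(alpha * p + (1 - alpha) * q))
                / (alpha * (1 - alpha));
    }
    // Bregman divergence B_F(p:q) induced by the same generator
    static double bregman(double p, double q) { return F(p) - F(q) - (p - q) * gradF(q); }
    public static void main(String[] args) {
        double p = 0.3, q = 0.8;
        System.out.println("alpha->0: " + skewJensen(1e-6, p, q) + " ~ B_F(p:q) = " + bregman(p, q));
        System.out.println("alpha->1: " + skewJensen(1 - 1e-6, p, q) + " ~ B_F(q:p) = " + bregman(q, p));
    }
}
```

As alpha approaches 0 the skew Jensen divergence tends to one Bregman divergence, and as alpha approaches 1 to the reverse one, matching the limit cases in the run above, with no gradient evaluation needed.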

## Nov 19, 2012

### k-Maximum Likelihood Estimator for mixtures of generalized Gaussians

Post @ 15:46:54 | learning mixtures

The slides of Olivier's talk on learning generalized Gaussian mixtures using an extension of the k-MLE framework are now online here.

## Nov 17, 2012

### CFP: Geometric Science of Information (GSI'13)

Post @ 19:23:19 | conference

GSI2013 - Geometric Science of Information will be held in Paris, August 28-30, 2013.
Frank

## Nov 16, 2012

### Videos updated

Post @ 16:13:35 | multimedia

I have updated the web page of videos with this video: Perspective click'n'drag: Quick area selection in photos. SIGGRAPH Asia, 2012

## Nov 06, 2012

### Computational information geometry

Post @ 12:24:52 | CIG

I updated the tiny dictionary of comp info geo terms.
Frank.

## Nov 05, 2012

### First encounter with computational geometry (20 years ago!)

Post @ 11:23:38 | convex hulls

Well, time flies, doesn't it?! While cleaning my room, I found this old report: my first report (in French) on computational geometry, dating back to 1992. It considers the problem of computing the convex hull of spheres in arbitrary dimension. Later, I worked on output-sensitive algorithms for convex planar objects: An Output-Sensitive Convex Hull Algorithm for Planar Objects

## Nov 02, 2012

### Presentations at ICPR 2012 (Tsukuba)

Post @ 14:43:43 | conferences

In 10 days, ICPR will kick off in Tsukuba. It is time to prepare poster and shotgun poster presentations!

See you there!
Frank.

## Oct 18, 2012

### Information-geometric signal processing (Slides)

Post @ 19:56:08 | talk

I have uploaded the slides for the talk:
A glance at information-geometric signal processing

## Oct 15, 2012

### A glance at information-geometric signal processing

Post @ 16:13:51 | talk

I will be presenting a talk entitled:

A glance at information-geometric signal processing (see 2012-T-MAHI-JGSSP.pdf) at MAHI: Methodological Aspects of Hyperspectral Imaging

## Oct 10, 2012

### Perspective click'n'drag: Quick area selection in photos

Post @ 12:11:08 | User interface

I have developed a simple UI for selecting areas in photos. See here. Although information geometry is not directly concerned, it can be used during two stages:

• Segmentation
• Robust estimation of homographies in the space of two-view homographies.

## Sep 06, 2012

### Travelling in hyperbolic geometry

Post @ 11:06:24 | fun

Hyperbolic geometry is fascinating, especially in high dimensions and with complex numbers. Thanks to the many models of hyperbolic geometry, one can choose the best-suited model for a given computation. For example, the hyperbolic Voronoi diagram is best computed in the Klein ball model (non-conformal): Hyperbolic Voronoi diagrams made easy

I did some experiments travelling myself in the 2D upper half-plane:

Frank.

## Aug 27, 2012

### It has been some time...

Post @ 16:37:52 | publications

... since I updated the publication list. Here is the journal section.

## Aug 21, 2012

### Information-geometric diffusion kernel for the multinomial geometry

Post @ 15:24:37 | Fisher-Rao geometry

A very neat paper is Diffusion Kernels on Statistical Manifolds:

@article{DiffusionKernelsStatManifolds-2005,
  author    = {Lafferty, John and Lebanon, Guy},
  title     = {Diffusion Kernels on Statistical Manifolds},
  journal   = {Journal of Machine Learning Research},
  volume    = {6},
  month     = dec,
  year      = {2005},
  issn      = {1532-4435},
  pages     = {129--163},
  publisher = {JMLR.org}
}


The authors recall that the Fisher-Rao geometry of the multinomial family amounts to spherical geometry, and design the multinomial diffusion kernel. An application to text classification using tf-idf (weighted bags of words) is presented.
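
Concretely, the square-root embedding θ ↦ √θ sends the simplex onto the positive orthant of the unit sphere, so the Fisher-Rao distance is a spherical arc length. A sketch (my own code; the paper's actual kernel includes a normalization factor omitted here):

```java
public class MultinomialKernel {
    // Fisher-Rao geodesic distance between two multinomials (points on the simplex),
    // computed via the square-root embedding onto the unit sphere
    static double fisherRao(double[] p, double[] q) {
        double dot = 0;
        for (int i = 0; i < p.length; i++) dot += Math.sqrt(p[i] * q[i]);
        return 2.0 * Math.acos(Math.min(1.0, dot));  // clamp against rounding
    }
    // heat-kernel-style diffusion kernel built from that distance
    static double diffusionKernel(double[] p, double[] q, double t) {
        double d = fisherRao(p, q);
        return Math.exp(-d * d / (4.0 * t));
    }
    public static void main(String[] args) {
        double[] p = {0.7, 0.2, 0.1}, q = {0.1, 0.2, 0.7};
        System.out.println("d = " + fisherRao(p, q) + ", K = " + diffusionKernel(p, q, 1.0));
    }
}
```

The distance is symmetric, vanishes only when the two multinomials coincide, and the kernel decays smoothly with the geodesic distance.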
@FrnkNlsn