May 20, 2013
Perspective click-and-drag area selections in pictures
May 16, 2013
Hyperbolic Voronoi diagrams
I am programming figures for updating the arxiv report 1210.8234.
It is computing hyperbolic Voronoi diagrams from affine diagrams.
Here is an example with 1000 points in Poincare conformal ball model:
The way it works is by clipping a power diagram with the bounding sphere. Then we can export this affine diagram to other common hyperbolic models.
The Klein diagram:
has been derived from the power diagram:
alt="PD.png" />
and then transformed into the Poincare ball model:
Here are some PDFs of figures:
May 05, 2013
Power diagram, Laguerre geometry, and statistical Voronoi diagrams
I am programming 2D power diagrams. Power diagrams are Voronoi diagrams with respect to the power distance.
The power distance being the distance between power points.
For each 2D point, we add a weight and compute the power diagram by reducing it to a 3D convex hull.
Here is an example of a power diagram:
The power of sites are indicated by the blue circles. Observe that some cells may be empty, and some cells may contain several sites. Here is the PDF of this file so that you can zoom in this vectorial file. The list of generators in x,y,weight so that you can play with it.
I am implementing the 2D power diagrams because it is a universal method for computing affine diagrams, that is diagrams with hyperplane bisectors. For power diagrams, the bisectors are the radical axis of the spheres (that can have imaginary radii).
The applications I am targeting is to compute statistical Voronoi diagrams:
- Statistical diagrams with respect to Rao geodesic distance for univariate normals amount to compute a hyperbolic Voronoi diagram, affine in the Klein ball model.
- Statistical diagrams with respect to the Kullback-Leibler divergence for members of the same exponential family amount to compute Bregman Voronoi diagrams. These are affine diagrams too.
There shall be a follow-up on applications of those statistical Voronoi diagrams.
Apr 26, 2013
Symmetrical Kullback-Leibler centroid and W-Lambert function
The Jeffreys divergence of J-divergence (or symmetrical Kullback-Leibler divergence) is defined by:
The Jeffreys centroid for positive histograms is available in closed-form:
The Jeffreys frequency centroid is restricted to belong to the probability simplex:
Here are the slides related to the Jeffreys centroids: Slides
Apr 23, 2013
Non-Linear Book Manifolds: Learning from Associations the Dynamic Geometry of Digital Libraries
Non-Linear Book Manifolds: Learning from Associations the Dynamic Geometry of Digital Libraries
Richard Nock (CEREGMIA-UAG. France.); Frank Nielsen (Sony CS Labs Tokyo. Japan.); Eric Briys (CEREGMIA-UAG-Cyberlibris. Belgium.)
Abstract
Mainstream approaches in the design of virtual libraries basically exploit the same ambient spaceas their physical twins. Our paper is an attempt to rather capture automatically the actual space on which the books live, and \textit{learn} the virtual library as a non-linear book manifold.This tackles tantalizing questions, chief among which whether modeling should be static and book focused (\textit{e.g.}using bag of words encoding) or dynamic and user focused (\textit{e.g.} relying on whatwe define as a \textit{bag of readers} encoding). Experiments on a real-world digital librarydisplay that the latter encoding is a serious challenger to the former. Our results also show that the geometric layers of the manifold learned bring sizeable advantages for retrieval and visualization purposes. For example, the topological layer of the manifold allows to craft \textit{Manifold} association rules; experiments display that theybring dramatic improvements over conventional association rules built from the discrete topology of book sets.Improvements embrace \textit{each} of the following major standpoints on association rule mining: computational, support, confidence, lift, and leverage standpoint.
Apr 15, 2013
Hypermask, Furhat,and the manifold of multiple homographies!
At Siggraph 1999, we presented hypermask, and demonstrated its use in a theatrical play where a single actor played several actors. The technology is based on projector camera calibration, 3d pose estimation and lip sync. Here is a video a what the system looked back in 1998-1999:
Here is the paper.
Hyper mask: projecting a talking head onto a real object
Recently, I came across a nice project called Furhat
historical page
So what is the link with information geometry. Well, it deals with homography estimation, and the manifold of multiple homographies.
Apr 12, 2013
50 source code snippets for visual computing
In 2005, I released a book on visual computing:
The book has been used for teaching in several universities.
For example
- Cardiff U, UK
- Salzburg U, Austria
- Beira Interior U, Portugal
- Electro-Communications U, Japan
- Polytechnic Madrid U, Spain
- MPI, Germany
- Braunschweig University of Technology, Germany
- Ecole Polytechnique, France
- etc...
See the table of contents
It contains a set of 50 source codes in C++:
source codes in C++.
Nowadays, I am more Java(TM) and Processing oriented, and I am refreshing many of the demos.
Stay tuned...
Apr 08, 2013
Linear SVM, Ball SVM, and minimum enclosing ball
Linear SVM and the kernel trick are certainly one of the corner stones of supervised machine learning.
In fact, it can be generalized to containment machines (geometric machines), and instead of using halfspace containment in SVM, one can choose ball containment and build Ball SVM, and so on.
In
Approximating smallest enclosing balls with applications to machine learning, we investigate several algorithms for computing enclosing balls. There is a special emphasis on the nice covering/piercing duality, illustrated by this figure:
See also Approximating smallest enclosing balls
Apr 01, 2013
On the symmetrical Kullback-Leibler Jeffreys centroids, and Lambert W-function
There are many ways to symmetrize the Kullback-Leibler divergence like Jeffreys one-half symmetrized KL or Jensen-Shannon divergence by taking the divergences to the mid-distribution. There is also Chernoff min-max divergence that has a nice geometric interpretation.
One question that arises once we get an appropriate divergence is how to perform hard clustering a la k-means.
This requires to compute a centroid. We address the problem of computing Jeffreys centroid.
We end-up with a closed-form for positive histograms using Lambert W functions:
Furthermore, we prove that renormalizing this centroid gives a bound on the frequency Jeffreys centroid.
We also report a simple bisection algorithm for approximating the exact frequency Jeffreys centroid.
Note that we can use a variational k-means approach as described in the previous post.
Further details are provided here
Mar 29, 2013
Variational k-means, k-MLE and Jensen-Bregman clustering (Burbea-Rao clustering)
Kmeans clustering is one of the oldest hard clustering algorithm with many refinements since its inception.
One of the famous recent results is k-means++ (with interestingly a recent paper showing A Bad Instance for k-Means++)
To prove that k-means converge, we need to
- prove that at the assignment stage, the loss function decreases
- prove that at the relocation stage, the loss function of each cluster (the information like Bregman information) decreases
(Squared) Euclidean k-means does it by assigning points to their closest center and relocation to the center of mass, for each cluster. Now, interestingly, we can have a different loss function for each cluster and k-means still converge. This is done for learning statistical mixtures using the k-MLE algorithm (stands for k-maximum likelihood estimator). See this paper for a proof.
Now, for the first condition, although we would like to have the minimizer of the cluster information for each cluster, it is enough just to decrease it a little bit, hence the name variational k-means. For example, Jensen-Bregman clustering needs to compute Jensen-Bregman centroids (aka. Burbea-Rao centroids) that are not available in closed-form but can be arbitrary approximated. We can decide to approximate those centroids up to machine precision, like this is done in this paper (The Burbea-Rao and Bhattacharyya centroids), or by just doing a single iteration, as recently proned in this paper (Bregman divergences and triangle inequality). The first approach tends to minimize the computational effort.
Mar 15, 2013
Connected at Infinity II is out!
Recently, I have been engrossed in coding and postponed blogging.
However, today I received my complimentary copy of the book entitled Connected at Infinity II.
Here are pictures of the front and back cover pages.
The chapter on Cramer-Rao Lower Bound and Information Geometry is also available here.
@incollection{CRIG-2013,
author = {Frank Nielsen},
title = {Cram\'er-Rao Lower Bound and Information Geometry},
page = {18-37},
editor = {Rajendra Bhatia and C. S. Rajan and Ajit Iqbal Singh},
booktitle = {Connected at Infinity {II}: A selection of mathematics by {I}ndians},
publisher = {Hindustan Book Agency (Texts and Readings in Mathematics, TRIM)},
note = {arxiv 1301.3578}
year = {2013},
url = {http://www.hindbook.com/}
}
Feb 25, 2013
Kullback-Leibler projection for best mixture simplification
Suppose we are given a mixture of n gaussians (p tilde) and we ask for the best gaussian (single component) approximating that mixture with respect to the Kullback-Leibler divergence:
Then this best Gaussian is found as the KL projection of the gaussian mixture onto the exponential family manifold of gaussians. It is equivalent to compute a Bregman barycenter.
Thus we can also learn a mixture by simplifying a kernel density estimator as explained in this paper and illustrated in this figure:
Feb 20, 2013
Hypothesis testing: Chernoff information, orthogonality and dual geometry
One way to detect target from noise in signals is to do binary hypothesis testing.
You can model this statistical decision problem and assuming prior probability on the noise/target you seek to bound the probability of error. The Chernoff information is the best exponent error you can get from the best maximum a posteriori rule (MAP).
On the dually flat space of exponential families, it is characterized GEOMETRICALLY EXACTLY although you may not have closed-form solutions.
Here is the paper.
We plan to release the code in the next version of jMEF.
Feb 08, 2013
Which side of Kullback-Leibler divergence to optimize?
A common question is which side of Kullback-Leibler (KL) should I consider for optimization. Since KL is the relative entropy: it is the difference of cross-entropy minus the entropy. As such the ideal model should be on right hand side, and the estimated model on left-hand side. Now suppose you want to find the center of two distributions for mixture simplification. You can interpret the results of left-sided and right-sided KL centroids as follows:
- The Kullback-Leibler right-sided centroid is zero-avoiding so that its corresponding density function tries to cover the support of all input normals,
- The Kullback-Leibler left-sided centroid is zero-forcing so that it focuses on the highest mass mode normal.
Here, there is not one model but two ideal models so we need a trade-off (=centroid).
The zero-forcing/avoiding property is depicted in the following figure:
Here are the details (paper).
Jan 31, 2013
An information-geometric characterization of Chernoff information
The Chernoff information was originally introduced for bounding the probability of error of the Bayesian decision rule in binary hypothesis testing. Nowadays, it is often used as a notion of symmetric distance in statistical signal processing or as a way to define a middle distribution in information fusion. Computing the Chernoff information requires to solve an optimization problem that is numerically approximated in practice. We consider the Chernoff distance for distributions belonging to the same exponential family including the Gaussian and multinomial families. By considering the geometry of the underlying statistical manifold, we define exactly the solution of the optimization problem as the unique intersection of a geodesic with a dual hyperplane. Furthermore, we prove analytically that the Chernoff distance amounts to calculate an equivalent but simpler Bregman divergence defined on the distribution parameters. It follows a closed-form formula for the singly-parametric distributions, or an efficient geodesic bisection search for multi-parametric distributions. Finally, based on this informationgeometric characterization, we propose three novel information-theoretic symmetric distances and middle distributions, from which two of them admit always closed-form expressions.
@ARTICLE{ChernoffDistance-2013,
author={Nielsen, F.},
journal={Signal Processing Letters, IEEE}, title={An information-geometric characterization of Chernoff information},
year={2013},
volume={PP},
number={99},
pages={1},
keywords={Bregman divergence;Chernoff information;exponential families;information fusion;information geometry;},
doi={10.1109/LSP.2013.2243726},
ISSN={1070-9908},}
Jan 25, 2013
Loosely bounding a Bregman divergence (without the Hessian)
Traditionally, the supremum of the norm of the Hessian of the Bregman generator over a closed domain is used in conjuction with the Lagrange remainder of the Taylor expansion to bound a Bregman divergence since we have:
But we can also bound Bregman divergence easily without the Hessian, by using the gradient, as follows:
I will report how to use this bound when defining and computing the Chernoff distance for exponential families.
Frank.
An update list of publications
Jan 23, 2013
Dictionary of computational information geometry
I updated the dictionary of terms (580+) used in computational information geometry.
Need some serious cleaning soon.
Frank.
Jan 17, 2013
Cramer-Rao Lower Bound and Information Geometry
Happy new year to all of you!
An introductory article:
Cramer-Rao Lower Bound and Information Geometry
Nov 26, 2012
Smooth transitions between left-sided and right-sided Bregman centroids (with java code):
One of the questions that often arises when using a Kullback-Leibler divergence is "how do we choose the orientation of KL?".
In information retrieval, another related argument is "how to symmetrize Kullback-Leibler divergence?" (3 solutions: Jeffreys divergence, Jensen-Shannon divergence, Chernoff divergence).
However, sometimes we would like to experiment more precise choices rather than left/right or symmetric orientation.
One way to do so is to use skew Burbea-Rao divergences (also called skew Jensen divergences) that tend in limit cases to Bregman divergences:
Skewness factor may depend on applications although a principle way is yet to be analyzed.
This sample program Bregman_BurbeaRao_Jensen_SkewDivergence.java demonstrates the skewness property in limit cases.
Note that you do not have to compute gradient for approximating Bregman divergences.
Here is a run of the program:
Skew Burbea-Rao-Jensen divergences: Bregman divergences obtained in limit case alpha=0 and Bregman reverse for limit case alpha=1 alpha=1.0E-4 0.06831750610769681 alpha=0.5 0.06073382594234067 alpha=0.9999 0.05521937447971147 Bregman direct : 0.0683193293834281 alpha->0 Reverse Bregman: 0.05521841361948579 alpha->1 ... alpha=1.0E-4 0.06831750610769681 alpha=0.5 0.06073382594234067 alpha=0.9999 0.05521937447971147 Bregman direct : 0.0683193293834281 alpha->0 Reverse Bregman: 0.05521841361948579 alpha->1
More details can be found in 1004.5049v3.pdf and 1009.4004v2.pdf
Frank.
Nov 19, 2012
k-Maximum Likelihood Estimator for mixtures of generalized Gaussians
The slides of the talk of Olivier on learning generalized Gaussian mixtures using an extension of the k-MLE framework are online here now.
Nov 17, 2012
CFP: Geometric Science of Information (GSI'13)
GSI2013 - Geometric Science of Information will be held in Paris
2013-08-28 - 2013-08-30.
Please spread the word!
Frank
Nov 16, 2012
Videos updated
I have updated the web page of videos with this video:
Perspective click'n'drag: Quick area selection in photos. SIGGRAPH Asia, 2012
Nov 06, 2012
Computational information geometry
I updated the tiny dictionary of comp info geo terms.
Frank.
Nov 05, 2012
First encounter with computational geometry (20 years ago!)
Well, time is flying, isn't it !?! and when cleaning my room, I found this old report. It is my first report (in french) on computational geometry, back to 1992. It considers the problem of computing the convex hull of spheres in arbitrary dimension. Later, I worked on output-sensitive algorithms for convex planar objects: An Output-Sensitive Convex Hull Algorithm for Planar Objects
Nov 02, 2012
Presentations at ICPR 2012 (Tsukuba)
In 10 days, ICPR will kick off in Tsukuba. It is time to prepare poster and shotgun poster presentations!
-
Closed-Form Information-Theoretic Divergences for Statistical Mixtures:
shotgun presentation and poster - Jensen Divergence Based SPD Matrix Means and Applications:
shotgun presentation and poster - k-MLE for mixtures of generalized Gaussians
See you there!
Frank.
Oct 18, 2012
Information-geometric signal processing (Slides)
I have uploaded the slides for the talk:
A glance at information-geometric signal processing
Oct 15, 2012
A glance at information-geometric signal processing
I will be presenting a talk entitled:
A glance at information-geometric signal processing (see 2012-T-MAHI-JGSSP.pdf) at MAHI: Methodological Aspects of Hyperspectral Imaging
Oct 10, 2012
Perspective click'n'drag: Quick area selection in photos
I have developed a simple UI for selecting areas in photos. See here. Although information geometry is not directly concerned, it can be used during two stages:
- Segmentation
- Estimation of robust homographies in the space of two-views homographies.
Sep 06, 2012
Travelling in hyperbolic geometry
Hyperbolic geometry is fascinating, especially in high dims and with complex numbers. Using the many models of hyperbolic geometry, one can choose the best model for a computation purpose. For example, the hyperbolic Voronoi diagram has better be computed in the Klein ball (non-conformal).: Hyperbolic Voronoi diagrams made easy
I did some travelling experiment on travelling myself on the 2D upper plane:
Frank.
Aug 27, 2012
It is some time...
... I did not update the publication list. Here is the journal section.
Aug 21, 2012
Information-geometric diffusion kernel for the multinomial geometry
A very neat paper is the Diffusion Kernels on Statistical Manifolds
@article{DiffusionKernelsStatManifolds-2005,
author = {Lafferty, John and Lebanon, Guy},
title = {Diffusion Kernels on Statistical Manifolds},
journal = JMLR,
issue_date = {12/1/2005},
volume = {6},
month = dec,
year = {2005},
issn = {1532-4435},
pages = {129--163},
numpages = {35},
publisher = {JMLR.org},
bibEntryDate = {2012/8/20},
bibEntryAuthor={Frank Nielsen}
}
The authors recall that the Fisher-Rao geometry amounts to the spherical geometry,, and design the multinomial diffusion kernel. Application to text classification using tfidf (weighted bag of words) is presented.
@FrnkNlsn
I will attend tomorrow MVA in Kyoto, a major conference on machine vision and applications.
Here are the slides.