May 16, 2013

Hyperbolic Voronoi diagrams

Post @ 12:12:53 | Hyperbolic geometry

I am programming figures for updating the arxiv report 1210.8234.
It is computing hyperbolic Voronoi diagrams from affine diagrams.
Here is an example with 1000 points in Poincare conformal ball model:
PK1000pts.png

The way it works is by clipping a power diagram with the bounding sphere. Then we can export this affine diagram to other common hyperbolic models.


The Klein diagram:
PKVD.png
has been derived from the power diagram:
alt="PD.png" />
and then transformed into the Poincare ball model:
PKVD.png
Here are some PDFs of figures:

May 05, 2013

Power diagram, Laguerre geometry, and statistical Voronoi diagrams

Post @ 12:43:17 | Voronoi

I am programming 2D power diagrams. Power diagrams are Voronoi diagrams with respect to the power distance. The power distance being the distance between power points. For each 2D point, we add a weight and compute the power diagram by reducing it to a 3D convex hull. Here is an example of a power diagram:
PowerDiagran-ex.png

The power of sites are indicated by the blue circles. Observe that some cells may be empty, and some cells may contain several sites. Here is the PDF of this file so that you can zoom in this vectorial file. The list of generators in x,y,weight so that you can play with it.
I am implementing the 2D power diagrams because it is a universal method for computing affine diagrams, that is diagrams with hyperplane bisectors. For power diagrams, the bisectors are the radical axis of the spheres (that can have imaginary radii). The applications I am targeting is to compute statistical Voronoi diagrams:

  • Statistical diagrams with respect to Rao geodesic distance for univariate normals amount to compute a hyperbolic Voronoi diagram, affine in the Klein ball model.
  • Statistical diagrams with respect to the Kullback-Leibler divergence for members of the same exponential family amount to compute Bregman Voronoi diagrams. These are affine diagrams too.

There shall be a follow-up on applications of those statistical Voronoi diagrams.

Apr 26, 2013

Symmetrical Kullback-Leibler centroid and W-Lambert function

Post @ 10:17:31 | Kullback-Leibler

The Jeffreys divergence of J-divergence (or symmetrical Kullback-Leibler divergence) is defined by:
jdiv.png

The Jeffreys centroid for positive histograms is available in closed-form:
jcentroid-theo1.png

The Jeffreys frequency centroid is restricted to belong to the probability simplex:
jfcentroid.png


jtheo2.png

Here are the slides related to the Jeffreys centroids: Slides

Apr 23, 2013

Non-Linear Book Manifolds: Learning from Associations the Dynamic Geometry of Digital Libraries

Post @ 21:57:05 | manifolds

Non-Linear Book Manifolds: Learning from Associations the Dynamic Geometry of Digital Libraries
Richard Nock (CEREGMIA-UAG. France.); Frank Nielsen (Sony CS Labs Tokyo. Japan.); Eric Briys (CEREGMIA-UAG-Cyberlibris. Belgium.) Abstract Mainstream approaches in the design of virtual libraries basically exploit the same ambient spaceas their physical twins. Our paper is an attempt to rather capture automatically the actual space on which the books live, and \textit{learn} the virtual library as a non-linear book manifold.This tackles tantalizing questions, chief among which whether modeling should be static and book focused (\textit{e.g.}using bag of words encoding) or dynamic and user focused (\textit{e.g.} relying on whatwe define as a \textit{bag of readers} encoding). Experiments on a real-world digital librarydisplay that the latter encoding is a serious challenger to the former. Our results also show that the geometric layers of the manifold learned bring sizeable advantages for retrieval and visualization purposes. For example, the topological layer of the manifold allows to craft \textit{Manifold} association rules; experiments display that theybring dramatic improvements over conventional association rules built from the discrete topology of book sets.Improvements embrace \textit{each} of the following major standpoints on association rule mining: computational, support, confidence, lift, and leverage standpoint.

Apr 15, 2013

Hypermask, Furhat,and the manifold of multiple homographies!

Post @ 22:18:33 | SIGGRAPH, CG

At Siggraph 1999, we presented hypermask, and demonstrated its use in a theatrical play where a single actor played several actors. The technology is based on projector camera calibration, 3d pose estimation and lip sync. Here is a video a what the system looked back in 1998-1999:


Here is the paper. Hyper mask: projecting a talking head onto a real object

Recently, I came across a nice project called Furhat


historical page
So what is the link with information geometry. Well, it deals with homography estimation, and the manifold of multiple homographies.

Apr 12, 2013

50 source code snippets for visual computing

Post @ 11:26:24 | book

In 2005, I released a book on visual computing:
visualcomputing.jpg
The book has been used for teaching in several universities. For example

See the table of contents
It contains a set of 50 source codes in C++: source codes in C++.
Nowadays, I am more Java(TM) and Processing oriented, and I am refreshing many of the demos. Stay tuned...

Apr 08, 2013

Linear SVM, Ball SVM, and minimum enclosing ball

Post @ 17:20:20 | SVM

Linear SVM and the kernel trick are certainly one of the corner stones of supervised machine learning. In fact, it can be generalized to containment machines (geometric machines), and instead of using halfspace containment in SVM, one can choose ball containment and build Ball SVM, and so on.
In Approximating smallest enclosing balls with applications to machine learning, we investigate several algorithms for computing enclosing balls. There is a special emphasis on the nice covering/piercing duality, illustrated by this figure:
PrimalDualPiercingCoveringBall.png
See also Approximating smallest enclosing balls

Apr 01, 2013

On the symmetrical Kullback-Leibler Jeffreys centroids, and Lambert W-function

Post @ 13:41:30 | centroids, k-means

There are many ways to symmetrize the Kullback-Leibler divergence like Jeffreys one-half symmetrized KL or Jensen-Shannon divergence by taking the divergences to the mid-distribution. There is also Chernoff min-max divergence that has a nice geometric interpretation. One question that arises once we get an appropriate divergence is how to perform hard clustering a la k-means. This requires to compute a centroid. We address the problem of computing Jeffreys centroid. We end-up with a closed-form for positive histograms using Lambert W functions:
JeffreysCentroidClosedForm.png
Furthermore, we prove that renormalizing this centroid gives a bound on the frequency Jeffreys centroid. We also report a simple bisection algorithm for approximating the exact frequency Jeffreys centroid. Note that we can use a variational k-means approach as described in the previous post. Further details are provided here

Mar 29, 2013

Variational k-means, k-MLE and Jensen-Bregman clustering (Burbea-Rao clustering)

Post @ 17:42:16 | Clustering

Kmeans clustering is one of the oldest hard clustering algorithm with many refinements since its inception. One of the famous recent results is k-means++ (with interestingly a recent paper showing A Bad Instance for k-Means++)

To prove that k-means converge, we need to

  • prove that at the assignment stage, the loss function decreases
  • prove that at the relocation stage, the loss function of each cluster (the information like Bregman information) decreases

(Squared) Euclidean k-means does it by assigning points to their closest center and relocation to the center of mass, for each cluster. Now, interestingly, we can have a different loss function for each cluster and k-means still converge. This is done for learning statistical mixtures using the k-MLE algorithm (stands for k-maximum likelihood estimator). See this paper for a proof.

Now, for the first condition, although we would like to have the minimizer of the cluster information for each cluster, it is enough just to decrease it a little bit, hence the name variational k-means. For example, Jensen-Bregman clustering needs to compute Jensen-Bregman centroids (aka. Burbea-Rao centroids) that are not available in closed-form but can be arbitrary approximated. We can decide to approximate those centroids up to machine precision, like this is done in this paper (The Burbea-Rao and Bhattacharyya centroids), or by just doing a single iteration, as recently proned in this paper (Bregman divergences and triangle inequality). The first approach tends to minimize the computational effort.

Mar 15, 2013

Connected at Infinity II is out!

Post @ 14:55:11 | book

Recently, I have been engrossed in coding and postponed blogging. However, today I received my complimentary copy of the book entitled Connected at Infinity II. Here are pictures of the front and back cover pages.
ConnectedAtInfinity2-Front.JPG ConnectedAtInfinity2-Back.JPG


The chapter on Cramer-Rao Lower Bound and Information Geometry is also available here.


@incollection{CRIG-2013,
  author      = {Frank Nielsen},
  title       = {Cram\'er-Rao Lower Bound and Information Geometry},
  page              = {18-37},
  editor      = {Rajendra Bhatia and C. S. Rajan and Ajit Iqbal Singh},
  booktitle   = {Connected at Infinity {II}: A selection of mathematics by {I}ndians},
  publisher   = {Hindustan Book Agency (Texts and Readings in Mathematics, TRIM)},
  note          = {arxiv 1301.3578}
  year        = {2013},
  url                   = {http://www.hindbook.com/}
}

Feb 25, 2013

Kullback-Leibler projection for best mixture simplification

Post @ 17:05:16 | Kullback-Leibler

Suppose we are given a mixture of n gaussians (p tilde) and we ask for the best gaussian (single component) approximating that mixture with respect to the Kullback-Leibler divergence:
centroid-mixturesimplification.png
Then this best Gaussian is found as the KL projection of the gaussian mixture onto the exponential family manifold of gaussians. It is equivalent to compute a Bregman barycenter.
KLprojection.png
Thus we can also learn a mixture by simplifying a kernel density estimator as explained in this paper and illustrated in this figure:
SimplifyKDE.png

Feb 20, 2013

Hypothesis testing: Chernoff information, orthogonality and dual geometry

Post @ 16:50:41 | Chernoff information

One way to detect target from noise in signals is to do binary hypothesis testing. You can model this statistical decision problem and assuming prior probability on the noise/target you seek to bound the probability of error. The Chernoff information is the best exponent error you can get from the best maximum a posteriori rule (MAP). On the dually flat space of exponential families, it is characterized GEOMETRICALLY EXACTLY although you may not have closed-form solutions.
ChernoffBinaryHT.png
Here is the paper.
We plan to release the code in the next version of jMEF.

Feb 08, 2013

Which side of Kullback-Leibler divergence to optimize?

Post @ 16:07:21 | Kullback-Leibler

A common question is which side of Kullback-Leibler (KL) should I consider for optimization. Since KL is the relative entropy: it is the difference of cross-entropy minus the entropy. As such the ideal model should be on right hand side, and the estimated model on left-hand side. Now suppose you want to find the center of two distributions for mixture simplification. You can interpret the results of left-sided and right-sided KL centroids as follows:

  • The Kullback-Leibler right-sided centroid is zero-avoiding so that its corresponding density function tries to cover the support of all input normals,
  • The Kullback-Leibler left-sided centroid is zero-forcing so that it focuses on the highest mass mode normal.

Here, there is not one model but two ideal models so we need a trade-off (=centroid). The zero-forcing/avoiding property is depicted in the following figure:
KLsidedandsymmetrized.png
Here are the details (paper).

Jan 31, 2013

An information-geometric characterization of Chernoff information

Post @ 17:10:59 | paper

The Chernoff information was originally introduced for bounding the probability of error of the Bayesian decision rule in binary hypothesis testing. Nowadays, it is often used as a notion of symmetric distance in statistical signal processing or as a way to define a middle distribution in information fusion. Computing the Chernoff information requires to solve an optimization problem that is numerically approximated in practice. We consider the Chernoff distance for distributions belonging to the same exponential family including the Gaussian and multinomial families. By considering the geometry of the underlying statistical manifold, we define exactly the solution of the optimization problem as the unique intersection of a geodesic with a dual hyperplane. Furthermore, we prove analytically that the Chernoff distance amounts to calculate an equivalent but simpler Bregman divergence defined on the distribution parameters. It follows a closed-form formula for the singly-parametric distributions, or an efficient geodesic bisection search for multi-parametric distributions. Finally, based on this informationgeometric characterization, we propose three novel information-theoretic symmetric distances and middle distributions, from which two of them admit always closed-form expressions.

paper

@ARTICLE{ChernoffDistance-2013,
author={Nielsen, F.},
journal={Signal Processing Letters, IEEE}, title={An information-geometric characterization of Chernoff information},
year={2013},
volume={PP},
number={99},
pages={1},
keywords={Bregman divergence;Chernoff information;exponential families;information fusion;information geometry;},
doi={10.1109/LSP.2013.2243726},
ISSN={1070-9908},}

Jan 25, 2013

Loosely bounding a Bregman divergence (without the Hessian)

Post @ 17:05:34 | Bregman divergence

Traditionally, the supremum of the norm of the Hessian of the Bregman generator over a closed domain is used in conjuction with the Lagrange remainder of the Taylor expansion to bound a Bregman divergence since we have:
BregmanRemainder.png

But we can also bound Bregman divergence easily without the Hessian, by using the gradient, as follows:

BregmanBoundGradient.png

I will report how to use this bound when defining and computing the Chernoff distance for exponential families.
Frank.

An update list of publications

Jan 23, 2013

Dictionary of computational information geometry

Post @ 17:31:51 | dictionary

I updated the dictionary of terms (580+) used in computational information geometry. Need some serious cleaning soon.
Frank.

Jan 17, 2013

Nov 26, 2012

Smooth transitions between left-sided and right-sided Bregman centroids (with java code):

One of the questions that often arises when using a Kullback-Leibler divergence is "how do we choose the orientation of KL?". In information retrieval, another related argument is "how to symmetrize Kullback-Leibler divergence?" (3 solutions: Jeffreys divergence, Jensen-Shannon divergence, Chernoff divergence). However, sometimes we would like to experiment more precise choices rather than left/right or symmetric orientation. One way to do so is to use skew Burbea-Rao divergences (also called skew Jensen divergences) that tend in limit cases to Bregman divergences:

skewBurbea-Rao-divergence.png
Skewness factor may depend on applications although a principle way is yet to be analyzed. This sample program Bregman_BurbeaRao_Jensen_SkewDivergence.java demonstrates the skewness property in limit cases. Note that you do not have to compute gradient for approximating Bregman divergences. Here is a run of the program:

Skew Burbea-Rao-Jensen divergences: Bregman divergences obtained in limit case alpha=0 and Bregman reverse for limit case alpha=1
alpha=1.0E-4    0.06831750610769681
alpha=0.5       0.06073382594234067
alpha=0.9999    0.05521937447971147
Bregman direct :                0.0683193293834281       alpha->0
Reverse Bregman:                0.05521841361948579      alpha->1
...

alpha=1.0E-4    0.06831750610769681
alpha=0.5       0.06073382594234067
alpha=0.9999    0.05521937447971147
Bregman direct :                0.0683193293834281       alpha->0
Reverse Bregman:                0.05521841361948579      alpha->1

More details can be found in 1004.5049v3.pdf and 1009.4004v2.pdf
Frank.

Nov 19, 2012

k-Maximum Likelihood Estimator for mixtures of generalized Gaussians

Post @ 15:46:54 | learning mixtures

The slides of the talk of Olivier on learning generalized Gaussian mixtures using an extension of the k-MLE framework are online here now.

Nov 17, 2012

CFP: Geometric Science of Information (GSI'13)

Post @ 19:23:19 | conference

GSI2013 - Geometric Science of Information will be held in Paris 2013-08-28 - 2013-08-30.
Please spread the word!
Frank

Nov 16, 2012

Videos updated

Post @ 16:13:35 | multimedia

I have updated the web page of videos with this video: Perspective click'n'drag: Quick area selection in photos. SIGGRAPH Asia, 2012

Nov 06, 2012

Computational information geometry

Post @ 12:24:52 | CIG

I updated the tiny dictionary of comp info geo terms.
Frank.

Nov 05, 2012

First encounter with computational geometry (20 years ago!)

Post @ 11:23:38 | convex hulls

Well, time is flying, isn't it !?! and when cleaning my room, I found this old report. It is my first report (in french) on computational geometry, back to 1992. It considers the problem of computing the convex hull of spheres in arbitrary dimension. Later, I worked on output-sensitive algorithms for convex planar objects: An Output-Sensitive Convex Hull Algorithm for Planar Objects

Nov 02, 2012

Presentations at ICPR 2012 (Tsukuba)

Post @ 14:43:43 | conferences

In 10 days, ICPR will kick off in Tsukuba. It is time to prepare poster and shotgun poster presentations!

See you there!
Frank.

Oct 18, 2012

Information-geometric signal processing (Slides)

Post @ 19:56:08 | talk

I have uploaded the slides for the talk:
A glance at information-geometric signal processing

slides.

Oct 15, 2012

A glance at information-geometric signal processing

Post @ 16:13:51 | talk

I will be presenting a talk entitled:

A glance at information-geometric signal processing (see 2012-T-MAHI-JGSSP.pdf) at MAHI: Methodological Aspects of Hyperspectral Imaging

Oct 10, 2012

Perspective click'n'drag: Quick area selection in photos

Post @ 12:11:08 | User interface

I have developed a simple UI for selecting areas in photos. See here. Although information geometry is not directly concerned, it can be used during two stages:

  • Segmentation
  • Estimation of robust homographies in the space of two-views homographies.

Sep 06, 2012

Travelling in hyperbolic geometry

Post @ 11:06:24 | fun

Hyperbolic geometry is fascinating, especially in high dims and with complex numbers. Using the many models of hyperbolic geometry, one can choose the best model for a computation purpose. For example, the hyperbolic Voronoi diagram has better be computed in the Klein ball (non-conformal).: Hyperbolic Voronoi diagrams made easy


I did some travelling experiment on travelling myself on the 2D upper plane:


Frank.

Aug 27, 2012

It is some time...

Post @ 16:37:52 | publications

... I did not update the publication list. Here is the journal section.

Aug 21, 2012

Information-geometric diffusion kernel for the multinomial geometry

Post @ 15:24:37 | Fisher-Rao geometry

A very neat paper is the Diffusion Kernels on Statistical Manifolds

@article{DiffusionKernelsStatManifolds-2005,
 author = {Lafferty, John and Lebanon, Guy},
 title = {Diffusion Kernels on Statistical Manifolds},
 journal = JMLR,
 issue_date = {12/1/2005},
 volume = {6},
 month = dec,
 year = {2005},
 issn = {1532-4435},
 pages = {129--163},
 numpages = {35},
 publisher = {JMLR.org},
 bibEntryDate = {2012/8/20},
 bibEntryAuthor={Frank Nielsen}
} 

The authors recall that the Fisher-Rao geometry amounts to the spherical geometry,, and design the multinomial diffusion kernel. Application to text classification using tfidf (weighted bag of words) is presented.
@FrnkNlsn

Aug 17, 2012

Human vs Computer Image Segmentation

Post @ 16:11:23 | computer vision

Segmentation consists in partitioning the image into homogeneous regions intended to represent objects. Human excels in segmenting but computers have the difficult task to solve this problem is a bottom-top approach. In fact, each individual may bring its own segmentation result but computers solve some optimization problem.

  • Graph (MST, normalized cuts)
  • Clustering (k-means, GMMs)
  • Region growing (SRM)
    Nock R., Nielsen, F., (2004). Statistical region merging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11):1452-1458.
    see some ecological applications
  • Watershed
  • etc

An interesting result by Kleinberg is to define a set of three essential properties one good clustering should have, and then prove that there does not exist an objective function to optimize yielding those properties...
J. Kleinberg. An Impossibility Theorem for Clustering. Advances in Neural Information Processing Systems (NIPS) 15, 2002.

Watershed is a segmentation technique that proceeds by filling/detecting basins by dropping water. It takes a greyscale image and manipulate it as a heightmap. To see that it is not trivial task, look at the right column and try to guess the corresponding image on the left column... !!!
Segmentation is a never-ending problem, so what are the next big milestones to focus on?

top-FrankNielsen.jpg oblique-FrankNielsen.jpg

top-Flower.jpg oblique-Flower.jpg


@FrnkNlsn

Previous Logs