May 29, 2009

Java var args in action

Post @ 0:53:56 | Java

I am writing this library for manipulating exponential families, bregman divergences, and so on. I am using Java. Today, I discovered that Java can handle arbitrary number of parameters using the three dots ... syntax as follows:

class VarargsSum
{
    public static double cumul(double... elements)
    {
    double cumul=0.0d;
    for (double el:elements)
        cumul+=el;
    return cumul;   
    }
    
    public static void write(String... records)
    {
    for (String record: records)
      System.out.println(record);
    }
    public static void main(String [] args)
    {
    Double sum=cumul(5.0,4.0,23.0); 
    write("Computational","Information","Geometry",sum.toString());// explicit cast needed
    }   
}

Compiling and running this above code, you will get

Computational
Information
Geometry
32.0

Let me see how to best use this in the library now... Frank.

May 15, 2009

Approximating the smallest enclosing ball of balls

Post @ 7:03:35 | Applet

Approximating the smallest enclosing ball of balls
Here is an applet to play with:

applet
Frank.

May 09, 2009

Quaternions and Sir Hamilton's bridge

Post @ 23:26:37 | algebra

Hamilton's bridge and the birth of quaternions as 4D normed division algebra. It cannot exist for 3D vectors...
See also octonions and 2**n Caley constructions. There are also called hypercomplex numbers.

For the small story, here is the inscription plate:


Here as he walked by on the 16th of October 1843 Sir William Rowan Hamilton in a flash of genius discovered the fundamental formula for quaternion multiplication i2 = j2 = k2 = ijk = -1 & cut it on a stone of this bridge

How much progress has been done in a century! (eg., Lie groups and algebras)
Frank.

May 06, 2009

Learning a kernel in SVM the conformal way

Support vector machines (SVMs) are one of the key tool for classification in machine learning. Suppose you have two sets of high-dimensional points (say, +1 class and -1 class, or red and blue points if you prefer) to separate. The SVM is seeking for the unique hyperplane that separaters the +1-labelled points to the -1-labelled points that maximizes the margin: The distance to the hyperplane. The points touched by translating the hyperplane on the left and right sides are called support vectors. There are in general d+1 such points in dimension d. In practice, red/blue points cannot be linearly separated. The trick is then to use a function f to map these points in a higher dimensional feature space, where they can be linearly separated. It is always possible to do so. Now, we can manipulate these feature points implicitly with a kernel function k(x,x')=f(x).f(x'), where '.' denotes the innerproduct. This is the so-called kernel trick (geometry in a Hilbert space with a Riemannian metric). Choosing the best kernel is difficult. One way is to learn it by bootstrapping the learning machines as follows:
First, learn a SVM and detect the support vectors,
Then adjust the kernel by choosing K(x,x')=D(x)D(x')k(x,x') for a positive function D().
The idea is to enlarge the spatial resolution around the boundary separating surface.
Finally, repeat these steps as much as possible, avoiding overfitting.

All technical details are described in the paper:
S.Wu and S. Amari, Conformal Transformation of Kernel Functions: A Data-Dependent Way to Improve Support Vector Machine Classifiers, Neural Processing Letters, 15, pp. 59-67, 2002.

Frank.

Apr 30, 2009

Non-parametric distances / algorithmic distances

Post @ 19:02:47 | Boosting

There are myriads of formula describing distance between two entities. Usually they are parametric in the sense that a closed-form solution allows one to compute the distance/similarity given the two input objects.

To contrast with these hard-coded distances, there is another approach that consists in desiging non-parametric distances by solving algorithmically a problem. For example, the Earth Mover Distance (EMD) is one such famous distance solved by transport optimization technique.

DistBoost (2004) is another approach inspired from machine learning techniques. It consists in learning the distance from like/dislikes constraints. A binary classifier classes objets into similar/dissimilar classes, say 0 and 1. Moreover, the signed real-valued margin that indicates a confidence level on the prediction can be purposely interpreted as a similarity measure. DistBoost is inspired from the renown boosting technique. The weak learners are solved using a sophisticated constrained EM algorithm.

More details by reading the paper:

Boosting Margin Based Distance Functions for Clustering

The future of data analysis is likely to incorporate these algorithmic distances.
Frank.

Apr 07, 2009

Testing normality of multivariate data

Post @ 23:17:50 | Statistics

K-means is certainly the most useful algorithm for clustering datasets. However, we need to give a prescribed number of clusters. One wat to circumvent this is to use penalty criteria (like AIC or BIC) or use the MDL principle. A simpler solution is to run the Anderson-Darling 1D test on projected data as proned by Hamerly and Elkan (NIPS*2003) Learning the k in k -means

However the test is not fully dimensional. Another interesting approach is based on the minimum spanning tree of the source dataset and a pooled sample (with parameters estimated from the sample mean and sample variance covariance):

A Test to Determine the Multivariate Normality of a Data Set (PAMI 1988).

The null hypothesis testing algorithm runs in quadratic time. It is suprising to see that the paper has not been mentioned more in the literature (closer works with MST and entropy are those of A. Hero). If you give it a try, let me know -:)
Frank.

Apr 04, 2009

Representable divergence: Csiszar and Bregman

Post @ 2:14:34 | Bregman-Csiszar

Csiszar C_f f-divergences preserve information monotonicity, Bregman divergences B_F are canonical divergences of dually flat spaces. These two families intersect only at the Kullback-Leibler divergence.

Consider now using a parameter representation function, say k (strictly monotone). And define B_{F,k}(p||q)=B_F(k(p)||k(q)) then you can obtain alpha and beta-divergences using such an extended Bregman divergence. Furthermore, add an external divergence representation function so that C_{h,f}(p||q)=h(C_f(p||q)). Then you get Renyi, Sharma Mittal and Bhattacharyya divergences using external representations of f-divergences.
Convexity and monotonicity are two puzzling ingredients for function rewriting.

I came across the book: Pardo L: Statistical Inference Based on Divergence Measures. Chapman&Hall, London, 2006.
It is very nice to have a monography focusing on statistical inference wrt. f-divergences.

IG_StatInfBook.jpg

Frank.

Mar 27, 2009

Batched and Incremental k-means

Post @ 17:57:16 | k-means

Since Lloyd's k-means iterative algorithm for hard clustering that stepwisely assign points to their nearest cluster and relocate these centers as centroids, there have been many generalizations, including the Bregman k-means.
This k-means is a batched k-means as points are assigned to their nearest center in a single stage. The optimization is monotone but only converge to a local minimum. To get off this potential local minimum, we can use a single swap that tries to move a point from a cluster to another one. Of course, we better choose the point and target cluster as the one that best decreases the loss function.
This incremental k-means is known and explained as the first variation by Teboulle et al.:

Clustering with Entropy-Like k-Means Algorithms 

paper

I invite you to look at the minimization of the loss function on a gene array expression dataset by combining batched and online k-means.
Frank.

Mar 19, 2009

Out of core nearest neighbor: Fast but good ones!

Post @ 22:04:26 | nearest neighbor

One colleague of mine showed me the recently published paper

NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections (TPAMI)


It is yet another simple tree-based technique (different from spill trees) that involve line projections. Experimental results are reported.

Mar 14, 2009

Dually flat spaces and canonical divergences

Well most of us are familiar with Bregman divergences allows one to extend many familiar algorithms such as the celebrated k-means clustering algorithm. Bregman divergences are known to be the canonical divergences of dually flat spaces. Euclidean geometry is simply self-dually flat (induced by the Legendre self-dual paraboloid function).

However if we consider separable divergence defined by means of a monotonous function called the representation funcrton acting on coordinate systems and a convex function, then we can define generalized canonical divergences as explained in Jun Zhang's seminal paper:
Zhang, J. 2004. Divergence function, duality, and convex analysis. Neural Comput. 16, 1 (Jan. 2004), 159-195. DOI= http://dx.doi.org/10.1162/08997660460734047

<

p> Now Eguchi and Copas' beta divergences (KL for beta=0) also induce a dually flat structure both on the manifold of positive arrays (and also the submanifold of probability vectors).
S. Eguchi and J. Copas (2002): A class of logistic-type discriminant functions, Biometrika, 89, 1-22. (

So let us keep in mind that dually flat spaces is a notion that encompasses Bregman divergences... A philosophical egg and chicken question of the link between geometry vs distances.

Mar 11, 2009

Circular Earth Mover Distance

Post @ 0:34:31 | EMD

Matching feature descriptors in vision is essential for stitching and object recognition among others. Since SIFT is based on discretizing the 360-degree wheel of gradient at different scales, it is better to use circular earth mover distance than a straight EMD.


Experiments are reported Circular Earth Mover?s Distance for the comparison of local features

Mar 02, 2009

Exponential families as Universal density estimators

Post @ 18:32:40 | Exponential families

It is well-known that any smooth density can be well approximated using a mixture of Gaussians. Gaussian distributions belong to the family of exponential families in statistics. There is even a more powerful property of exponential families with so-called rich sufficient statistics.

They take advantage of RKHS (Reproducing Kernel Hilbert Space). So forget the mixture and just consider one exponential family for modeling non-parametric distribution.

Details are in: Exponential families for conditional random fields (2004)

Feb 27, 2009

A unique characterization of alpha-divergences

Post @ 15:11:45 | alpha-divergence

Professor Shun-ichi Amari recently gave a talk at Ecole Polytechnique on information geometry. The latter point of his talk characterizes interestingly the alpha-divergences on positive measures (non-normalized distributions) as the intersection of f-divergences and Bregman divergences.


http://videolectures.net/etvc08_paris/

Of course, on probability measures, only the Kullback-Leibler divergence lies in the common intersection (and its dual).

Feb 26, 2009

Information monotonicity

Post @ 18:10:05 | f-divergence

Csiszar f-divergences have the property of information monotonicity. So take a positive array p with n bins and partition it into m bins P with the probability of falling in a bin being the sum of the probability of the atoms of that bin. Then the f-divergence of (p,q) should always be greater or equal to the f-divergence of the corresponding partitionned arrays. In other words:

f-div(p,q) >= f-div(P,Q)

That means that we can only loose information by aggregating atoms. That is one fairly reasonnable behavior of information.

More axiomatic characterization is detailed in:

Csiszár, Imre. 2008. "Axiomatic Characterizations of Information Measures." Entropy 10, no. 3: 261-273.


Frank.

Feb 24, 2009

New research vista in content-based image/video retrieval (CBIRs)

Post @ 15:55:41 | Theoretical CS

Given user query images, finding quickly similar relevant images is a fundamental challenging task of multimedia search engines with industrial applications to video/image copy detection and object recognition systems. The Internet contains nowadays billions of freely available online images that are currently searched mainly by associated text labels (eg., Google Image http://images.google.com/ ). Although the indexing/retrieval scheme of image databases is somehow quite old, many novel research breakthroughs have been recently attested in the computer vision community. We informally present a few key results below.

Traditionally, database images are indexed using signatures obtained from various feature extractors. First, features of images (histograms, points of interests) are computed and compacted into high-dimensional vectors that represent handles of images. Then given a query image, the signature of the query is computed and similar image signatures are sought out. Finally, an output ranked list is reported after cross-checking geometrically that images carry out indeed semantic similarity (and not only similar signatures). Studying the statistics of (1) natural images, (2) computing short and good feature descriptors, and (3) performing nearest-neighbor queries under the appropriate distance is of primal importance.

We present several recent work along these guidelines:

1/ Carlsson et al. present qualitative topological analysis of the local behavior of the space of natural images.

They consider the toy model of 3 by 3 high-contrast patches. They suprisingly show that these high-contrast patches have the topology of a Klein bottle, and use these findings to develop a compression algorithm based on a Klein bottle dictionary

Carlsson, G., Ishkhanov, T., Silva, V., and Zomorodian, A. 2008. On the Local Behavior of Spaces of Natural Images. Int. J. Comput. Vision 76, 1 (Jan. 2008), 1-12. DOI= http://dx.doi.org/10.1007/s11263-007-0056-x http://comptop.stanford.edu/preprints/mumford.pdf

2/ Discriminating images using feature vectors is an art: Making signature vectors long (=high-dimensional) allows one to better differentiate images at the cost of more time-consuming nearest-signature retrieving.

Making the signatures short allows fast retrieval algorithms but does not discriminate them enough. A recent breakthrough is to use fully automatic machine learning algorithms to learn compact signatures from long signatures. Having short yet discriminative signatures allows one to store locally the fully million-size index. Torralba et al. showed how to convert a real-valued so-called Gist descriptor to a compact binary code with a mere few hundred bits per image. They obtain real-time search performances using millions of Internet images using a single PC and obtain recognition results comparable to the full descriptor.

Small codes and large databases for recognition Torralba, R. Fergus, Y. Weiss. IEEE Computer Vision and Pattern Recognition, June 2008. http://people.csail.mit.edu/torralba/tinyimages/

3/ In order to further improve query search performance, the signature signatures vectors are partitionned into a few clusters.

A celebrated clustering algorithm is Lloyd's iterative k-means method, originally designed for the squared Euclidean distance that yields the center of mass in each cluster. Recently, this algorithm has been generalized to the broadest class of distances, called Bregman divergences by Barnerjee et al. Furthermore, we have shown how to compute the centroids for symmetrized Bregman divergences that allows one to design better information-theoretic clusterings tailored for CBIRs tasks. Clustering is an essential step for designing tree-based nearest neighbor data-structures.

Banerjee, A., Merugu, S., Dhillon, I. S., and Ghosh, J. 2005. Clustering with Bregman Divergences. J. Mach. Learn. Res. 6 (Dec. 2005), 1705-1749.

Nielsen, F., and Nock, R. Sided and Symmetrized Bregman Centroids, IEEE Transactions on Information Theory. IEEE CS Press, 2009 http://www.sonycsl.co.jp/person/nielsen/BregmanCentroids/

Frank Nielsen, Paolo Piro and Michel Barlaud
Tailored Bregman Ball Trees for Effective Nearest Neighbors, European Workshop on Computational Geometry, 2009.

Dec 06, 2008

Videos of the Emerging Trends in Visual Computing (ETVC'08)

Post @ 1:45:25 | Theoretical CS

It is my pleasure to annouce that the videos of ETVC'08 are now available at:

http://videolectures.net/etvc08_paris/

The web site (with abstracts and photo albums) is:

http://www.lix.polytechnique.fr/etvc08/

Aug 29, 2008

Special Issue of AISM on Information Geometry

Post @ 21:43:00 | information geometry

Annals of the Institute of Statistical Mathematics has run a a special Issue onInformation Geometry and Its Applications: http://www.ism.ac.jp/editsec/aism/vol59.html No.1 March 2007

Special Issue:Information Geometry and Its Applications Preface .......... Shun-ichi Amari and Shiro Ikeda (59, 1-2) A modified EM algorithm for mixture models based on Bregman divergence .......... Yu Fujimoto and Noboru Murata (59, 3-25) Exponential statistical manifold .......... Alberto Cena and Giovanni Pistone (59, 27-56) A new algorithm of non-Gaussian component analysis with radial kernel functions .......... Motoaki Kawanabe, Masashi Sugiyama, Gilles Blanchard and Klaus-Robert Müller (59, 57-75) The geometry of proper scoring rules .......... A. P. Dawid (59, 77-93) Extending local mixture models .......... Paul Marriott (59, 95-110) Local mixtures of the exponential distribution .......... K.A. Anaya-Izquierdo and P. K. Marriott (59, 111-134) Bayesian prediction based on a class of shrinkage priors for location-scale models .......... Fumiyasu Komaki (59, 135-146) Uncertainty principle and quantum Fisher information .......... Paolo Gibilisco and Tommaso Isola (59, 147-159) A note on curvature of \alpha-connections of a statistical manifold .......... Jun Zhang (59, 161-170)

Aug 18, 2008

Getting random multivariate Gaussian distribution

To get a random Gaussian N(mu,Sigma), sample the mu uniformly from [0,1] (or N(0,1)) and sample the symmetric positive definite matrix from a Wishart distribution: Sample each entry of a matrix A from a Normal distribution N(0,1), and set SIGMA=lambda AA^T where lambda is a uniform random number that controls the size of the covariance matrice.

Aug 14, 2008

Session on information geometry at Ecole Polytechnique colloquium ETVC08

Post @ 13:57:30 | Announcement

November 18th-20th, 2008, Paris.

Registration is now open (and free)

ETVC08

http://www.lix.polytechnique.fr/Labo/Frank.Nielsen/ETVC08/

Aug 12, 2008

Albert ROBIDA, pioneer and visionary of music in the 19th century

Post @ 17:28:33 | popular science

Last February, I was strikingly introduced with the truly pioneer work of visionary artist Albert ROBIDA (http://www.robida.info/). In short, Robida painted over 60.000 drawings and illustrated well over 200 books in his artist life. Albert Robida envisionned not only many IT products well ahead of his time, but he was also already awared and convinced of ecology problems. To give a remarkable example of Robida's visionary talent, let us point out that Robida imagined and painted a deviced called the phonoscope in 1883. The phonoscope is simply the precursor of the 21st century home theater (see imghttp://www.robida.info/images/visionnaire/journal_telephonoscopique.jpg). It is related to Edison's telephonoscope , also disclosed in an invention report in 1879 (imghttp://www.flickr.com/photos/seriykotik/208841133/). Perhaps, more strikingly, Robida envisionned the e-learning application with a remote teacher giving a lecture for a student at home: imghttp://www.robida.info/images/visionnaire/cours.jpg However, for most of the people, the most striking drawing we saw was the ancestor of the Mp3 Walkman at imghttp://www.robida.info/images/visionnaire/phono-operagraphe.jpg In this illustation, you can observe a shepherd (a walk man) listening to music in mountains. In other words, the music devices we carry everyday was already imagined two centuries ago. If such a device was already imagined, was about recording music? Very recent work carried out in March 2008 by American audio historians recovered the first recorded sound by little-known
parisian Edouard-Leon Scott de Martinville on April 9, 1860: Au Clair de la Lune, a French folk song (an ex. Scott's 1857 drawing of a phonautographic recording session was discloded in his patent paperwork registered at the french intellectual property office (INPI). More details of the ingenious process that consists in printing the sound on a sheet of paper is explained at linkhttp://www.firstsounds.org/ Note that the device could not play sounds back. (Edison used a sheet of tinfoil to record in 1927 ?Mary had a little lamb", see http://history.sandiego.edu/gen/recording/mary.html)

To conclude, let us mention that Albert Robida also wrote and illustrated the book "The Electric Life" where he depicts life in the 20th century and warns mankind of the potential danger of using non-controlled technology. How timely! We're all now fully aware and have imminently to face this issue.

Frank Nielsen.

A few links:.

A short biography:.

http://en.wikipedia.org/wiki/Albert_Robida

Robida's "Twentieth Century" book:.

http://www.amazon.com/Twentieth-Century-Classics-Science-Fiction/dp/0819566802/ref=sr_1_1?ie=UTF8&s=books&qid=1208178705&sr=1-1

History of the first recorded sound:.
http://www.firstsounds.org/

Aug 08, 2008

3D Bregman balls (lithography)

Post @ 16:43:54 | Bregman

Bregmanballs-printed.jpg

Aug 07, 2008

3D Bregman balls printed

More than a year ago, I printed 3D Bregman balls using a lithography process. I took may hours to print these shapes...

Generalized Kullback-Leibler (positive measures):
kl.jpg

Itakura-Saito (Burg entropy):
is.jpg

Logistic loss:
ll.jpg
To create the STL files, you need to discretize the surface of theses balls by walking on their geodesics passing through the centers. A bisection search does this to stop a more or less the radius.

Simplifying mixtures of Gaussians (MoG, GMM)

Since most pdfs can be equivalently expressed as a Gaussian mixture model, it is tempting to simplify them. The measure taken between two GMMs (one with many components, and the other with a reduced number) is the Kullback-Leibler relative entropy (=cross entropy-entropy). The problem is to decide whether we should minimize KL(GMMsource||GMM simplified) or the converse KL(GMM simplified||GMMsource) or the symmetrized KL (Jensen-Shannon, in green). Here are a few experiments carried out when simplidying a GMM to a single normal:
resultSylvain.png
result07-08-2008-05-09-23.png
result07-08-2008-10-00-14.png I let you guess whether the blue/red is the left/right or the converse... -:)

Aug 06, 2008

Relative entropy 3D ball

Post @ 11:49:03 | KullKullback-Leibler

RelativeEntropyBall.jpg Kullback-Leibler ball for unnormalized positive distributions.

Aug 05, 2008

Voronoi diagram and quantum Voronoi diagram

Post @ 19:54:21 | Voronoi

Using s/w volume rendering,we can easily visualize Voronoi structures in 3D. The picture below is a screen capture of an interactive s/w using GPU shaders. We can similarly visualize the quantum voronoi diagrams inside the Bloch ball...
VorL2.bmp VorL2.jpg

Jul 31, 2008

Bregman Itakura-Saito ball shown in 3D

Post @ 19:03:34 | Bregman

Bregman balls in 3D

itakurasaito3d.jpg

Jul 24, 2008

Robida: A precursor of electric life...

Post @ 15:08:45 | popular science

... Robida ? 19?????? 20???????????????????????????? 60,000??????? 200??????????????????????????????????????????????? (http://www.robida.info/)????????????? IT ????????????????????????????????????????????????

Robida ????????????????????????1883????????? ? "phonoscope" ????????????? "phonoscope" ?????????????????????????????????? 1879?? Edison ? "telephonoscope" ?????????????????????????????????????????????????????????????????Robida ??? "e-?????" ???????????? ?????????????????????????????????????? ... WALKMAN ??????????????????? (a walk man) ?????????????????????????????????????????????????????????????????????????? ... ?????????????????????????????????????????????????? ????????????????????? (2008?3?) ??????????????? Edouard-Leon Scott de Martinville ? 1860 ?4?9???????????? Au Clair de la Lune ?????????? ??????????Scott ???? 1857?????? phonautographic ????????????????????? (INPI) ?????????????????????????????????????????????????????? ?????????? ?????????????????(??????????????) ??????????????????? (???? Edison ?1927????????????????????????????)

??? Albert Robida ????????? "The Electric Life" ??????????????? 20???????????????????????????????????????????????????? ...

Frank NIELSEN, excerpts 2008.

Dec 25, 2007

Error Analysis of a Numerical Calculation about One-qubit Quantum Channel Capacity

The paper:

Error Analysis of a Numerical Calculation about One-qubit Quantum Channel Capacity (ISVD'07)

investigates the numerical error in computing the Holevo channel capacity using the furthest Voronoi diagram wrt. to the von Neuman quantum divergence. They show that the error is in O(1/eps) for sampling in the latitude-longitude 1/eps points.

This algorithm allows us to study, e.g., the additivity conjecture of quantum channels.

It is a nice aspect of quantum information geometry derived from traditional computational geometry. Interestingly, the Legendre transformation for 1-qubit represented on the Bloch sphere is explicited too.

Dec 12, 2007

Ranking of R &D centers

Post @ 12:35:31 | R&D

Good news from the CNRS after the Nobel prize of Albert Fert, it has been evaluated at number 6 worldwide.

Read the ranking

Previous Logs