May 29, 2009
Java var args in action
May 15, 2009
May 09, 2009
Quaternions and Sir Hamilton's bridge
Hamilton's bridge and the birth of quaternions as 4D normed division algebra.
It cannot exist for 3D vectors...
See also octonions and 2**n Caley constructions.
There are also called hypercomplex numbers.
For the small story, here is the inscription plate:
Here as he walked by on the 16th of October 1843 Sir William Rowan Hamilton in a flash of genius discovered the fundamental formula for quaternion multiplication i2 = j2 = k2 = ijk = -1 & cut it on a stone of this bridge
How much progress has been done in a century! (eg., Lie groups and algebras)
Frank.
May 06, 2009
Learning a kernel in SVM the conformal way
Support vector machines (SVMs) are one of the key tool for classification in machine learning.
Suppose you have two sets of high-dimensional points (say, +1 class and -1 class, or red and blue points if you prefer) to separate.
The SVM is seeking for the unique hyperplane that separaters the +1-labelled points to the -1-labelled points that maximizes the margin: The distance to the hyperplane.
The points touched by translating the hyperplane on the left and right sides are called support vectors.
There are in general d+1 such points in dimension d.
In practice, red/blue points cannot be linearly separated.
The trick is then to use a function f to map these points in a higher dimensional feature space, where they can be linearly separated.
It is always possible to do so. Now, we can manipulate these feature points implicitly with a kernel function k(x,x')=f(x).f(x'), where '.' denotes the innerproduct.
This is the so-called kernel trick (geometry in a Hilbert space with a Riemannian metric).
Choosing the best kernel is difficult. One way is to learn it by bootstrapping the learning machines as follows:
First, learn a SVM and detect the support vectors,
Then adjust the kernel by choosing K(x,x')=D(x)D(x')k(x,x') for a positive function D().
The idea is to enlarge the spatial resolution around the boundary separating surface.
Finally, repeat these steps as much as possible, avoiding overfitting.
All technical details are described in the paper:
S.Wu and S. Amari, Conformal Transformation of Kernel Functions: A Data-Dependent
Way to Improve Support Vector Machine Classifiers, Neural Processing Letters, 15, pp.
59-67, 2002.
Frank.
Apr 30, 2009
Non-parametric distances / algorithmic distances
There are myriads of formula describing distance between two entities. Usually they are parametric in the sense that a closed-form solution allows one to compute the distance/similarity given the two input objects.
To contrast with these hard-coded distances, there is another approach that consists in desiging non-parametric distances by solving algorithmically a problem. For example, the Earth Mover Distance (EMD) is one such famous distance solved by transport optimization technique.
DistBoost (2004) is another approach inspired from machine learning techniques.
It consists in learning the distance from like/dislikes constraints.
A binary classifier classes objets into similar/dissimilar classes, say 0 and 1.
Moreover, the signed real-valued margin that indicates a confidence level on the prediction can be purposely interpreted as a similarity measure. DistBoost is inspired from the renown boosting technique. The weak learners are solved using a sophisticated constrained EM algorithm.
More details by reading the paper:
Boosting Margin Based Distance Functions for Clustering
The future of data analysis is likely to incorporate these algorithmic distances.
Frank.
Apr 07, 2009
Testing normality of multivariate data
K-means is certainly the most useful algorithm for clustering datasets. However, we need to give a prescribed number of clusters. One wat to circumvent this is to use penalty criteria (like AIC or BIC) or use the MDL principle. A simpler solution is to run the Anderson-Darling 1D test on projected data as proned by Hamerly and Elkan (NIPS*2003) Learning the k in k -means
However the test is not fully dimensional. Another interesting approach is based on the minimum spanning tree of the source dataset and a pooled sample (with parameters estimated from the sample mean and sample variance covariance):
A Test to Determine the Multivariate Normality of a Data Set (PAMI 1988).
The null hypothesis testing algorithm runs in quadratic time.
It is suprising to see that the paper has not been mentioned more in the literature (closer works with MST and entropy are those of A. Hero). If you give it a try, let me know -:)
Frank.
Apr 04, 2009
Representable divergence: Csiszar and Bregman
Csiszar C_f f-divergences preserve information monotonicity, Bregman divergences B_F are canonical divergences of dually flat spaces. These two families intersect only at the Kullback-Leibler divergence.
Consider now using a parameter representation function, say k (strictly monotone).
And define B_{F,k}(p||q)=B_F(k(p)||k(q)) then you can obtain alpha and beta-divergences using such an extended Bregman divergence. Furthermore, add an external divergence representation function so that C_{h,f}(p||q)=h(C_f(p||q)).
Then you get Renyi, Sharma Mittal and Bhattacharyya divergences using external representations of f-divergences.
Convexity and monotonicity are two puzzling ingredients for function rewriting.
I came across the book:
Pardo L: Statistical Inference Based on Divergence Measures. Chapman&Hall, London, 2006.
It is very nice to have a monography focusing on statistical inference wrt. f-divergences.
Frank.
Mar 27, 2009
Batched and Incremental k-means
Since Lloyd's k-means iterative algorithm for hard clustering that stepwisely assign points to their nearest cluster and relocate these centers as centroids, there have been many generalizations, including the Bregman k-means.
This k-means is a batched k-means as points are assigned to their nearest center in a single stage.
The optimization is monotone but only converge to a local minimum.
To get off this potential local minimum, we can use a single swap that tries to move a point from a cluster to another one.
Of course, we better choose the point and target cluster as the one that best decreases the loss function.
This incremental k-means is known and explained as the first variation by Teboulle et al.:
Clustering with Entropy-Like k-Means Algorithms
I invite you to look at the minimization of the loss function on a gene array expression dataset by combining batched and online k-means.
Frank.
Mar 19, 2009
Out of core nearest neighbor: Fast but good ones!
One colleague of mine showed me the recently published paper
It is yet another simple tree-based technique (different from spill trees) that involve line projections.
Experimental results are reported.
Mar 14, 2009
Dually flat spaces and canonical divergences
Well most of us are familiar with Bregman divergences allows one to extend many familiar algorithms such as the celebrated k-means clustering algorithm. Bregman divergences are known to be the canonical divergences of dually flat spaces. Euclidean geometry is simply self-dually flat (induced by the Legendre self-dual paraboloid function).
However if we consider separable divergence defined by means of a monotonous function called the representation funcrton acting on coordinate systems and a convex function, then we can define generalized canonical divergences as explained in Jun Zhang's seminal paper:
Zhang, J. 2004. Divergence function, duality, and convex analysis. Neural Comput. 16, 1 (Jan. 2004), 159-195. DOI= http://dx.doi.org/10.1162/08997660460734047
<
p>
Now Eguchi and Copas' beta divergences (KL for beta=0) also induce a dually flat structure both on the manifold of positive arrays (and also the submanifold of probability vectors).
S. Eguchi and J. Copas (2002): A class of logistic-type discriminant functions, Biometrika, 89, 1-22. (
>
So let us keep in mind that dually flat spaces is a notion that encompasses Bregman divergences... A philosophical egg and chicken question of the link between geometry vs distances.
Mar 11, 2009
Circular Earth Mover Distance
Matching feature descriptors in vision is essential for stitching and object recognition among others. Since SIFT is based on discretizing the 360-degree wheel of gradient at different scales, it is better to use circular earth mover distance than a straight EMD.
Experiments are reported Circular Earth Mover?s Distance for the comparison of local features
Mar 02, 2009
Exponential families as Universal density estimators
It is well-known that any smooth density can be well approximated using a mixture of Gaussians. Gaussian distributions belong to the family of exponential families in statistics. There is even a more powerful property of exponential families with so-called rich sufficient statistics.
They take advantage of RKHS (Reproducing Kernel Hilbert Space). So forget the mixture and just consider one exponential family for modeling non-parametric distribution.
Details are in: Exponential families for conditional random fields (2004)
Feb 27, 2009
A unique characterization of alpha-divergences
Professor Shun-ichi Amari recently gave a talk at Ecole Polytechnique on information geometry. The latter point of his talk characterizes interestingly the alpha-divergences on positive measures (non-normalized distributions) as the intersection of f-divergences and Bregman divergences.
http://videolectures.net/etvc08_paris/
Of course, on probability measures, only the Kullback-Leibler divergence lies in the common intersection (and its dual).
Feb 26, 2009
Information monotonicity
Csiszar f-divergences have the property of information monotonicity. So take a positive array p with n bins and partition it into m bins P with the probability of falling in a bin being the sum of the probability of the atoms of that bin. Then the f-divergence of (p,q) should always be greater or equal to the f-divergence of the corresponding partitionned arrays. In other words:
f-div(p,q) >= f-div(P,Q)
That means that we can only loose information by aggregating atoms. That is one fairly reasonnable behavior of information.
More axiomatic characterization is detailed in:
Csiszár, Imre. 2008. "Axiomatic Characterizations of Information Measures." Entropy 10, no. 3: 261-273.
Frank.
Feb 24, 2009
New research vista in content-based image/video retrieval (CBIRs)
Given user query images, finding quickly similar relevant images is a fundamental challenging task of multimedia search engines with industrial applications to video/image copy detection and object recognition systems. The Internet contains nowadays billions of freely available online images that are currently searched mainly by associated text labels (eg., Google Image http://images.google.com/ ). Although the indexing/retrieval scheme of image databases is somehow quite old, many novel research breakthroughs have been recently attested in the computer vision community. We informally present a few key results below.
Traditionally, database images are indexed using signatures obtained from various feature extractors. First, features of images (histograms, points of interests) are computed and compacted into high-dimensional vectors that represent handles of images. Then given a query image, the signature of the query is computed and similar image signatures are sought out. Finally, an output ranked list is reported after cross-checking geometrically that images carry out indeed semantic similarity (and not only similar signatures). Studying the statistics of (1) natural images, (2) computing short and good feature descriptors, and (3) performing nearest-neighbor queries under the appropriate distance is of primal importance.
We present several recent work along these guidelines:
1/ Carlsson et al. present qualitative topological analysis of the local behavior of the space of natural images.
They consider the toy model of 3 by 3 high-contrast patches. They suprisingly show that these high-contrast patches have the topology of a Klein bottle, and use these findings to develop a compression algorithm based on a Klein bottle dictionary
Carlsson, G., Ishkhanov, T., Silva, V., and Zomorodian, A. 2008. On the Local Behavior of Spaces of Natural Images. Int. J. Comput. Vision 76, 1 (Jan. 2008), 1-12. DOI= http://dx.doi.org/10.1007/s11263-007-0056-x http://comptop.stanford.edu/preprints/mumford.pdf
2/ Discriminating images using feature vectors is an art: Making signature vectors long (=high-dimensional) allows one to better differentiate images at the cost of more time-consuming nearest-signature retrieving.
Making the signatures short allows fast retrieval algorithms but does not discriminate them enough. A recent breakthrough is to use fully automatic machine learning algorithms to learn compact signatures from long signatures. Having short yet discriminative signatures allows one to store locally the fully million-size index. Torralba et al. showed how to convert a real-valued so-called Gist descriptor to a compact binary code with a mere few hundred bits per image. They obtain real-time search performances using millions of Internet images using a single PC and obtain recognition results comparable to the full descriptor.
Small codes and large databases for recognition Torralba, R. Fergus, Y. Weiss. IEEE Computer Vision and Pattern Recognition, June 2008. http://people.csail.mit.edu/torralba/tinyimages/
3/ In order to further improve query search performance, the signature signatures vectors are partitionned into a few clusters.
A celebrated clustering algorithm is Lloyd's iterative k-means method, originally designed for the squared Euclidean distance that yields the center of mass in each cluster. Recently, this algorithm has been generalized to the broadest class of distances, called Bregman divergences by Barnerjee et al. Furthermore, we have shown how to compute the centroids for symmetrized Bregman divergences that allows one to design better information-theoretic clusterings tailored for CBIRs tasks. Clustering is an essential step for designing tree-based nearest neighbor data-structures.
Banerjee, A., Merugu, S., Dhillon, I. S., and Ghosh, J. 2005. Clustering with Bregman Divergences. J. Mach. Learn. Res. 6 (Dec. 2005), 1705-1749.
Nielsen, F., and Nock, R. Sided and Symmetrized Bregman Centroids, IEEE Transactions on Information Theory. IEEE CS Press, 2009 http://www.sonycsl.co.jp/person/nielsen/BregmanCentroids/
Frank Nielsen, Paolo Piro and Michel Barlaud
Tailored Bregman Ball Trees for Effective Nearest Neighbors, European Workshop on Computational Geometry, 2009.
Dec 06, 2008
Videos of the Emerging Trends in Visual Computing (ETVC'08)
It is my pleasure to annouce that the videos of ETVC'08 are now available at:
http://videolectures.net/etvc08_paris/
The web site (with abstracts and photo albums) is:
Aug 29, 2008
Special Issue of AISM on Information Geometry
Annals of the Institute of Statistical Mathematics has run a a special Issue onInformation Geometry and Its Applications: http://www.ism.ac.jp/editsec/aism/vol59.html No.1 March 2007
Special Issue:Information Geometry and Its Applications Preface .......... Shun-ichi Amari and Shiro Ikeda (59, 1-2) A modified EM algorithm for mixture models based on Bregman divergence .......... Yu Fujimoto and Noboru Murata (59, 3-25) Exponential statistical manifold .......... Alberto Cena and Giovanni Pistone (59, 27-56) A new algorithm of non-Gaussian component analysis with radial kernel functions .......... Motoaki Kawanabe, Masashi Sugiyama, Gilles Blanchard and Klaus-Robert Müller (59, 57-75) The geometry of proper scoring rules .......... A. P. Dawid (59, 77-93) Extending local mixture models .......... Paul Marriott (59, 95-110) Local mixtures of the exponential distribution .......... K.A. Anaya-Izquierdo and P. K. Marriott (59, 111-134) Bayesian prediction based on a class of shrinkage priors for location-scale models .......... Fumiyasu Komaki (59, 135-146) Uncertainty principle and quantum Fisher information .......... Paolo Gibilisco and Tommaso Isola (59, 147-159) A note on curvature of \alpha-connections of a statistical manifold .......... Jun Zhang (59, 161-170)
Aug 18, 2008
Getting random multivariate Gaussian distribution
To get a random Gaussian N(mu,Sigma), sample the mu uniformly from [0,1] (or N(0,1)) and sample the symmetric positive definite matrix from a Wishart distribution: Sample each entry of a matrix A from a Normal distribution N(0,1), and set SIGMA=lambda AA^T where lambda is a uniform random number that controls the size of the covariance matrice.
Aug 14, 2008
Session on information geometry at Ecole Polytechnique colloquium ETVC08
November 18th-20th, 2008, Paris.
Registration is now open (and free)
ETVC08
http://www.lix.polytechnique.fr/Labo/Frank.Nielsen/ETVC08/
Aug 12, 2008
Albert ROBIDA, pioneer and visionary of music in the 19th century
Last February, I was strikingly introduced with the truly pioneer work of visionary artist Albert ROBIDA
(http://www.robida.info/). In short, Robida painted over 60.000 drawings and illustrated well over 200 books in his artist life.
Albert Robida envisionned not only many IT products well ahead of his time, but he was also already awared and convinced of ecology problems.
To give a remarkable example of Robida's visionary talent, let us point out that Robida imagined and painted a deviced called the phonoscope in 1883.
The
phonoscope
is simply the precursor of the 21st century home theater (see imghttp://www.robida.info/images/visionnaire/journal_telephonoscopique.jpg).
It is related to Edison's
telephonoscope
, also disclosed in an invention report in 1879
(imghttp://www.flickr.com/photos/seriykotik/208841133/). Perhaps, more strikingly, Robida envisionned the
e-learning
application with a remote teacher giving a lecture for a student at home:
imghttp://www.robida.info/images/visionnaire/cours.jpg
However, for most of the people, the most striking drawing we saw was the ancestor of the Mp3 Walkman at
imghttp://www.robida.info/images/visionnaire/phono-operagraphe.jpg
In this illustation, you can observe a shepherd (a walk man) listening to music in mountains.
In other words, the music devices we carry everyday was already imagined two centuries ago.
If such a device was already imagined, was about recording music?
Very recent work carried out in March 2008 by American audio historians recovered the first recorded sound by little-known
parisian Edouard-Leon Scott de Martinville on April 9, 1860: Au Clair de la Lune, a French folk song (an ex.
Scott's 1857 drawing of a phonautographic recording session was discloded in his patent paperwork registered at the french intellectual property office (INPI).
More details of the ingenious process that consists in printing the sound on a sheet of paper is explained at linkhttp://www.firstsounds.org/
Note that the device could not play sounds back.
(Edison used a sheet of tinfoil to record in 1927 ?Mary had a little lamb", see http://history.sandiego.edu/gen/recording/mary.html)
To conclude, let us mention that Albert Robida also wrote and illustrated the book "The Electric Life"
where he depicts life in the 20th century and warns mankind of the potential danger of using non-controlled technology.
How timely! We're all now fully aware and have imminently to face this issue.
Frank Nielsen.
A few links:.
A short biography:.
http://en.wikipedia.org/wiki/Albert_Robida
Robida's "Twentieth Century" book:.
http://www.amazon.com/Twentieth-Century-Classics-Science-Fiction/dp/0819566802/ref=sr_1_1?ie=UTF8&s=books&qid=1208178705&sr=1-1
History of the first recorded sound:.
http://www.firstsounds.org/
Aug 08, 2008
Aug 07, 2008
3D Bregman balls printed
More than a year ago, I printed 3D Bregman balls using a lithography process. I took may hours to print these shapes...
Generalized Kullback-Leibler (positive measures):
Itakura-Saito (Burg entropy):
Logistic loss:
To create the STL files, you need to discretize the surface of theses balls by walking on their geodesics passing through the centers. A bisection search does this to stop a more or less the radius.
Simplifying mixtures of Gaussians (MoG, GMM)
Since most pdfs can be equivalently expressed as a Gaussian mixture model, it is tempting to simplify them.
The measure taken between two GMMs (one with many components, and the other with a reduced number) is the Kullback-Leibler relative entropy (=cross entropy-entropy). The problem is to decide whether we should minimize KL(GMMsource||GMM simplified) or the converse KL(GMM simplified||GMMsource) or the symmetrized KL (Jensen-Shannon, in green).
Here are a few experiments carried out when simplidying a GMM to a single normal:
I let you guess whether the blue/red is the left/right or the converse... -:)
Aug 06, 2008
Relative entropy 3D ball
Kullback-Leibler ball for unnormalized positive distributions.
Aug 05, 2008
Voronoi diagram and quantum Voronoi diagram
Using s/w volume rendering,we can easily visualize Voronoi structures in 3D.
The picture below is a screen capture of an interactive s/w using GPU shaders.
We can similarly visualize the quantum voronoi diagrams inside the Bloch ball...
VorL2.bmp
Jul 31, 2008
Jul 24, 2008
Robida: A precursor of electric life...
... Robida ? 19?????? 20???????????????????????????? 60,000??????? 200??????????????????????????????????????????????? (http://www.robida.info/)????????????? IT ????????????????????????????????????????????????
Robida ????????????????????????1883????????? ? "phonoscope" ????????????? "phonoscope" ?????????????????????????????????? 1879?? Edison ? "telephonoscope" ?????????????????????????????????????????????????????????????????Robida ??? "e-?????" ???????????? ?????????????????????????????????????? ... WALKMAN ??????????????????? (a walk man) ?????????????????????????????????????????????????????????????????????????? ... ?????????????????????????????????????????????????? ????????????????????? (2008?3?) ??????????????? Edouard-Leon Scott de Martinville ? 1860 ?4?9???????????? Au Clair de la Lune ?????????? ??????????Scott ???? 1857?????? phonautographic ????????????????????? (INPI) ?????????????????????????????????????????????????????? ?????????? ?????????????????(??????????????) ??????????????????? (???? Edison ?1927????????????????????????????)
??? Albert Robida ????????? "The Electric Life" ??????????????? 20???????????????????????????????????????????????????? ...
Frank NIELSEN, excerpts 2008.
May 24, 2008
Colloquium on Visual Computing at Ecole Polytechnique
November 18th-20th, 2008.
Registration is now open (and free)
ETVC08
Jan 04, 2008
Dec 25, 2007
Error Analysis of a Numerical Calculation about One-qubit Quantum Channel Capacity
The paper:
Error Analysis of a Numerical Calculation about One-qubit Quantum Channel Capacity (ISVD'07)
investigates the numerical error in computing the Holevo channel capacity using the furthest Voronoi diagram wrt. to the von Neuman quantum divergence. They show that the error is in O(1/eps) for sampling in the latitude-longitude 1/eps points.
This algorithm allows us to study, e.g., the additivity conjecture of quantum channels.
It is a nice aspect of quantum information geometry derived from traditional computational geometry. Interestingly, the Legendre transformation for 1-qubit represented on the Bloch sphere is explicited too.
Dec 12, 2007
Ranking of R &D centers
Good news from the CNRS after the Nobel prize of Albert Fert, it has been evaluated at number 6 worldwide.
I am writing this library for manipulating exponential families, bregman divergences, and so on. I am using Java. Today, I discovered that Java can handle arbitrary number of parameters using the three dots ... syntax as follows:
class VarargsSum { public static double cumul(double... elements) { double cumul=0.0d; for (double el:elements) cumul+=el; return cumul; } public static void write(String... records) { for (String record: records) System.out.println(record); } public static void main(String [] args) { Double sum=cumul(5.0,4.0,23.0); write("Computational","Information","Geometry",sum.toString());// explicit cast needed } }Compiling and running this above code, you will get
Let me see how to best use this in the library now... Frank.