Jan 26, 2010

Mixture of beta distributions

Post @ 18:28:48 | Mixtures

Mixtures of beta distributions are useful in various areas of computational science for modeling the underlying distributions of datasets. Although beta mixtures are not the most famous mixture models (that place still belongs to the undethroned Gaussian mixture models, or GMMs for short), they are convenient for a number of reasons. The two parameters alpha and beta give flexibility for modeling various shapes, and the beta distribution is the conjugate prior of the Bernoulli distribution in Bayesian inference.
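
To make the conjugacy concrete, here is a minimal sketch (the class and method names are hypothetical and are not part of jMEF): after observing Bernoulli outcomes, the posterior is again a beta distribution whose parameters are the prior parameters plus the success and failure counts.

```java
// Hypothetical helper for illustration only (not part of jMEF).
class BetaBernoulliUpdate {
    // Posterior parameters {alpha + #successes, beta + #failures}.
    static double[] posterior(double alpha, double beta, boolean[] outcomes) {
        int successes = 0;
        for (boolean o : outcomes) {
            if (o) successes++;
        }
        int failures = outcomes.length - successes;
        return new double[] { alpha + successes, beta + failures };
    }

    // Mean of a Beta(alpha, beta) distribution.
    static double mean(double alpha, double beta) {
        return alpha / (alpha + beta);
    }

    public static void main(String[] args) {
        boolean[] data = { true, true, false, true };
        double[] post = posterior(1.0, 1.0, data); // uniform prior Beta(1,1)
        System.out.println("posterior = Beta(" + post[0] + ", " + post[1] + ")");
        System.out.println("posterior mean = " + mean(post[0], post[1]));
    }
}
```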

In bioinformatics, beta mixtures have proven useful for analyzing gene expression, either for the same gene observed under different modalities (say, microarrays from different makers or radioactively labeled cDNA) or for extracting pathways (co-expressed genes). The basic feature is the correlation coefficient of pairwise expressions. Modeling the distribution of those correlation coefficients allows one to fit a beta mixture with two components, the similar and the dissimilar, without using an a priori threshold (often set arbitrarily to 0.5).
A standard EM algorithm using numerical optimization lets us fit a beta mixture. What is the best method and the best tool for doing that? If there are biologists surfing around here, please let me know.
The reference of the paper is:

Bioinformatics. 2005 May 1;21(9):2118-22. Epub 2005 Feb 15.
Applications of beta-mixture models in bioinformatics.
Ji Y, Wu C, Liu P, Wang J, Coombes KR.

See link
for retrieving the paper.

See also our open source library for handling mixture models: jMEF

Frank.

Jan 20, 2010

Experimenting with centroidal Voronoi tessellations

Post @ 15:02:50 | Voronoi

One cool application of centroidal Voronoi tessellations is stippling. Stippling is an artistic rendering of images using the grey intensities as the underlying point density for uniform sampling. Here is a (preliminary) result (of myself):

stipplingfrank600.png
As you can see, there are some artifacts if we proceed in the straightforward way. Hence the many papers dealing with CVTs and noise analysis.
Frank.

Jan 07, 2010

Variational distance is the only metric f-divergence

Post @ 22:40:41 | f-divergence

A quite interesting property is that the variational divergence (an f-divergence) and its multiples are the only metric f-divergences. This was recently shown in:
Confliction of the Convexity and Metric Properties in f-Divergences (IEICE 2007).

If instead we assume f strictly positive and concave (and self-dual, f(x)=x f(1/x)), then the formula of f-divergences yields metrics.
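
As a small numerical illustration, the sketch below writes the variational distance as a Csiszar f-divergence with generator f(u)=|u-1| and checks the triangle inequality on three histograms. The class name, the convention sum_i q_i f(p_i/q_i), and the sample histograms are my own choices for the example; it assumes strictly positive bins.

```java
import java.util.function.DoubleUnaryOperator;

class VariationalFDivergence {
    // Csiszar f-divergence in the convention sum_i q_i f(p_i / q_i), assuming all q_i > 0.
    static double fDivergence(double[] p, double[] q, DoubleUnaryOperator f) {
        double d = 0.0;
        for (int i = 0; i < p.length; i++) {
            d += q[i] * f.applyAsDouble(p[i] / q[i]);
        }
        return d;
    }

    // Variational distance: f(u) = |u - 1|, which gives sum_i |p_i - q_i|.
    static double variational(double[] p, double[] q) {
        return fDivergence(p, q, u -> Math.abs(u - 1.0));
    }

    public static void main(String[] args) {
        double[] p = { 0.5, 0.3, 0.2 };
        double[] q = { 0.2, 0.5, 0.3 };
        double[] r = { 0.1, 0.1, 0.8 };
        double dpq = variational(p, q);
        double dqr = variational(q, r);
        double dpr = variational(p, r);
        System.out.println("d(p,q)=" + dpq + " d(q,r)=" + dqr + " d(p,r)=" + dpr);
        System.out.println("triangle inequality holds: " + (dpr <= dpq + dqr));
    }
}
```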

Frank.

Dec 23, 2009

Synthetical information geometry (versus analytical geometry)

Post @ 7:54:11 | information geometry

Let us give some examples of information manifolds:

  • Statistical manifolds (parametric distributions),
  • Neural manifolds (Boltzmann machines with fixed topology, i.e., number of nodes),
  • ARMA(p,q) time-series manifolds (e-flat=-1-flat)

Strictly speaking, geometrizing information-theoretic problems does not provide a more powerful framework in theory. This is because synthetical and analytical geometries are equivalent. Informally, that means that we can do geometry by algebraic equations.

However, geometrizing problems helps build intuition about the problem at hand. Geometry also brings novel notions to mathematical theories. For example, let us cite the two curvature notions in statistics: the exponential and mixture curvatures emanating from conjugate connections. So although synthetical geometry provides the same power as analytical geometry, the third-order asymptotic theory of statistics has so far been obtained only from synthetical information geometry.

Dual differential geometries are also useful to tackle information-theoretic problems such as

  • Multiterminal problems met in information theory,
  • Linear programming problems (e.g., continuous Karmarkar inner method walking along the m-geodesic),
  • Clustering (negative entropy and dual Legendre log-normalizer conjugate for soft/hard clustering).


Frank.

Dec 21, 2009

Beta divergence as representational Bregman divergences

Post @ 22:53:17 | Beta divergences

In the paper The dual Voronoi diagrams with respect to representational Bregman divergences, International Symposium on Voronoi Diagrams (ISVD 2009), we show that beta divergences are (representational) Bregman divergences with

  • Beta=1 -> Kullback-Leibler
  • Beta=0 -> Itakura-Saito
  • Beta=2 -> squared Euclidean distance

In the paper, we derive formulas for the left-sided and right-sided beta centroids. The program RepresentationalBetaBregman.java shows this equivalence (up to numerical errors).

Shows that beta divergences can be obtained from representational Bregman divergences
beta-div=0.0060607537896566754  equals Bregman rep. div=0.006060753789656717
beta-div=0.03596836005227946    equals Bregman rep. div=0.03596836005227946
beta-div=0.03786639961675385    equals Bregman rep. div=0.03786639961675385
beta-div=0.015356733711556145   equals Bregman rep. div=0.015356733711556173
beta-div=0.16665973512136045    equals Bregman rep. div=0.16665973512136045
beta-div=0.006143185064308276   equals Bregman rep. div=0.006143185064308207
beta-div=0.012777128199946086   equals Bregman rep. div=0.012777128199946072
beta-div=2.42453303134018E-4    equals Bregman rep. div=2.4245330313402494E-4
beta-div=0.07962156613977964    equals Bregman rep. div=0.07962156613977961
beta-div=2.6549301732092453E-4  equals Bregman rep. div=2.6549301732092974E-4
Press any key to continue...
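
For readers without the program at hand, here is an independent minimal sketch (it is not RepresentationalBetaBregman.java). It assumes one common scalar parameterization of the beta divergence and checks it against an ordinary Bregman divergence with generator F(x)=x^beta/(beta(beta-1)), which is simpler than the representational construction of the paper. With this normalization, beta=2 gives half the squared Euclidean distance.

```java
class BetaDivergenceCheck {
    // Scalar beta divergence for x, y > 0 (parameterization assumed for this sketch).
    static double betaDiv(double x, double y, double beta) {
        if (beta == 1.0) return x * Math.log(x / y) - x + y;   // (generalized) Kullback-Leibler
        if (beta == 0.0) return x / y - Math.log(x / y) - 1.0; // Itakura-Saito
        return (Math.pow(x, beta) + (beta - 1.0) * Math.pow(y, beta)
                - beta * x * Math.pow(y, beta - 1.0)) / (beta * (beta - 1.0));
    }

    // Ordinary Bregman divergence F(x) - F(y) - (x - y) F'(y)
    // for F(x) = x^beta / (beta (beta - 1)), with beta different from 0 and 1.
    static double bregman(double x, double y, double beta) {
        double c = beta * (beta - 1.0);
        double fx = Math.pow(x, beta) / c;
        double fy = Math.pow(y, beta) / c;
        double gradFy = Math.pow(y, beta - 1.0) / (beta - 1.0);
        return fx - fy - (x - y) * gradFy;
    }

    public static void main(String[] args) {
        double x = 2.0, y = 3.0;
        for (double beta : new double[] { 0.5, 1.5, 2.0, 3.0 }) {
            System.out.println("beta=" + beta + " beta-div=" + betaDiv(x, y, beta)
                    + " Bregman div=" + bregman(x, y, beta));
        }
    }
}
```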

Frank.

Dec 20, 2009

Population space and Rao's distance

Post @ 2:00:32 | Rao's distance

The seminal paper of Rao written before he joined Cambridge for his PhD is available online at:

Breakthroughs in Statistics
page 235 is a reprint of:
Rao, Calyampudi (1945). "Information and the accuracy attainable in the estimation of statistical parameters". Bulletin of the Calcutta Mathematical Society 37: 81-89.

There we find three essential results:

  • Cramer-Rao bound

  • Population space and Riemannian geometry using the Fisher information metric as the quadratic differential form.

  • Test of significance (and classification)

Information geometry has since spread, with the work of Chentsov on alpha-connections and its investigation by Amari. Historically, the space of distributions was called the "population space".

The Rao distance for 1D normals is also given.
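
Here is a minimal sketch of that distance. It uses the closed form that follows from the Fisher metric ds^2 = (dmu^2 + 2 dsigma^2)/sigma^2 of univariate normals, a scaled hyperbolic half-plane metric; the class name is mine, and the notation in Rao's paper may differ.

```java
class RaoDistanceNormal1D {
    // Inverse hyperbolic cosine for x >= 1 (java.lang.Math has no acosh).
    static double acosh(double x) {
        return Math.log(x + Math.sqrt(x * x - 1.0));
    }

    // Fisher-Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2), sigma1, sigma2 > 0.
    static double rao(double mu1, double sigma1, double mu2, double sigma2) {
        double dmu = mu1 - mu2;
        double dsigma = sigma1 - sigma2;
        double arg = 1.0 + (dmu * dmu + 2.0 * dsigma * dsigma) / (4.0 * sigma1 * sigma2);
        return Math.sqrt(2.0) * acosh(arg);
    }

    public static void main(String[] args) {
        // Same mean: the distance reduces to sqrt(2) |log(sigma1 / sigma2)|.
        System.out.println(rao(0.0, 1.0, 0.0, Math.E));
        System.out.println(rao(0.0, 1.0, 1.0, 1.0));
    }
}
```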

Frank.

Dec 19, 2009

Bhattacharyya metric for multivariate gaussians

Post @ 21:20:21 | Bhattacharyya

The Bhattacharyya metric for multivariate gaussians is given by

The geometry is spherical on renormalized density functions.

The Bhattacharyya and Mahalanobis distances were precursors of the Fisher-Rao Riemannian distances.
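
As a dependency-free illustration, the sketch below evaluates the standard closed form of the Bhattacharyya distance between Gaussians, restricted here (my simplifying assumption) to diagonal covariance matrices so that no matrix library is needed. It also returns the angle arccos(BC), where BC is the Bhattacharyya coefficient, which is the spherical view on square-root densities. The class and method names are hypothetical.

```java
class BhattacharyyaDiagonalGaussians {
    // Bhattacharyya distance -log(BC) between two Gaussians with diagonal covariances.
    // mu1, mu2: means; var1, var2: per-coordinate variances (all > 0).
    static double distance(double[] mu1, double[] var1, double[] mu2, double[] var2) {
        double d = 0.0;
        for (int i = 0; i < mu1.length; i++) {
            double avgVar = 0.5 * (var1[i] + var2[i]);
            double dmu = mu1[i] - mu2[i];
            d += 0.125 * dmu * dmu / avgVar
               + 0.5 * Math.log(avgVar / Math.sqrt(var1[i] * var2[i]));
        }
        return d;
    }

    // Angle between the square-root densities, arccos of the Bhattacharyya coefficient.
    static double angle(double[] mu1, double[] var1, double[] mu2, double[] var2) {
        double bc = Math.exp(-distance(mu1, var1, mu2, var2));
        return Math.acos(Math.min(1.0, bc));
    }

    public static void main(String[] args) {
        double[] mu1 = { 0.0, 0.0 }, var1 = { 1.0, 2.0 };
        double[] mu2 = { 1.0, -1.0 }, var2 = { 0.5, 2.0 };
        System.out.println("distance = " + distance(mu1, var1, mu2, var2));
        System.out.println("angle    = " + angle(mu1, var1, mu2, var2));
    }
}
```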

Frank.

Dec 17, 2009

alpha-means with respect to alpha-divergences

Post @ 22:39:23 | alpha-divergences

In this note alphadivergencemeans.pdf, we summarize the following work

  • Shun-ichi Amari, Integration of Stochastic Models by Minimizing \alpha-Divergence, Neural Computation (NECO), (19)10:2780-2796, October 2007.

  • F. Nielsen and R. Nock, The dual Voronoi diagrams with respect to representational Bregman divergences, International Symposium on Voronoi Diagrams (ISVD), June 2009.

  • F. Nielsen and R. Nock, Sided and Symmetrized Bregman Centroids, IEEE Transactions on Information Theory, vol. 55, no. 6, pp. 2882-2904, 2009.

Dec 12, 2009

jMEF in Matlab: Mixture of Exponential Families

Post @ 21:49:39 | Exponential families

The jMEF library is easily interfaced with Matlab. You can compute Gaussian Mixture Models (GMMs) and manipulate them easily now in Matlab.

jMEF in Matlab


Source GMM
Sample points from GMM

GMM obtained using Bregman soft clustering (expectation maximization)

Dec 11, 2009

Alpha divergences as representational Bregman divergences

Post @ 18:09:37 | Bregman

In my paper at ISVD 2009 on

The dual Voronoi diagrams with respect to representational Bregman divergences

slides

I show that by using a representational function, we can obtain alpha-divergences as representational Bregman divergences. Therefore, it is easy to extend algorithms tailored to Bregman divergences to alpha-divergences.

Here is a code snippet in Java: RepresentationalAlphaBregman.java

Running it, you get something like

Shows that alpha divergences can be obtained from representational Bregman divergences
alpha-div=0.9008838766115398    equals Bregman rep. div=0.9008838766115399
alpha-div=0.14730849849416264   equals Bregman rep. div=0.14730849849416267
alpha-div=0.05455969651963932   equals Bregman rep. div=0.054559696519639225
alpha-div=1.2439567444853374    equals Bregman rep. div=1.2439567444853372
alpha-div=0.15345391125768915   equals Bregman rep. div=0.15345391125768917
alpha-div=0.12118392973570616   equals Bregman rep. div=0.12118392973570632
alpha-div=1.038494366079179     equals Bregman rep. div=1.0384943660791794
alpha-div=0.08541142546071197   equals Bregman rep. div=0.08541142546071195
alpha-div=0.06842729068201092   equals Bregman rep. div=0.06842729068201084
alpha-div=0.6941500904965174    equals Bregman rep. div=0.6941500904965173

In the ISVD 2009 paper, we give closed-form solutions for centroids of representational Bregman divergences, including alpha-means and beta-means.

Frank.

Dec 09, 2009

Checking the information monotonicity of the Kullback-Leibler divergence

Post @ 22:03:02 | Kullback-Leibler

I wrote a small program to illustrate the information monotonicity property of f-divergences (including Kullback-Leibler). Here is a numerical example for histograms of 8 bins that we reduce to histograms of 4 and 2 bins. The KL divergence is smaller at coarser resolution than at higher resolution.

Check the information monotonicity of Kullback-Leibler divergence:
by merging bins into a coarser histogram, the Kullback-Leibler divergence is less than the higher resolution:
0.08522624581487719 0.1320022228947157 0.13591019441965485 0.06674528382980667 0.05029196864402132 0.19946790829780184 0.20516964900877682 0.1251865270903457
0.02648151081895093 0.17500830227728026 0.2779122146863664 0.06203415984687348 0.2535969883830692 0.04424529112439828 0.1320533102395284 0.028668222623533027
                KL(p,q)=0.46400208957858724     KL(q,p)=0.4558758820302893
4 bins  KL(p,q)=0.10555546240269079     KL(q,p)=0.09733318810652078
2 bins  KL(p,q)=0.029647768170771346    KL(q,p)=0.029836929625659273
Information monotonicity holds only for Csiszar's f-divergences
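
The program itself is not listed here, so here is a minimal sketch of the same experiment. The two 8-bin histograms are made up for the example (they are not the ones printed above), and bins are merged by adjacent pairs.

```java
class KLMonotonicity {
    // Kullback-Leibler divergence sum_i p_i log(p_i / q_i), assuming all bins > 0.
    static double kl(double[] p, double[] q) {
        double d = 0.0;
        for (int i = 0; i < p.length; i++) {
            d += p[i] * Math.log(p[i] / q[i]);
        }
        return d;
    }

    // Coarsen a histogram by merging adjacent pairs of bins (length must be even).
    static double[] mergePairs(double[] h) {
        double[] out = new double[h.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = h[2 * i] + h[2 * i + 1];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] p = { 0.1, 0.2, 0.05, 0.15, 0.1, 0.1, 0.2, 0.1 };
        double[] q = { 0.05, 0.05, 0.3, 0.1, 0.2, 0.1, 0.1, 0.1 };
        while (p.length >= 2) {
            System.out.println(p.length + " bins  KL(p,q)=" + kl(p, q) + "  KL(q,p)=" + kl(q, p));
            p = mergePairs(p);
            q = mergePairs(q);
        }
    }
}
```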

Nov 26, 2009

jMEF: Exponential families library

Post @ 15:22:50 | exponential families

The jMEF library, Exponential families library, has been updated with new tutorials. A supplemental report listing formulas has been posted on arxiv: Statistical exponential families: A digest with flash cards

Several ways to solve for the geodesic equations in Rao's distance computation

Post @ 15:17:25 | Rao's distance

Rao's Distance Measure, by Colin Atkinson and Ann F. S. Mitchell, 1981 Indian Statistical Institute.

Rao's distance is the Riemannian geodesic distance induced by taking the Fisher information matrix as the metric tensor. Computing Rao's distance for a given parametric family of distributions (let d be the number of parameters) thus involves computing the geodesic explicitly. The distance is the length of that geodesic, the sum of its infinitesimal length elements along the shortest path. It is thus quite complicated in practice to compute Rao's distance exactly, as we need to solve the differential equations of the geodesic given by the Euler-Lagrange equations. In this paper, three different approaches are proposed:

  • Classic Euler-Lagrange equations (d second order differential equations)
  • Hamilton's equations (2d first order differential equations)
  • Hamilton-Jacobi equations (nonlinear partial differential equation)
For uniparameter distributions, computing the geodesic length (Rao's distance) becomes easy, and a list of such distances is given for the Poisson, binomial, exponential and chi-squared (gamma) distributions. For multi-parameter distributions, it becomes more difficult. The two classical examples are the normal distributions (whose Rao geometry is hyperbolic) and the multinomials (whose Rao geometry is spherical). For same-mean multivariate normals, the Rao distance is given (an unpublished result due to Jensen in 1976). One problem we typically face with this distance computation is knowing whether or not it admits a closed-form equation.
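
In the uniparameter case the geodesic is just the parameter interval, so Rao's distance is the integral of sqrt(I(theta)) between the two parameters. The sketch below (class and method names are mine, and Simpson's rule is simply a convenient integration scheme, not the one used in the paper) compares this integral with the closed forms 2|sqrt(l1)-sqrt(l2)| for Poisson and |log(l1/l2)| for the exponential distribution with rate parameter.

```java
import java.util.function.DoubleUnaryOperator;

class UniparameterRao {
    // Composite Simpson integration of sqrt(fisher(theta)) between a and b; n must be even.
    static double raoNumeric(DoubleUnaryOperator fisher, double a, double b, int n) {
        double lo = Math.min(a, b);
        double hi = Math.max(a, b);
        double h = (hi - lo) / n;
        double sum = Math.sqrt(fisher.applyAsDouble(lo)) + Math.sqrt(fisher.applyAsDouble(hi));
        for (int i = 1; i < n; i++) {
            double weight = (i % 2 == 1) ? 4.0 : 2.0;
            sum += weight * Math.sqrt(fisher.applyAsDouble(lo + i * h));
        }
        return sum * h / 3.0;
    }

    // Closed form for Poisson(lambda): Fisher information is 1 / lambda.
    static double raoPoisson(double l1, double l2) {
        return 2.0 * Math.abs(Math.sqrt(l1) - Math.sqrt(l2));
    }

    // Closed form for the exponential distribution with rate lambda: Fisher information is 1 / lambda^2.
    static double raoExponential(double l1, double l2) {
        return Math.abs(Math.log(l1 / l2));
    }

    public static void main(String[] args) {
        System.out.println("Poisson     numeric=" + raoNumeric(l -> 1.0 / l, 1.0, 4.0, 1000)
                + " closed form=" + raoPoisson(1.0, 4.0));
        System.out.println("Exponential numeric=" + raoNumeric(l -> 1.0 / (l * l), 1.0, 4.0, 1000)
                + " closed form=" + raoExponential(1.0, 4.0));
    }
}
```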

Frank.

Nov 24, 2009

Fisher information of Gamma distributions

Computing the Rao distance for Gamma distributions
by F. Reverter and J. M. Oller

The Gamma distribution belongs to the exponential families. Therefore, the Fisher information metric is $I(\theta)=\nabla^2 F(\theta)$. However, integrating the square root of the information matrix is difficult (no closed-form solution). The authors proceed by characterizing the Riemannian geodesic using the differential equation relying on Christoffel symbols. Geodesics on the Gamma manifold are unique since the manifold is simply connected and complete, with all sectional curvatures nonpositive. The authors come up with a Newton-like numerical optimization algorithm that depends on a good initialization. First, they show that the metric is bounded by Poincare metrics for which closed-form equations of the geodesics are known. This yields a good starting tangent vector.
It is quite impressive to look at the closed-form equations of the Poincare geodesics. Those formulas are surprisingly complicated.
The authors implemented their algorithm in FORTRAN and show that the algorithm always converges on the domain examples, with high numerical precision.

Nov 03, 2009

A library for exponential families

Post @ 21:57:18 | Exponential families

We have developed a library for manipulating exponential families in statistics:

jMEF

We can learn mixtures of exponential families (such as Gaussian mixture models), mixtures of Poisson distributions, mixtures of Laplacians, etc.

Oct 24, 2009

Hyperbolic Voronoi diagrams

Post @ 1:17:21 | Voronoi

Geometries are abstract in essence, but we visualize them by embedding them into the good old Euclidean 2D/3D spaces: our sheet of paper, or a 3D browser. Consider hyperbolic geometry: it has many different realizations, conformal or not. In a conformal representation such as the Poincare upper half-space or ball, angles are preserved. That is, angles measured in the Euclidean geometry coincide with angles in the hyperbolic geometry. Conformal representations are therefore often preferred for mapping because they tend to minimize distortions locally.

The Voronoi diagram in hyperbolic geometry has thus been studied in the conformal Poincare upper-plane representation. However, geodesics are visualized there as arcs of circles, which makes computation more difficult, if not tricky. Another problem is that it requires more numerical precision to carry out predicate evaluations.

Now consider the Klein ball realization. In this non-conformal representation, geodesics are straight line segments (but the Euclidean midpoint is not the hyperbolic midpoint). Bisectors are also hyperplanes, so the diagram is affine and can be computed from an equivalent power diagram. So hyperbolic Voronoi diagrams are handy and do not require a more specific implementation than a weighted Voronoi diagram.
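
To see the midpoint remark numerically, here is a minimal sketch (class name mine) using the standard distance formula of the Klein ball model, d(p,q) = arccosh((1 - p.q) / sqrt((1 - |p|^2)(1 - |q|^2))). The Euclidean midpoint lies on the geodesic, so the two sub-distances add up, but they are not equal.

```java
class KleinDiskDistance {
    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Inverse hyperbolic cosine for x >= 1 (java.lang.Math has no acosh).
    static double acosh(double x) {
        return Math.log(x + Math.sqrt(x * x - 1.0));
    }

    // Hyperbolic distance between two points strictly inside the unit ball (Klein model).
    static double distance(double[] p, double[] q) {
        double arg = (1.0 - dot(p, q)) / Math.sqrt((1.0 - dot(p, p)) * (1.0 - dot(q, q)));
        return acosh(Math.max(1.0, arg)); // guard against rounding slightly below 1
    }

    public static void main(String[] args) {
        double[] p = { 0.0, 0.0 };
        double[] q = { 0.8, 0.0 };
        double[] m = { 0.4, 0.0 }; // Euclidean midpoint of p and q
        System.out.println("d(p,m)=" + distance(p, m));
        System.out.println("d(m,q)=" + distance(m, q));
        System.out.println("d(p,q)=" + distance(p, q));
    }
}
```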

The details are explained in the following report, illustrated with an application on image browsing
PDF


KleinPoincare.png
Blue: Affine hyperbolic Voronoi diagram in Klein non-conformal ball model.
Red: Hyperbolic Voronoi diagram in Poincare conformal ball model.

Sep 22, 2009

Singular Value Decomposition: Ultimate Matrix Factorization

Post @ 18:36:13 | Matrix

I am teaching the fundamentals of 3D at Ecole Polytechnique (INF555). We are currently looking at various matrix decompositions and their use in visual computing.

To compute the PCA of high-dimensional datasets, we just need to compute the SVD of the covariance matrix of the zero-mean normalized dataset. So I looked for a good source of explanations of the SVD and came across the lecture of Gilbert Strang:
SVD lecture

Here, the four subspaces (image and nullspace of the matrix and of its transpose) are reviewed, and it is shown how to compute the SVD by simply solving left/right eigenproblems.

Definitely worth watching! (You will see in one example a sign problem to solve in an SVD decomposition!)

May 29, 2009

Java var args in action

Post @ 0:53:56 | Java

I am writing this library for manipulating exponential families, Bregman divergences, and so on. I am using Java. Today, I discovered that Java can handle an arbitrary number of parameters using the three-dot ... syntax, as follows:

class VarargsSum
{
    // Sums an arbitrary number of doubles.
    public static double cumul(double... elements)
    {
        double cumul = 0.0d;
        for (double el : elements)
            cumul += el;
        return cumul;
    }

    // Prints an arbitrary number of strings, one per line.
    public static void write(String... records)
    {
        for (String record : records)
            System.out.println(record);
    }

    public static void main(String[] args)
    {
        Double sum = cumul(5.0, 4.0, 23.0);
        write("Computational", "Information", "Geometry", sum.toString()); // explicit conversion to String needed
    }
}

Compiling and running the above code, you will get

Computational
Information
Geometry
32.0

Let me see how to best use this in the library now... Frank.

May 15, 2009

Approximating the smallest enclosing ball of balls

Post @ 7:03:35 | Applet

Approximating the smallest enclosing ball of balls
Here is an applet to play with:

applet
Frank.

May 09, 2009

Quaternions and Sir Hamilton's bridge

Post @ 23:26:37 | algebra

Hamilton's bridge and the birth of quaternions as a 4D normed division algebra. No such algebra can exist for 3D vectors...
See also octonions and the 2**n Cayley-Dickson constructions. These are also called hypercomplex numbers.

As an anecdote, here is the inscription on the plaque:


Here as he walked by on the 16th of October 1843 Sir William Rowan Hamilton in a flash of genius discovered the fundamental formula for quaternion multiplication i^2 = j^2 = k^2 = ijk = -1 & cut it on a stone of this bridge

How much progress has been made since then! (e.g., Lie groups and algebras)
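
The fundamental formula on the bridge can be checked with a few lines of code. This is a minimal sketch with a hypothetical Quaternion class storing w + x i + y j + z k and implementing the Hamilton product.

```java
class Quaternion {
    final double w, x, y, z; // represents w + x i + y j + z k

    Quaternion(double w, double x, double y, double z) {
        this.w = w; this.x = x; this.y = y; this.z = z;
    }

    // Hamilton product (non-commutative).
    Quaternion times(Quaternion o) {
        return new Quaternion(
            w * o.w - x * o.x - y * o.y - z * o.z,
            w * o.x + x * o.w + y * o.z - z * o.y,
            w * o.y - x * o.z + y * o.w + z * o.x,
            w * o.z + x * o.y - y * o.x + z * o.w);
    }

    boolean sameAs(Quaternion o) {
        return w == o.w && x == o.x && y == o.y && z == o.z;
    }

    public static void main(String[] args) {
        Quaternion i = new Quaternion(0, 1, 0, 0);
        Quaternion j = new Quaternion(0, 0, 1, 0);
        Quaternion k = new Quaternion(0, 0, 0, 1);
        Quaternion minusOne = new Quaternion(-1, 0, 0, 0);
        System.out.println("i*i = -1: " + i.times(i).sameAs(minusOne));
        System.out.println("j*j = -1: " + j.times(j).sameAs(minusOne));
        System.out.println("k*k = -1: " + k.times(k).sameAs(minusOne));
        System.out.println("i*j*k = -1: " + i.times(j).times(k).sameAs(minusOne));
        System.out.println("i*j = j*i: " + i.times(j).sameAs(j.times(i)));
    }
}
```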
Frank.

May 06, 2009

Learning a kernel in SVM the conformal way

Support vector machines (SVMs) are one of the key tools for classification in machine learning. Suppose you have two sets of high-dimensional points (say, a +1 class and a -1 class, or red and blue points if you prefer) to separate. The SVM seeks the unique hyperplane that separates the +1-labelled points from the -1-labelled points and maximizes the margin, that is, the distance from the hyperplane to the closest points. The points touched by translating the hyperplane on the left and right sides are called support vectors. There are in general d+1 such points in dimension d. In practice, the red/blue points often cannot be linearly separated. The trick is then to use a function f to map these points into a higher-dimensional feature space, where they can be linearly separated. It is always possible to do so. Now, we can manipulate these feature points implicitly with a kernel function k(x,x')=f(x).f(x'), where '.' denotes the inner product. This is the so-called kernel trick (geometry in a Hilbert space with a Riemannian metric). Choosing the best kernel is difficult. One way is to learn it by bootstrapping the learning machine as follows:
First, learn an SVM and detect the support vectors,
Then adjust the kernel by choosing K(x,x')=D(x)D(x')k(x,x') for a positive function D().
The idea is to enlarge the spatial resolution around the separating boundary surface.
Finally, repeat these steps as much as possible, while avoiding overfitting.
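
The kernel adjustment step can be sketched as follows. This is only an illustration under my own assumptions: the base kernel is a Gaussian (RBF) kernel, and D(x) is taken to be a sum of Gaussian bumps centered on the support vectors, which is one positive function that is large near the boundary. The names and the parameters gamma and tau are hypothetical; see the paper for the choice actually made by the authors.

```java
class ConformalKernelSketch {
    static double squaredDistance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    // Base Gaussian kernel k(x, y) = exp(-gamma |x - y|^2) (assumed for this sketch).
    static double baseKernel(double[] x, double[] y, double gamma) {
        return Math.exp(-gamma * squaredDistance(x, y));
    }

    // Positive factor D(x): sum of Gaussian bumps on the support vectors (an assumption).
    static double conformalFactor(double[] x, double[][] supportVectors, double tau) {
        double s = 0.0;
        for (double[] sv : supportVectors) {
            s += Math.exp(-squaredDistance(x, sv) / (2.0 * tau * tau));
        }
        return s;
    }

    // Adjusted kernel K(x, y) = D(x) D(y) k(x, y).
    static double conformalKernel(double[] x, double[] y, double[][] supportVectors,
                                  double gamma, double tau) {
        return conformalFactor(x, supportVectors, tau)
             * conformalFactor(y, supportVectors, tau)
             * baseKernel(x, y, gamma);
    }

    public static void main(String[] args) {
        double[][] sv = { { 0.0, 0.0 }, { 1.0, 0.0 } }; // pretend support vectors
        double[] near = { 0.1, 0.0 };
        double[] far = { 5.0, 5.0 };
        System.out.println("D(near)=" + conformalFactor(near, sv, 0.5));
        System.out.println("D(far)=" + conformalFactor(far, sv, 0.5));
        System.out.println("K(near,far)=" + conformalKernel(near, far, sv, 1.0, 0.5));
    }
}
```

Since D is positive, K remains a valid kernel whenever k is one; only the local resolution changes.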

All technical details are described in the paper:
S.Wu and S. Amari, Conformal Transformation of Kernel Functions: A Data-Dependent Way to Improve Support Vector Machine Classifiers, Neural Processing Letters, 15, pp. 59-67, 2002.

Frank.

Apr 30, 2009

Non-parametric distances / algorithmic distances

Post @ 19:02:47 | Boosting

There are myriads of formulas describing the distance between two entities. Usually they are parametric in the sense that a closed-form expression allows one to compute the distance/similarity given the two input objects.

In contrast with these hard-coded distances, another approach consists in designing non-parametric distances by algorithmically solving a problem. For example, the Earth Mover Distance (EMD) is one such famous distance, computed by a transport optimization technique.

DistBoost (2004) is another approach, inspired by machine learning techniques. It consists in learning the distance from like/dislike constraints. A binary classifier classifies objects into similar/dissimilar classes, say 0 and 1. Moreover, the signed real-valued margin that indicates a confidence level on the prediction can be purposely interpreted as a similarity measure. DistBoost is inspired by the renowned boosting technique. The weak learners are trained using a sophisticated constrained EM algorithm.

More details can be found in the paper:

Boosting Margin Based Distance Functions for Clustering

The future of data analysis is likely to incorporate these algorithmic distances.
Frank.

Apr 07, 2009

Testing normality of multivariate data

Post @ 23:17:50 | Statistics

K-means is certainly the most useful algorithm for clustering datasets. However, we need to give a prescribed number of clusters. One way to circumvent this is to use penalty criteria (like AIC or BIC) or the MDL principle. A simpler solution is to run the Anderson-Darling 1D test on projected data, as advocated by Hamerly and Elkan (NIPS 2003) in Learning the k in k-means.

However, the test is not fully multidimensional. Another interesting approach is based on the minimum spanning tree of the source dataset and a pooled sample (with parameters estimated from the sample mean and the sample variance-covariance matrix):

A Test to Determine the Multivariate Normality of a Data Set (PAMI 1988).

The null hypothesis testing algorithm runs in quadratic time. It is surprising to see that the paper has not been mentioned more in the literature (closer works with MST and entropy are those of A. Hero). If you give it a try, let me know :-)
Frank.

Apr 04, 2009

Representable divergence: Csiszar and Bregman

Post @ 2:14:34 | Bregman-Csiszar

Csiszar f-divergences C_f preserve information monotonicity, while Bregman divergences B_F are the canonical divergences of dually flat spaces. These two families intersect only at the Kullback-Leibler divergence.

Consider now using a parameter representation function, say k (strictly monotone), and define B_{F,k}(p||q)=B_F(k(p)||k(q)). Then you can obtain alpha and beta-divergences using such an extended Bregman divergence. Furthermore, add an external divergence representation function so that C_{h,f}(p||q)=h(C_f(p||q)). Then you get the Renyi, Sharma-Mittal and Bhattacharyya divergences using external representations of f-divergences.
Convexity and monotonicity are two puzzling ingredients for function rewriting.

I came across the book: Pardo L: Statistical Inference Based on Divergence Measures. Chapman&Hall, London, 2006.
It is very nice to have a monograph focusing on statistical inference with respect to f-divergences.

IG_StatInfBook.jpg

Frank.

Mar 27, 2009

Batched and Incremental k-means

Post @ 17:57:16 | k-means

Since Lloyd's iterative k-means algorithm for hard clustering, which alternately assigns points to their nearest cluster center and relocates these centers to the cluster centroids, there have been many generalizations, including the Bregman k-means.
This k-means is a batched k-means, as all points are assigned to their nearest center in a single stage. The optimization is monotone but only converges to a local minimum. To escape this potential local minimum, we can use a single swap that tries to move one point from a cluster to another. Of course, we had better choose the point and the target cluster that best decrease the loss function.
This incremental k-means is known and explained as the first variation by Teboulle et al.:

Clustering with Entropy-Like k-Means Algorithms 

paper

I invite you to look at the minimization of the loss function on a gene array expression dataset by combining batched and online k-means.
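
The combination of batched and incremental steps can be sketched as follows, under simplifying assumptions of mine: 1D points, the squared Euclidean loss (not a general Bregman divergence), and a brute-force recomputation of the loss for every candidate single-point move, which is slow but easy to read. The class and method names are hypothetical.

```java
class BatchedIncrementalKMeans {
    // Cluster centroids for the given labels (an empty cluster keeps centroid 0).
    static double[] centroids(double[] x, int[] label, int k) {
        double[] sum = new double[k];
        int[] count = new int[k];
        for (int i = 0; i < x.length; i++) { sum[label[i]] += x[i]; count[label[i]]++; }
        for (int j = 0; j < k; j++) if (count[j] > 0) sum[j] /= count[j];
        return sum;
    }

    // k-means loss: sum of squared distances of the points to their cluster centroid.
    static double loss(double[] x, int[] label, int k) {
        double[] c = centroids(x, label, k);
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += (x[i] - c[label[i]]) * (x[i] - c[label[i]]);
        return s;
    }

    // Batched (Lloyd) step: reassign every point to its nearest centroid at once.
    static boolean batchStep(double[] x, int[] label, int k) {
        double[] c = centroids(x, label, k);
        boolean changed = false;
        for (int i = 0; i < x.length; i++) {
            int best = label[i];
            for (int j = 0; j < k; j++) {
                if (Math.abs(x[i] - c[j]) < Math.abs(x[i] - c[best])) best = j;
            }
            if (best != label[i]) { label[i] = best; changed = true; }
        }
        return changed;
    }

    // Incremental step: apply the single-point move that decreases the loss the most.
    static boolean incrementalStep(double[] x, int[] label, int k) {
        double bestLoss = loss(x, label, k);
        int bestPoint = -1, bestCluster = -1;
        for (int i = 0; i < x.length; i++) {
            int old = label[i];
            for (int j = 0; j < k; j++) {
                if (j == old) continue;
                label[i] = j; // tentative move
                double l = loss(x, label, k);
                if (l < bestLoss - 1e-12) { bestLoss = l; bestPoint = i; bestCluster = j; }
            }
            label[i] = old; // undo the tentative move
        }
        if (bestPoint < 0) return false;
        label[bestPoint] = bestCluster;
        return true;
    }

    // Alternate batched passes until stable, then try one incremental move, and repeat.
    static void cluster(double[] x, int[] label, int k) {
        boolean moved;
        do {
            while (batchStep(x, label, k)) { }
            moved = incrementalStep(x, label, k);
        } while (moved);
    }

    public static void main(String[] args) {
        double[] x = { 1, 2, 3, 10, 11, 12 };
        int[] label = { 0, 1, 0, 1, 0, 1 };
        System.out.println("initial loss = " + loss(x, label, 2));
        cluster(x, label, 2);
        System.out.println("final loss   = " + loss(x, label, 2));
    }
}
```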
Frank.

Mar 19, 2009

Out of core nearest neighbor: Fast but good ones!

Post @ 22:04:26 | nearest neighbor

A colleague of mine showed me the recently published paper

NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections (TPAMI)


It is yet another simple tree-based technique (different from spill trees) that involves line projections. Experimental results are reported.

Mar 14, 2009

Dually flat spaces and canonical divergences

Most of us are familiar with Bregman divergences, which allow one to extend many familiar algorithms such as the celebrated k-means clustering algorithm. Bregman divergences are known to be the canonical divergences of dually flat spaces. Euclidean geometry is simply self-dually flat (induced by the Legendre self-dual paraboloid function).

However, if we consider separable divergences defined by means of a monotone function, called the representation function, acting on the coordinate systems, together with a convex function, then we can define generalized canonical divergences, as explained in Jun Zhang's seminal paper:
Zhang, J. 2004. Divergence function, duality, and convex analysis. Neural Comput. 16, 1 (Jan. 2004), 159-195. DOI= http://dx.doi.org/10.1162/08997660460734047

Now Eguchi and Copas' beta divergences (KL for beta=0) also induce a dually flat structure, both on the manifold of positive arrays and on the submanifold of probability vectors.
S. Eguchi and J. Copas (2002): A class of logistic-type discriminant functions, Biometrika, 89, 1-22.

So let us keep in mind that dually flat spaces are a notion that encompasses Bregman divergences... A philosophical chicken-and-egg question about the link between geometry and distances.

Mar 11, 2009

Circular Earth Mover Distance

Post @ 0:34:31 | EMD

Matching feature descriptors in vision is essential for stitching and object recognition, among others. Since SIFT is based on discretizing the 360-degree wheel of gradient orientations at different scales, it is better to use a circular earth mover distance than a straight EMD.


Experiments are reported in: Circular Earth Mover's Distance for the comparison of local features
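
To illustrate the difference, here is a minimal sketch for 1D histograms of equal total mass with unit spacing between bins and an L1 ground distance (my assumptions). The straight EMD is the L1 norm of the cumulative differences; for the circular version the sketch uses the median-shift form, minimizing sum_i |D_i - mu| over the shift mu, which is attained at a median of the D_i. The class name is hypothetical.

```java
import java.util.Arrays;

class CircularEmd {
    // Cumulative sums D_i of p - q (p and q must have the same total mass).
    static double[] cumulativeDifference(double[] p, double[] q) {
        double[] d = new double[p.length];
        double running = 0.0;
        for (int i = 0; i < p.length; i++) {
            running += p[i] - q[i];
            d[i] = running;
        }
        return d;
    }

    // Straight (linear) EMD: sum_i |D_i|.
    static double linearEmd(double[] p, double[] q) {
        double s = 0.0;
        for (double v : cumulativeDifference(p, q)) s += Math.abs(v);
        return s;
    }

    // Circular EMD: min over mu of sum_i |D_i - mu|, attained at a median of the D_i.
    static double circularEmd(double[] p, double[] q) {
        double[] d = cumulativeDifference(p, q);
        double[] sorted = d.clone();
        Arrays.sort(sorted);
        double median = sorted[sorted.length / 2];
        double s = 0.0;
        for (double v : d) s += Math.abs(v - median);
        return s;
    }

    public static void main(String[] args) {
        // All the mass sits in the first bin for p and in the last bin for q:
        // far apart on a line, but neighbors on the circle.
        double[] p = { 1, 0, 0, 0 };
        double[] q = { 0, 0, 0, 1 };
        System.out.println("linear EMD   = " + linearEmd(p, q));
        System.out.println("circular EMD = " + circularEmd(p, q));
    }
}
```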

Mar 02, 2009

Exponential families as Universal density estimators

Post @ 18:32:40 | Exponential families

It is well known that any smooth density can be well approximated using a mixture of Gaussians. Gaussian distributions belong to the exponential families in statistics. There is an even more powerful property of exponential families with so-called rich sufficient statistics.

They take advantage of RKHS (Reproducing Kernel Hilbert Space). So forget the mixture and just consider one exponential family for modeling non-parametric distribution.

Details are in: Exponential families for conditional random fields (2004)

Feb 27, 2009

A unique characterization of alpha-divergences

Post @ 15:11:45 | alpha-divergence

Professor Shun-ichi Amari recently gave a talk at Ecole Polytechnique on information geometry. The last part of his talk interestingly characterizes the alpha-divergences on positive measures (non-normalized distributions) as the intersection of f-divergences and Bregman divergences.


http://videolectures.net/etvc08_paris/

Of course, on probability measures, only the Kullback-Leibler divergence lies in the common intersection (and its dual).

Feb 26, 2009

Information monotonicity

Post @ 18:10:05 | f-divergence

Csiszar f-divergences have the property of information monotonicity. So take a positive array p with n bins and partition it into m bins P, with the probability of falling in a bin being the sum of the probabilities of the atoms of that bin. Then the f-divergence of (p,q) should always be greater than or equal to the f-divergence of the corresponding partitioned arrays. In other words:

f-div(p,q) >= f-div(P,Q)

That means that we can only lose information by aggregating atoms. That is one fairly reasonable behavior of information.

A more axiomatic characterization is detailed in:

Csiszár, Imre. 2008. "Axiomatic Characterizations of Information Measures." Entropy 10, no. 3: 261-273.


Frank.
