Support vector machines (SVMs) are one of the key tools for classification in machine learning.
Suppose you have two sets of high-dimensional points (say, +1 class and -1 class, or red and blue points if you prefer) to separate.
The SVM seeks the unique hyperplane that separates the +1-labelled points from the -1-labelled points while maximizing the margin: the distance from the hyperplane to the closest points.
The points first touched when translating the hyperplane toward either side are called support vectors.
There are in general d+1 such points in dimension d.
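To make the margin and the support vectors concrete, here is a minimal sketch that measures them for a given hyperplane w.x + b = 0. It does not train an SVM; the toy points, the candidate hyperplane, and the helper names (signed_distance, margin_and_support) are all assumptions chosen for illustration.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_and_support(w, b, points, labels, tol=1e-9):
    """Return the margin of (w, b) and the indices of the closest points."""
    # label * distance is positive exactly when a point lies on its own side.
    dists = [y * signed_distance(w, b, x) for x, y in zip(points, labels)]
    margin = min(dists)
    # The points that attain the minimum distance play the role of support vectors.
    support = [i for i, d in enumerate(dists) if abs(d - margin) <= tol]
    return margin, support

# Hypothetical toy data in 2D: two points per class.
points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-4.0, 2.0)]
labels = [+1, +1, -1, -1]

# Candidate hyperplane x1 = 0, i.e. w = (1, 0) and b = 0.
margin, support = margin_and_support((1.0, 0.0), 0.0, points, labels)
print(margin, support)
```

An actual SVM solver searches over all (w, b) for the hyperplane that makes this margin as large as possible.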
In practice, the red and blue points often cannot be linearly separated.
The trick is then to use a function f to map these points into a higher-dimensional feature space, where they can be linearly separated.
It is always possible to do so. Now, we can manipulate these feature points implicitly with a kernel function k(x,x')=f(x).f(x'), where '.' denotes the inner product.
This is the so-called kernel trick (geometry in a Hilbert space with a Riemannian metric).
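A small check of the identity k(x,x')=f(x).f(x') may help here. The sketch below uses the degree-2 polynomial kernel on 2D points as an assumed example (the post does not single out any particular kernel): the explicit feature map f sends a point to 3D, while the kernel gets the same inner product directly from the 2D coordinates.

```python
import math

def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def feature_map(x):
    """Explicit map f from R^2 to R^3 for the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def kernel(x, y):
    """k(x, x') = (x . x')^2, computed without ever building f(x)."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(feature_map(x), feature_map(y))  # inner product in feature space
implicit = kernel(x, y)                         # same value, from the input space
print(explicit, implicit)
```

The two numbers agree up to floating-point rounding. For kernels such as the Gaussian one, the feature space is infinite-dimensional and only the implicit route is practical.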
Choosing the best kernel is difficult. One way is to learn it by bootstrapping the learning machines as follows:
First, learn an SVM and detect the support vectors.
Then, adjust the kernel by choosing K(x,x')=D(x)D(x')k(x,x') for a positive function D().
The idea is to enlarge the spatial resolution around the separating boundary surface.
Finally, repeat these steps for as long as they help, while avoiding overfitting.
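The kernel adjustment in the second step can be sketched as follows. This is only an illustration under stated assumptions: the base kernel is taken to be Gaussian, D is assumed to be a sum of Gaussian bumps centred on the support vectors (so that it is large near the boundary), and the support vectors are hypothetical rather than obtained from a trained SVM. The names rbf_kernel, make_conformal_factor, conformal_kernel, gamma, and tau are mine; see the paper for the actual choice of D.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Base kernel k(x, x') = exp(-gamma * ||x - x'||^2) (assumed choice)."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq)

def make_conformal_factor(support_vectors, tau=1.0):
    """Build a positive D(x): Gaussian bumps on the support vectors (assumed form)."""
    def D(x):
        total = 0.0
        for s in support_vectors:
            sq = sum((xi - si) ** 2 for xi, si in zip(x, s))
            total += math.exp(-sq / (2.0 * tau * tau))
        return total
    return D

def conformal_kernel(k, D):
    """Return K(x, x') = D(x) * D(x') * k(x, x')."""
    def K(x, y):
        return D(x) * D(y) * k(x, y)
    return K

# Hypothetical support vectors, standing in for the output of the first step.
support_vectors = [(0.0, 1.0), (0.0, -1.0)]
D = make_conformal_factor(support_vectors, tau=1.0)
K = conformal_kernel(rbf_kernel, D)

near, far = (0.0, 0.0), (5.0, 0.0)
print(D(near) > D(far))            # D is larger close to the support vectors
print(K(near, near), K(far, far))
```

Since K is the inner product of the rescaled features D(x)f(x), it remains a valid kernel whenever k is one. In the full loop, one would retrain the SVM with K, read off the new support vectors, and rebuild D from them.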
All technical details are described in the paper:
S. Wu and S. Amari, Conformal Transformation of Kernel Functions: A Data-Dependent Way to Improve Support Vector Machine Classifiers, Neural Processing Letters, 15, pp. 59-67, 2002.
Frank.