E-book, English, 448 pages, eBook
Principe: Information Theoretic Learning
1st edition, 2010
ISBN: 978-1-4419-1570-2
Publisher: Springer US
Format: PDF
Copy protection: 1 - PDF watermark
Renyi's Entropy and Kernel Perspectives
Series: Information Science and Statistics
This book is the first cohesive treatment of ITL algorithms for adapting linear or nonlinear learning machines in both supervised and unsupervised paradigms. It compares the performance of ITL algorithms with that of their second-order counterparts in many applications.
José C. Principe is Distinguished Professor of Electrical and Biomedical Engineering and BellSouth Professor at the University of Florida, and the Founder and Director of the Computational NeuroEngineering Laboratory. He is an IEEE and AIMBE Fellow, Past President of the International Neural Network Society, past Editor-in-Chief of the IEEE Transactions on Biomedical Engineering, and founding Editor-in-Chief of IEEE Reviews in Biomedical Engineering. He has written an interactive electronic book on Neural Networks, a book on Brain Machine Interface Engineering, and more recently a book on Kernel Adaptive Filtering, and was awarded the 2011 IEEE Neural Network Pioneer Award.
Target audience
Research
Authors/Editors
Further information & material
Information Theory, Machine Learning, and Reproducing Kernel Hilbert Spaces
Renyi’s Entropy, Divergence and Their Nonparametric Estimators
Adaptive Information Filtering with Error Entropy and Error Correntropy Criteria
Algorithms for Entropy and Correntropy Adaptation with Applications to Linear Systems
Nonlinear Adaptive Filtering with MEE, MCC, and Applications
Classification with EEC, Divergence Measures, and Error Bounds
Clustering with ITL Principles
Self-Organizing ITL Principles for Unsupervised Learning
A Reproducing Kernel Hilbert Space Framework for ITL
Correntropy for Random Variables: Properties and Applications in Statistical Inference
Correntropy for Random Processes: Properties and Applications in Signal Processing
"9 A Reproducing Kernel Hilbert Space Framework for ITL (p. 351-352)
9.1 Introduction
During the last decade, research on Mercer kernel-based learning algorithms has flourished [226, 289, 294]. These algorithms include, for example, the support vector machine (SVM) [63], kernel principal component analysis (KPCA) [289], and kernel Fisher discriminant analysis (KFDA) [219]. The common property of these methods is that they operate linearly, as they are explicitly expressed in terms of inner products in a transformed data space that is a reproducing kernel Hilbert space (RKHS).
Most often they correspond to nonlinear operators in the data space, and they are still relatively easy to compute using the so-called “kernel trick”. The kernel trick is no trick at all; it refers to a property of the RKHS that enables the computation of inner products in a potentially infinite-dimensional feature space by a simple kernel evaluation in the input space. As one may expect, this computational saving is one of the big appeals of the RKHS.
At first glance one may even think that it defeats the “no free lunch” theorem (getting something for nothing), but the fact of the matter is that the price of the RKHS is the need for regularization and the large memory requirements, because these are memory-intensive methods. Kernel-based methods (sometimes also called Mercer kernel methods) have been applied successfully in several applications, such as pattern and object recognition [194], time series prediction [225], and DNA and protein analysis [350], to name just a few.
Kernel-based methods rely on the assumption that projection to the high-dimensional feature space simplifies data handling, as suggested by Cover’s theorem, which shows that the probability of shattering data (i.e., separating it exactly by a hyperplane) approaches one with a linear increase in space dimension [64].
In the case of the SVM, the assumption is that the data classes become linearly separable, and therefore a separating hyperplane is sufficient for perfect classification. In practice, one cannot know for sure whether this assumption holds. In fact, one has to hope that the user chooses a kernel (and its free parameter) that shatters the data, and because this is improbable, the need to include slack variables arises. The innovation of SVMs lies exactly in how to train the classifiers with the principle of structural risk minimization [323]."
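A minimal Python sketch (editor's illustration, not from the book) of the kernel trick described in the excerpt: for the degree-2 homogeneous polynomial kernel the feature map is finite and can be written out explicitly, so the feature-space inner product and the input-space kernel evaluation can be checked against each other; for the Gaussian kernel the feature space is infinite-dimensional and only the kernel evaluation is available. All function names are illustrative.

```python
import numpy as np

def phi_poly2(x):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel
    k(x, y) = (x . y)^2 in two dimensions: phi(x) = (x1^2, x2^2, sqrt(2) x1 x2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

def k_poly2(x, y):
    """Kernel trick: the same inner product, computed in the input space."""
    return np.dot(x, y) ** 2

def k_gauss(x, y, sigma=1.0):
    """Gaussian kernel; its feature space is infinite-dimensional,
    so only the kernel-evaluation route is available."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma**2))

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Both routes give the same number for the polynomial kernel.
print(np.dot(phi_poly2(x), phi_poly2(y)))  # explicit feature-space inner product
print(k_poly2(x, y))                       # kernel evaluation in the input space
print(k_gauss(x, y))                       # inner product in an infinite-dim. RKHS
```

Running the sketch shows the first two printed values coincide, which is exactly the property the excerpt calls the kernel trick.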
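For completeness, the quantitative statement behind the reference to Cover's theorem is the function-counting theorem in its standard form (editor's notation, not quoted from the book): for N points in general position in a d-dimensional space, the number of dichotomies separable by a hyperplane through the origin is

```latex
C(N, d) = 2 \sum_{k=0}^{d-1} \binom{N-1}{k}.
```

The fraction of separable dichotomies, C(N, d)/2^N, grows toward one as d increases and reaches one once d is at least N, which is the sense in which mapping to a high-dimensional feature space makes linear separation increasingly likely.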
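Similarly, the slack variables mentioned at the end of the excerpt appear in the standard soft-margin SVM primal problem; the formulation below is the usual textbook one in the editor's notation (w, b, the slacks ξ_i, and the trade-off constant C are not symbols taken from the book):

```latex
\min_{w,\,b,\,\xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i\bigl(\langle w, \varphi(x_i) \rangle + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0 .
```

Each ξ_i absorbs a point that the chosen kernel fails to render separable, and C trades margin width against these violations, which is how the structural risk minimization principle cited in the excerpt is put into practice.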