Book, English, 140 pages, paperback, format (W × H): 187 mm × 235 mm
Series: Synthesis Lectures on Computer Vision
ISBN: 978-1-60845-133-3
Publisher: Morgan & Claypool Publishers
We then present two multiple instance learning schemes for face detection, multiple instance learning boosting (MILBoost) and winner-take-all multiple category boosting (WTA-McBoost). MILBoost addresses the uncertainty in accurately pinpointing the location of the object being detected, while WTA-McBoost addresses the uncertainty in determining the most appropriate subcategory label for multiview object detection. Both schemes can resolve the ambiguity of the labeling process and reduce outliers during training, which leads to improved detector performance.
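To make the multiple-instance idea concrete, the minimal Python sketch below uses the noisy-OR bag model commonly associated with MILBoost: a bag of candidate windows around one labeled face is treated as positive if at least one window contains the face, and the resulting gradient weights drive the next round of boosting. The function names and the logistic link are illustrative assumptions, not code from the book.

```python
import numpy as np

def instance_probs(scores):
    """Per-window probability from the current boosted score H(x_ij)."""
    return 1.0 / (1.0 + np.exp(-scores))

def bag_prob(scores):
    """Noisy-OR combination: a bag is positive if any instance is positive."""
    return 1.0 - np.prod(1.0 - instance_probs(scores))

def milboost_instance_weights(scores, bag_label):
    """Gradient of the bag log-likelihood w.r.t. the instance scores.
    These weights train the next weak classifier; windows in a positive
    bag share the credit, so the exact face location need not be known."""
    p_ij = instance_probs(scores)
    p_i = 1.0 - np.prod(1.0 - p_ij)
    return (bag_label - p_i) / max(p_i, 1e-12) * p_ij

# Example: one positive bag of three candidate windows around a labeled face.
scores = np.array([1.5, -0.3, -2.0])            # current boosted scores H(x_ij)
print(bag_prob(scores))                          # bag-level face probability
print(milboost_instance_weights(scores, bag_label=1))
```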
In many applications, a detector trained with generic data sets may not perform optimally in a new environment. We propose detector adaptation as a promising solution to this problem. We present an adaptation scheme based on the Taylor expansion of the boosting learning objective function, and we propose to store the second order statistics of the generic training data for future adaptation. We show that with a small amount of labeled data in the new environment, the detector's performance can be greatly improved.
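As a rough illustration of this idea, the sketch below combines a second-order Taylor surrogate of the generic boosting loss, built from stored gradient and Hessian statistics, with a loss on the small labeled set from the new environment. The variable names, the logistic loss, and the trade-off weight `lam` are illustrative assumptions, not the book's exact formulation.

```python
import numpy as np

def adaptation_objective(c, c_gen, grad_gen, hess_gen, H_new, y_new, lam=1.0):
    """Surrogate objective for adapting a boosted detector.

    c        : candidate weak-classifier weights after adaptation
    c_gen    : weights learned on the generic training data
    grad_gen : stored first-order statistics of the generic loss at c_gen
    hess_gen : stored second-order statistics (Hessian) of the generic loss
    H_new    : weak-classifier outputs h_k(x) on the new labeled samples
    y_new    : labels in {-1, +1} collected in the new environment
    lam      : trade-off between staying close to the generic solution
               and fitting the new data
    """
    d = c - c_gen
    # Second-order Taylor expansion of the generic loss around c_gen.
    generic_part = grad_gen @ d + 0.5 * d @ hess_gen @ d
    # Loss on the small amount of labeled data from the new environment.
    new_part = np.mean(np.log1p(np.exp(-y_new * (H_new @ c))))
    return lam * generic_part + new_part
```

Minimizing such an objective (for example, by gradient descent over `c`) yields adapted weights that fit the new environment without discarding what was learned from the generic data.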
We also present two interesting applications where boosting learning was applied successfully. The first application is face verification for filtering and ranking image/video search results on celebrities. We present boosted multi-task learning (MTL), yet another boosting learning algorithm that extends MILBoost with a graphical model. Since the available number of training images for each celebrity may be limited, learning individual classifiers for each person may cause overfitting. MTL jointly learns classifiers for multiple people by sharing a few boosting classifiers in order to avoid overfitting. The second application addresses the need for speaker detection in conference rooms. The goal is to find who is speaking, given a microphone array and a panoramic video of the room. We show that by combining audio and visual features in a boosting framework, we can determine the speaker's position very accurately. Finally, we offer our thoughts on future directions for face detection.
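The sharing step behind the MTL idea can be pictured with the simplified sketch below: each celebrity is mapped to whichever of a small pool of shared boosted classifiers fits that person's images best, so nobody's classifier is learned from scarce data alone. The hard assignment and logistic loss here are illustrative simplifications of the graphical-model formulation mentioned above.

```python
import numpy as np

def assign_people_to_shared(scores, labels):
    """Map each person to the shared boosted classifier that fits them best.

    scores[k][i] : classifier k's scores on person i's training images
    labels[i]    : +/-1 labels for person i (own face vs. other faces)
    Returns, for each person, the index of the shared classifier they reuse;
    the shared classifiers are then re-boosted on the pooled data of the
    people assigned to them.
    """
    n_shared, n_people = len(scores), len(labels)
    assignment = np.zeros(n_people, dtype=int)
    for i in range(n_people):
        losses = [np.mean(np.log1p(np.exp(-labels[i] * scores[k][i])))
                  for k in range(n_shared)]
        assignment[i] = int(np.argmin(losses))
    return assignment
```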
Authors/Editors
Further information & material
- A Brief Survey of the Face Detection Literature
- Cascade-based Real-Time Face Detection
- Multiple Instance Learning for Face Detection
- Detector Adaptation
- Other Applications
- Conclusions and Future Work