Book, English, 85 pages, paperback, format (W × H): 187 mm × 235 mm
A Practical Perspective
Series: Synthesis Lectures on Data Management
ISBN: 978-1-60845-677-2
Publisher: Morgan & Claypool Publishers
In this book, we first discuss the goals of data cleaning. These goals are often not well defined and can call for different solutions in different scenarios. To clarify them, we abstract out a common set of data cleaning tasks that often need to be addressed, which allows us to develop solutions for these common tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. This approach involves developing customizable operators that can be used as building blocks for common solutions. It is similar to the approach of relational algebra for query processing, where a basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts that leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks.
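To make the operator-centric idea in the description concrete, here is a minimal Python sketch of how two operators named in the table of contents (similarity join and clustering) might be composed into a deduplication task. It is an illustration only and is not taken from the book: the function names `jaccard`, `similarity_join`, `cluster`, and `deduplicate`, the word-level Jaccard measure, the connected-components clustering, and the 0.5 threshold are all assumptions chosen for brevity.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def similarity_join(records, sim, threshold):
    """Operator (hypothetical): index pairs (i, j), i < j, with sim >= threshold.

    This is a naive all-pairs comparison, kept simple for illustration.
    """
    return [
        (i, j)
        for (i, r), (j, s) in combinations(enumerate(records), 2)
        if sim(r, s) >= threshold
    ]


def cluster(n, pairs):
    """Operator (hypothetical): connected components over indices 0..n-1."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def deduplicate(records, sim=jaccard, threshold=0.5):
    """Task: compose both operators, keeping the first record of each cluster."""
    pairs = similarity_join(records, sim, threshold)
    return [records[group[0]] for group in cluster(len(records), pairs)]


records = [
    "Morgan & Claypool Publishers",
    "morgan & claypool publishers",
    "Morgan Claypool",
    "Synthesis Lectures on Data Management",
]
print(deduplicate(records))
```

With these sample strings, the first three records fall into one cluster and the fourth stays on its own, so the task keeps the first and last record. Because the similarity function and the threshold are passed in as parameters, the same operators could be reused with a different measure, which is the kind of customization the description refers to.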
Further Information & Material
- Preface
- Acknowledgments
- Introduction
- Technological Approaches
- Similarity Functions
- Operator: Similarity Join
- Operator: Clustering
- Operator: Parsing
- Task: Record Matching
- Task: Deduplication
- Data Cleaning Scripts
- Conclusion
- Bibliography
- Authors' Biographies