Book, English, 400 pages, format (W × H): 178 mm × 254 mm
Harness the Power and Promise of Big Data with HDP
ISBN: 978-1-4842-0669-0
Publisher: Apress
Companies are finding new sources of valuable data—from social media to clickstreams to server logs to machine and geolocation data—and they know they need to employ big data tools, like Hadoop, to make the best use of it. Many organizations are turning to Hortonworks—started by twenty-four of the original Yahoo! engineers who developed Hadoop—a company that has emerged as one of the key vendors helping enterprise customers use Hadoop to gain new, powerful insights into customer needs and wants.
The company’s flagship product, Hortonworks Data Platform (HDP), is more than a product. It is a platform and a suite of tools that together create a framework for loading, managing, accessing, and analyzing massive volumes of data no matter the format or schema. Pro Hortonworks Data Platform: Harness the Power and Promise of Big Data with HDP, written by Hadoop and HDP expert Stephen Giles, is designed to help readers gain full advantage from HDP. The book—which assumes no prior knowledge of Hadoop—provides an understanding of all facets of HDP and how the various parts work both together and within a larger data platform. It is the insightful "missing manual" that all HDP users need to understand the platform in depth and how to use it to best advantage.
Pro Hortonworks Data Platform provides a deep understanding of the specific components that make HDP so powerful. The book will:
- Show how to install, configure, and secure HDP and all its components
- Illustrate the full lifecycle of a big data project using HDP
- Provide a deep understanding of YARN, the core engine of Hadoop
- Show how to leverage HDP/Apache tools like Pig, Hive, HBase, and Solr to harness data
Pro Hortonworks Data Platform provides insight and hands-on examples of how to work with each tool within the Hortonworks framework. Developers and IT pros will be able to get an understanding of Hadoop and its supporting tools, as well as a clear sense of where and when to take advantage of its power. This book will not only show you how to process data effectively—it will show you how to take advantage of the business opportunities that lie within that data.
Target audience
Popular/general
Authors/Editors
Subject areas
Further information & material
Chapter 1: Introduction to Hortonworks Data Platform (HDP)
Chapter Goal: This chapter will set the stage for the rest of the book. It will discuss Hadoop and Big Data at a high level for those not familiar with these concepts; it will be the only general-knowledge chapter in the book. Its secondary purpose is to give the big picture of all the parts of the Hortonworks HDP ecosystem and put those parts in context.
- A brief history of Hadoop
- Brief overview of the big data landscape and where Hadoop fits in
- Top level overview of the Hortonworks Data Platform and Enterprise Hadoop
Chapter 2: Understanding HDFS
Chapter Goal: HDFS is the distributed storage system that forms the foundation of Hadoop. This chapter will define the base principles of Hadoop storage in HDP and show how HDFS underpins MapReduce processing.
- Understanding HDFS architecture
- Understanding how data is stored in HDFS
- Understanding the relationship between NameNodes and DataNodes
- Working with WebHDFS and Hadoop fs commands
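To give a flavor of what the chapter covers, the sketch below models—in plain Python, with invented names and deliberately tiny numbers—how HDFS splits a file into fixed-size blocks and replicates each block across DataNodes. Real HDFS defaults are 128 MB blocks and a replication factor of 3; this is illustrative only, not HDFS code.

```python
# Toy model of HDFS block storage: split a file into fixed-size blocks,
# then place copies of each block on distinct DataNodes.
# All names and sizes here are invented for illustration.
from itertools import cycle, islice

BLOCK_SIZE = 4          # bytes per block (real HDFS default: 128 MB)
REPLICATION = 3         # copies of each block (real HDFS default: 3)
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split file contents into fixed-size blocks, as the NameNode tracks them."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes=DATANODES, replication: int = REPLICATION):
    """Round-robin placement: each block lands on `replication` distinct DataNodes."""
    placement = {}
    ring = cycle(nodes)
    for block_id in range(num_blocks):
        placement[block_id] = list(islice(ring, replication))
    return placement

blocks = split_into_blocks(b"hello hadoop")
print(len(blocks))                 # a 12-byte "file" becomes 3 blocks
print(place_replicas(len(blocks)))
```

The key idea the chapter elaborates: the NameNode holds only this kind of metadata (which blocks exist and where their replicas live), while the DataNodes hold the block contents themselves.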
Chapter 3: Understanding YARN
Chapter Goal: YARN is the "operating system" of HDP. YARN allows both batch and real-time access to data. This chapter will provide a deep understanding of YARN and how it is employed in HDP.
- Description of the architecture of YARN and its relationship to HDFS
- Understanding the components of YARN (ResourceManager, NodeManager, ApplicationMasters, and Containers) as configured in HDP
- Understanding MapReduce and how MapReduce jobs are executed under YARN
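As a taste of the MapReduce material, here is the classic word-count example collapsed into a single Python process—an illustrative sketch only. In HDP the same three phases (map, shuffle/sort, reduce) are scheduled by YARN as tasks in containers across many nodes.

```python
# The MapReduce programming model in miniature: map emits key/value pairs,
# the shuffle groups them by key, and reduce aggregates each group.
from collections import defaultdict

def map_phase(line: str):
    """Mapper: emit a (word, 1) pair for every word in an input split."""
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["Hadoop stores data", "YARN schedules data jobs"]
pairs = [pair for line in lines for pair in map_phase(line)]
print(reduce_phase(shuffle(pairs)))   # 'data' counted twice, the rest once
```

What the chapter adds on top of this sketch is the distributed part: how YARN's ResourceManager and ApplicationMaster place these map and reduce tasks into containers on the cluster.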
Chapter 4: Getting at Your Data
Chapter Goal: HDP has a number of tools to query and explore your data without needing to write complex MapReduce jobs. This chapter will look at the key tools for accessing data in HDP.
Chapter 5: Bringing NoSQL to Hadoop in HDP
Chapter Goal: This chapter builds on Chapter 4 and discusses how some NoSQL tools, built on top of YARN in HDP, can provide greater access to data.
- Understanding and working with HBase
- Understanding and working with Accumulo
Chapter 6: Working with HDP in Real Time
Chapter Goal: Traditional Hadoop was a batch-oriented system. YARN introduced the ability to add real-time or near-real-time access to your data. This chapter will look at how developers can use Storm in HDP to bring streaming data into their data applications.
- Working with Storm
- Understanding the Trident API
- Combining Storm with HDFS for data
- Use cases for streaming data
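The spout/bolt model behind a Storm topology can be previewed with a toy single-process sketch—illustrative only, with invented data; in a real topology, spouts and bolts run as parallel tasks on worker nodes and the stream never ends.

```python
# A Storm-style pipeline in miniature: a spout emits tuples, bolts
# transform and aggregate them. Here the "stream" is finite so it runs.
from collections import Counter

def sentence_spout():
    """Spout: emits a stream of sentence tuples (finite for this sketch)."""
    for sentence in ["storm processes streams", "streams never stop"]:
        yield sentence

def split_bolt(stream):
    """Bolt: splits each sentence tuple into word tuples."""
    for sentence in stream:
        yield from sentence.split()

def count_bolt(stream):
    """Bolt: keeps a running count per word (rolling state)."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

print(count_bolt(split_bolt(sentence_spout())))
```

The chapter's Trident material addresses what this sketch glosses over: exactly-once semantics and persistent state when the stream is unbounded and tasks can fail.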
Chapter 7: Installing and Configuring HDP
Chapter Goal: The next three chapters pivot from the developer side of Hadoop to the administration of Hadoop within HDP. This chapter will walk through the process of installing and configuring Hadoop.
- Installing Hortonworks HDP
- Configuring HDP
- HDP deployments on Windows, on Linux, and in private clouds
Chapter 8: Securing HDP
Chapter Goal: Security and governance are among the biggest concerns of all administrators. HDP provides particular security assurances that will help admins sleep better at night. This chapter will show how to secure Hadoop within HDP and how to integrate Hadoop with common directory services.
- Understanding Hadoop security concepts
- Setting up authentication and authorization in HDP
- Auditing security access
- Linking to other directory services
- Securing a cluster with Knox
Chapter 9: Monitoring and Managing Data in HDP
Chapter Goal: This chapter will explain how to monitor and manage a Hadoop cluster once it has been created in HDP.
- Monitoring and management approaches
- Scheduling jobs with Oozie
- Deploying and managing Hadoop with Ambari
- Working with Zookeeper
Chapter 10: Getting Your Data into HDP
Chapter Goal: Once you have configured your Hadoop instance, the next step is to get data into the cluster. This chapter will look at a number of tools that provide ETL (Extract, Transform, Load) processes to load data into HDP for Hadoop processing.
- Executing bulk transfers of data into and out of Hadoop using Sqoop
- Managing data processing and governance with Falcon
- Loading high-volume streaming data into HDFS using Flume
Chapter 11: Understanding HDP Architectural Patterns
Chapter Goal: This chapter will look at some common architectural patterns for working effectively with HDP.
- Working with Lambda architecture
- Thinking of data lakes
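The core idea of the Lambda architecture the chapter discusses can be sketched in a few lines—illustrative only, with invented names and numbers: a batch layer holds precomputed views over all historical data, a speed layer folds in recent events incrementally, and queries merge the two.

```python
# Lambda architecture in miniature: batch view + real-time view,
# merged at query time. All data here is invented for the sketch.
from collections import Counter

batch_view = Counter({"page_a": 1000, "page_b": 400})  # precomputed by the batch layer
speed_view = Counter()                                  # updated per event by the speed layer

def ingest_event(page: str):
    """Speed layer: fold a new event into the real-time view immediately."""
    speed_view[page] += 1

def query(page: str) -> int:
    """Serving layer: merge the batch and real-time views at query time."""
    return batch_view[page] + speed_view[page]

ingest_event("page_a")
ingest_event("page_a")
print(query("page_a"))   # 1002 = 1000 from the batch view + 2 recent events
```

In HDP terms, the batch layer maps naturally onto MapReduce or Hive jobs over HDFS, and the speed layer onto Storm—which is why the pattern fits the platform the earlier chapters describe.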
Chapter 12: Incorporating HDP into Your Larger Data Infrastructure
Chapter Goal: This chapter will look at how HDP can be incorporated into a larger data platform. It will place Hadoop within the context of BI solutions, data warehouses, and MPP appliances (like Teradata and Netezza).
- Integrating HDP with enterprise data warehouses, RDBMS, and MPP systems
- Connecting BI tools to Hadoop
- Integrating HDP with its ecosystem of analytics partners
Chapter 13: Adding Advanced Search in HDP with Solr
Chapter Goal: This chapter will examine some advanced data access features in HDP, primarily Solr.
- Leveraging Apache Solr in HDP
- Full text indexing with Solr
- Searching Hadoop Data with Apache Solr
Chapter 14: Bringing HDP into the Cloud
Chapter Goal: This final chapter looks ahead to building Hadoop solutions in the cloud. It will look at both HDInsight on Microsoft Azure and Hadoop on Amazon’s AWS platform.
- Hadoop on Azure and HDInsight
- Limitations of Hadoop with HDInsight
- Running HDP on AWS
Appendix: HDP Add-Ons
- Covers Spark, Advanced Security, ODBC Driver, Teradata Connector, SCOM Management, Oracle Quest Data Connector