Book, English, 400 pages, format (W × H): 178 mm × 254 mm
Harness the Power and Promise of Big Data with HDP
ISBN: 978-1-4842-0669-0
Publisher: Apress
Companies are finding new sources of valuable data—from social media to clickstreams to server logs to machine and geolocation data—and they know they need to employ big data tools, like Hadoop, to make the best use of it. Many organizations are turning to Hortonworks—started by twenty-four of the original Yahoo! engineers who developed Hadoop—a company that has emerged as one of the key vendors helping enterprise customers use Hadoop to gain new, powerful insights into customer needs and wants.
The company’s flagship product, Hortonworks Data Platform (HDP), is more than a product. It is a platform and a suite of tools that together create a framework for loading, managing, accessing, and analyzing massive volumes of data no matter the format or schema. Pro Hortonworks Data Platform: Harness the Power and Promise of Big Data with HDP, written by Hadoop and HDP expert Stephen Giles, is designed to help readers gain full advantage from HDP. The book—which assumes no prior knowledge of Hadoop—provides an understanding of all facets of HDP and how the various parts work both together and within a larger data platform. It is the insightful "missing manual" that all HDP users need to understand the platform in depth and how to use it to best advantage.
Pro Hortonworks Data Platform provides a deep understanding of the specific components that make HDP so powerful. The book will:
- Show how to install, configure, and secure HDP and all its components
- Illustrate the full lifecycle of a big data project using HDP
- Provide a deep understanding of YARN, the core engine of Hadoop
- Show how to leverage HDP/Apache tools like Pig, Hive, HBase, and Solr to harness data
Pro Hortonworks Data Platform provides insight and hands-on examples of how to work with each tool within the Hortonworks framework. Developers and IT pros will be able to get an understanding of Hadoop and its supporting tools, as well as a clear sense of where and when to take advantage of its power. This book will not only show you how to process data effectively—it will show you how to take advantage of the business opportunities that lie within that data.
Target audience
Popular/general
Authors/Editors
Subject areas
Further information & material
Chapter 1: Introduction to Hortonworks Data Platform (HDP)
Chapter Goal: This chapter will set the stage for the rest of the book. It will discuss Hadoop and Big Data at a high level for those not familiar with these concepts; it will be the only general-knowledge chapter in the book. Its secondary purpose is to give the big picture of all the parts of the Hortonworks HDP ecosystem and put those parts in context.
- A brief history of Hadoop
- Brief overview of the big data landscape and where Hadoop fits in
- Top level overview of the Hortonworks Data Platform and Enterprise Hadoop
Chapter 2: Understanding HDFS
Chapter Goal: HDFS is the distributed storage system that forms the foundation of Hadoop. This chapter will define the base principles of Hadoop storage in HDP and show how HDFS underpins MapReduce processing.
- Understanding HDFS architecture
- Understanding how data is stored in HDFS
- Understanding the relationship between NameNodes and DataNodes
- Working with WebHDFS and Hadoop fs commands
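To give a flavor of what the chapter covers, the sketch below models—in plain Python, with invented names and deliberately tiny numbers—how HDFS splits a file into fixed-size blocks and replicates each block across DataNodes. Real HDFS defaults are 128 MB blocks and a replication factor of 3; this is illustrative only, not HDFS code.

```python
# Toy model of HDFS block storage: split a file into fixed-size blocks,
# then place copies of each block on distinct DataNodes.
# All names and sizes here are invented for illustration.
from itertools import cycle, islice

BLOCK_SIZE = 4          # bytes per block (real HDFS default: 128 MB)
REPLICATION = 3         # copies of each block (real HDFS default: 3)
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split file contents into fixed-size blocks, as the NameNode tracks them."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes=DATANODES, replication: int = REPLICATION):
    """Round-robin placement: each block lands on `replication` distinct DataNodes."""
    placement = {}
    ring = cycle(nodes)
    for block_id in range(num_blocks):
        placement[block_id] = list(islice(ring, replication))
    return placement

blocks = split_into_blocks(b"hello hadoop")
print(len(blocks))                 # a 12-byte "file" becomes 3 blocks
print(place_replicas(len(blocks)))
```

The key idea the chapter elaborates: the NameNode holds only this kind of metadata (which blocks exist and where their replicas live), while the DataNodes hold the block contents themselves.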
Chapter 3: Understanding YARN
Chapter Goal: YARN is the "operating system" of HDP. YARN allows both batch and real-time access to data. This chapter will provide a deep understanding of YARN and how it is employed in HDP.
- Description of the architecture of YARN and its relationship to HDFS
- Understanding the components of YARN (ResourceManager, NodeManager, ApplicationMasters, and Containers) as configured in HDP
- Understanding MapReduce and how MapReduce jobs are executed under YARN
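As a taste of the MapReduce material, here is the classic word-count example collapsed into a single Python process—an illustrative sketch only. In HDP the same three phases (map, shuffle/sort, reduce) are scheduled by YARN as tasks in containers across many nodes.

```python
# The MapReduce programming model in miniature: map emits key/value pairs,
# the shuffle groups them by key, and reduce aggregates each group.
from collections import defaultdict

def map_phase(line: str):
    """Mapper: emit a (word, 1) pair for every word in an input split."""
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["Hadoop stores data", "YARN schedules data jobs"]
pairs = [pair for line in lines for pair in map_phase(line)]
print(reduce_phase(shuffle(pairs)))   # 'data' counted twice, the rest once
```

What the chapter adds on top of this sketch is the distributed part: how YARN's ResourceManager and ApplicationMaster place these map and reduce tasks into containers on the cluster.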
Chapter 4: Getting at Your Data
Chapter Goal: HDP has a number of tools to query and explore your data without needing to write complex MapReduce jobs. This chapter will look at the key tools for accessing data in HDP.
Chapter 5: Bringing NoSQL to Hadoop in HDP
Chapter Goal: This chapter builds on Chapter 4 and discusses how some NoSQL tools, built on top of YARN in HDP, can provide greater access to data.
- Understanding and working with HBase
- Understanding and working with Accumulo
Chapter 6: Working with HDP in Real Time
Chapter Goal: Traditional Hadoop was a batch-oriented system. YARN introduced the ability to add real-time or near-real-time access to your data. This chapter will look at how developers can use Storm in HDP to bring streaming data into their data applications.
- Working with Storm
- Understanding the Trident API
- Combining Storm with HDFS for data
- Use cases for streaming data
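The spout/bolt model behind a Storm topology can be previewed with a toy single-process sketch—illustrative only, with invented data; in a real topology, spouts and bolts run as parallel tasks on worker nodes and the stream never ends.

```python
# A Storm-style pipeline in miniature: a spout emits tuples, bolts
# transform and aggregate them. Here the "stream" is finite so it runs.
from collections import Counter

def sentence_spout():
    """Spout: emits a stream of sentence tuples (finite for this sketch)."""
    for sentence in ["storm processes streams", "streams never stop"]:
        yield sentence

def split_bolt(stream):
    """Bolt: splits each sentence tuple into word tuples."""
    for sentence in stream:
        yield from sentence.split()

def count_bolt(stream):
    """Bolt: keeps a running count per word (rolling state)."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

print(count_bolt(split_bolt(sentence_spout())))
```

The chapter's Trident material addresses what this sketch glosses over: exactly-once semantics and persistent state when the stream is unbounded and tasks can fail.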
Chapter 7: Installing and Configuring HDP
Chapter Goal: The next three chapters pivot from the developer side of Hadoop to the administration of Hadoop within HDP. This chapter will walk through the process of installing and configuring Hadoop.
- Installing Hortonworks HDP
- Configuring HDP
- HDP deployments on Windows, on Linux, and in private clouds
Chapter 8: Securing HDP
Chapter Goal: Security and governance are among the biggest concerns of all administrators. HDP provides particular security assurances that will help admins sleep better at night. This chapter will show how to secure Hadoop within HDP and how to integrate Hadoop with common directory services.
- Understanding Hadoop security concepts
- Setting up authentication and authorization in HDP
- Auditing security access
- Linking to other directory services
- Securing a cluster with Knox
Chapter 9: Monitoring and Managing Data in HDP
Chapter Goal: This chapter will explain how to monitor and manage a Hadoop cluster once it has been created in HDP.
- Monitoring and management approaches
- Scheduling jobs with Oozie
- Deploying and managing Hadoop with Ambari
- Working with Zookeeper
Chapter 10: Getting Your Data into HDP
Chapter Goal: Once you have configured your Hadoop instance, the next step is to get data into the cluster. This chapter will look at a number of tools that provide ETL (Extract, Transform, Load) processes to load data into HDP for Hadoop processing.
- Executing bulk transfers of data into and out of Hadoop using Sqoop
- Managing data processing and governance with Falcon
- Loading high-volume streaming data into HDFS using Flume
Chapter 11: Understanding HDP Architectural Patterns
Chapter Goal: This chapter will look at some common architectural patterns for working effectively with HDP.
- Working with Lambda architecture
- Thinking of data lakes
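The core idea of the Lambda architecture the chapter discusses can be sketched in a few lines—illustrative only, with invented names and numbers: a batch layer holds precomputed views over all historical data, a speed layer folds in recent events incrementally, and queries merge the two.

```python
# Lambda architecture in miniature: batch view + real-time view,
# merged at query time. All data here is invented for the sketch.
from collections import Counter

batch_view = Counter({"page_a": 1000, "page_b": 400})  # precomputed by the batch layer
speed_view = Counter()                                  # updated per event by the speed layer

def ingest_event(page: str):
    """Speed layer: fold a new event into the real-time view immediately."""
    speed_view[page] += 1

def query(page: str) -> int:
    """Serving layer: merge the batch and real-time views at query time."""
    return batch_view[page] + speed_view[page]

ingest_event("page_a")
ingest_event("page_a")
print(query("page_a"))   # 1002 = 1000 from the batch view + 2 recent events
```

In HDP terms, the batch layer maps naturally onto MapReduce or Hive jobs over HDFS, and the speed layer onto Storm—which is why the pattern fits the platform the earlier chapters describe.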
Chapter 12: Incorporating HDP into Your Larger Data Infrastructure
Chapter Goal: This chapter will look at how HDP can be incorporated into a larger data platform. It will place Hadoop within the context of BI solutions, data warehouses, and MPP appliances (like Teradata and Netezza).
- Integrating HDP with enterprise data warehouses, RDBMS, and MPP systems
- Connecting BI tools to Hadoop
- Integrating HDP with its ecosystem of analytics partners
Chapter 13: Adding Advanced Search in HDP with Solr
Chapter Goal: This chapter will examine some advanced data access features in HDP, primarily Solr.
- Leveraging Apache Solr in HDP
- Full text indexing with Solr
- Searching Hadoop Data with Apache Solr
Chapter 14: Bringing HDP into the Cloud
Chapter Goal: This final chapter looks ahead to building Hadoop solutions in the cloud. It will look at both HDInsight on Microsoft Azure and Hadoop on Amazon’s AWS platform.
- Hadoop on Azure and HDInsight
- Limitations of Hadoop with HDInsight
- Running HDP on AWS
Appendix: HDP Add-Ons
- Covers Spark, Advanced Security, ODBC Driver, Teradata Connector, SCOM Management, Oracle Quest Data Connector