Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 (Addison-Wesley Data & Analytics)
Arun Murthy, Vinod Vavilapalli
Format: PDF / Kindle (mobi) / ePub
“This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm.”
—From the Foreword by Raymie Stata, CEO of Altiscale
The Insider’s Guide to Building Distributed, Big Data Applications with Apache Hadoop™ YARN
Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop™ YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.
YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.
You’ll find many examples drawn from the authors’ cutting-edge experience—first as Hadoop’s earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.
YARN’s goals, design, architecture, and components—how it expands the Apache Hadoop ecosystem
Exploring YARN on a single node
Administering YARN clusters and Capacity Scheduler
Running existing MapReduce applications
Developing a large-scale clustered YARN application
Discovering new open source frameworks that run under YARN
YARN provides a basic infrastructure for monitoring and life-cycle management of containers, while application-specific semantics are managed independently by each framework. This design is in sharp contrast to the original Hadoop version 1 design, in which scheduling was designed and integrated around managing only MapReduce tasks. Figure 4.1 illustrates the relationship between the application and YARN components. The YARN components appear as the large outer boxes (ResourceManager and
value if it ŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠdoesn't exist and overwrites it if it does. ŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠexist. A 'delete' operation removes the property ŠŠŠ-f, --fileŠŠŠŠŠŠŠŠŠŠŠŠThe name of the configuration file. ŠŠŠ-p, --propertyŠŠŠŠŠŠŠŠThe name of the Hadoop configuration property. ŠŠŠ-v, --valueŠŠŠŠŠŠŠŠŠŠŠThe value of the Hadoop configuration property. ŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠRequired for a 'put' operation; ignored for a ŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠŠ'delete' operation. ŠŠŠ-r,
queues. Access Control Lists in Hadoop are configured by specifying the list of users and/or groups as a string property. For specifying a list of users and groups, the format is 舠user1,user2 group1,group舡 (a comma-separated list of users, followed by a space separator, followed by a comma-separated list of groups). If it is set to 舠*舡 (asterisk), all users and groups are allowed to perform the operation guarded by the ACL in question. If it is set to 舠 舡 (i.e., space), no users or groups are
using MapReduce APIs are source compatible and can run on YARN either with no changes, with simple recompilation against MRv2 jar files that are shipped with Hadoop version 2, or with minor updates. Compatibility of Command-line Scripts Most of the command-line scripts from Hadoop 1.x should just work, without any tweaking. The only exception is MRAdmin, whose functionality was removed from MRv2 because JobTracker and TaskTracker no longer exist. The MRAdmin functionality has been replaced with
version 2. Support for Hadoop 0.23 and 2.x is available starting with Oozie release 3.2.0. Existing Oozie workflows can start taking advantage of YARN in versions 0.23 and 2.x with Oozie 3.2.0 and above. Advanced Features The following features are included in Hadoop version 2 but have not been extensively tested. The user community is encouraged to play with these features and provide feedback to the Apache Hadoop community. Uber Jobs An Uber Job occurs when multiple mapper and reducers are