Hadoop & Spark
Apache Spark is a lightning-fast cluster computing framework designed for fast computation. It is based on Hadoop MapReduce and extends the MapReduce model to efficiently support more types of computation, including interactive queries and stream processing.
Apache Spark is often deployed in conjunction with a Hadoop cluster, and it is able to benefit from a number of Hadoop's capabilities. On its own, Spark is a powerful tool for processing large volumes of data, but it is not yet well-suited to production workloads in the enterprise. Integration with Hadoop gives Spark many of the capabilities that broad adoption and production use require, such as distributed storage and cluster resource management.
Hadoop has evolved considerably since its early versions, which were essentially concerned with facilitating the batch processing of MapReduce jobs on large volumes of data stored in HDFS. Particularly since the introduction of the YARN resource manager, Hadoop is now better able to manage a wide range of data processing tasks, from batch processing to streaming data and graph analysis.
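The MapReduce model mentioned above can be sketched in plain Python. This is a simplified, single-process illustration of the idea, not Hadoop's actual API: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group (here, a word count, the canonical example).

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key (sum the counts).
    return {key: sum(values) for key, values in groups.items()}

lines = ["spark extends mapreduce", "hadoop runs mapreduce jobs"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["mapreduce"])  # -> 2
```

In a real Hadoop cluster, each phase runs in parallel across many machines over data stored in HDFS; Spark's contribution is to keep intermediate results in memory between such stages rather than writing them back to disk.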
- Understanding Big Data and Hadoop
- MapReduce
- Spark Core
- Programming in Scala and RDDs