Big Data Hadoop and Spark Developer in Amazon Elastic MapReduce
The Big Data Hadoop and Spark Developer course is designed to provide in-depth knowledge of Big Data processing using Hadoop and Spark. It includes real-life projects and case studies focused on AWS and Amazon EMR.
Overview
This course includes:
- 7+ hours of on-demand video
- 4 modules
- Core level
- Direct access/chat with the instructor
- 100% self-paced online
- Many downloadable resources
- Shareable certificate of completion
Skills You Will Gain
Learning Outcomes (At The End Of This Program, You Will Be Able To...)
- Describe the main components of the Hadoop ecosystem, such as Hadoop 3, YARN, Pig, and Hive.
- Explain the functionality and architecture of the Hadoop Distributed File System (HDFS) and YARN resource management.
- Explain how MapReduce works and how it is implemented in the Hadoop environment.
- Explain the file formats used in Big Data (Avro, Parquet, and ORC) and how to use them in the EMR environment.
- Explain the difference between traditional RDBMS and Hive tables.
- Explain the architecture and functionality of Spark.
- Use resilient distributed datasets (RDDs) for data processing in Spark.
- Implement and build Spark applications.
- Write basic functional code in Scala to run a Spark application.
- Explain parallel processing in Spark and Spark optimization using Catalyst and Tungsten.
- Use Spark SQL to create, transform, and query DataFrames.
- Explain the differences and use cases for Spark RDDs, DataFrames, and Datasets.
- Create and deploy AWS EC2 instances, EBS volumes, and S3 storage.
- Explain the pricing models for AWS storage and compute resources.
- Configure and deploy an EMR Cluster in AWS.
- Run Spark and MapReduce applications in batch and interactive mode on EMR.
- Create and use EMR notebooks.
- Explain the differences between IaaS and PaaS resources in AWS.
- Create and manage a personal AWS account.
Prerequisites
- A basic level of conceptual understanding of data warehouses is assumed, as well as an awareness of the core functions of SQL.
Who Should Attend
- Data Analysts
- Data Engineers
- Data Scientists
- Database Architects
- Database Administrators