Big Data Hadoop and Spark Developer in Amazon Elastic MapReduce

The Big Data Hadoop and Spark Developer course has been designed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies focused on AWS and Amazon EMR.

Rod Davison
Data Science | Advanced | 7 hours 5 minutes | Published: Oct 2020
In partnership with: Coursera

Overview

13.4K students*
94.3% recommend*

This course includes:

  • 7+ hours of on-demand video
  • 4 modules
  • Core level
  • Direct access/chat with the instructor
  • 100% self-paced online
  • Many downloadable resources
  • Shareable certificate of completion
The Big Data Hadoop and Spark Developer course is designed to provide in-depth knowledge of Big Data processing using Hadoop and Spark in the Amazon Elastic MapReduce (EMR) environment. The hands-on portions of the course focus on Amazon Web Services (AWS) and EMR, where all demos take place.

Skills You Will Gain

Amazon Elastic MapReduce (EMR)
Big Data
Data Science
Hadoop
Spark Developer

Learning Outcomes (At the end of this program you will be able to)

  • Describe the main components of the Hadoop ecosystem, such as Hadoop 3, YARN, Pig, and Hive.
  • Explain the functionality and architecture of the Hadoop Distributed File System (HDFS) and YARN resource management.
  • Explain how MapReduce works and how it is implemented in the Hadoop environment.
  • Explain the file formats used in Big Data (Avro, Parquet, and ORC) and how to use them in the EMR environment.
  • Explain the difference between traditional RDBMS and Hive tables.
  • Explain the architecture and functionality of Spark.
  • Use resilient distributed datasets (RDDs) for data processing in Spark.
  • Implement and build Spark applications.
  • Write basic functional code in Scala to run a Spark application.
  • Explain parallel processing in Spark and Spark optimization using Catalyst and Tungsten.
  • Use Spark SQL to create, transform, and query DataFrames.
  • Explain the differences and use cases for Spark RDDs, DataFrames, and Datasets.
  • Create and deploy AWS EC2 instances, EBS volumes, and S3 storage.
  • Explain the pricing models for AWS storage and compute resources.
  • Configure and deploy an EMR cluster in AWS.
  • Run Spark and MapReduce applications in batch and interactive mode on EMR.
  • Create and use EMR notebooks.
  • Explain the differences between IaaS and PaaS resources in AWS.
  • Create and manage a personal AWS account.
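
As a rough illustration of the MapReduce model named in the outcomes above, the sketch below counts words in plain Python. It is not taken from the course materials and uses no Hadoop or Spark API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only to mirror the three phases.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key, as a framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read from HDFS or S3.
    sample = ["big data on EMR", "Big Data with Spark"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel across nodes and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.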

Prerequisites

  • A basic level of conceptual understanding of data warehouses is assumed, as well as an awareness of the core functions of SQL.

Who Should Attend

  • Data Analysts
  • Data Engineers
  • Data Scientists
  • Database Architects
  • Database Administrators

Frequently Asked Questions

How much do the courses at Starweaver cost?

We offer flexible payment options to make learning accessible for everyone. With our Pay-As-You-Go plan, you can pay for each course individually. Alternatively, our Subscription-Based plan provides you with unlimited access to all courses for a monthly or yearly fee.

Do you offer any certifications upon completion of a course at Starweaver?

Yes, we offer a certificate upon completion of each course to showcase your newly acquired skills and expertise.

Does Starweaver offer any free courses or trials?

No, we don't offer any free courses, but we do offer a 5-day trial on our subscription-based plans only.

Are Starweaver's courses designed for beginners or advanced students?

Our courses are offered at three levels (Core, Intermediate, and Advanced) to cater to your learning needs. You can choose the level that best suits your knowledge and skill set.

What payment options are available for Starweaver courses?

We accept various payment methods, including major credit cards, PayPal, wire transfer, and company purchase orders. For more information about payments, contact customer support.

Do you offer refunds?

Yes, we do offer a 100% refund guarantee for our courses within a specified time frame. If you are not satisfied with the course, contact our customer support team to request a refund with your order details. Some restrictions may apply.

*Where courses have been offered multiple times, the “# Students” includes all students who have enrolled. The “%Recommended” shown is also based on this data.

Curriculum

  1. Welcome to the course!
  2. Program Announcements
  3. Labs
  4. Quizzes
  5. Module 0
  6. Module 1
  7. Module 2
  8. Module 3
  9. Recommended Further Readings

Announcement 001 - You MUST Create a FREE Amazon Web Services (AWS) account

Labs - Overview

Labs - Module 1

Labs - Module 2

Labs - Module 3

Module 3 - Overview and Learning Objectives

Module 3.1 - EMR Notebooks

Module 3.2 - EMR Terminated Cluster

Module 3.3 - Scala WordCount 1

Module 3.4 - Scala WordCount 2

Module 3.5 - Scala DAG

Module 3.6 - Python WordCount

Module 3.7 - DataFrames

Module 3.8 - Spark SQL

Module 3.9 - RDD Type Safety

Module 3.10 - Type Schema

Module 3.11 - RDD DF DS

Presentation/Slides 3.1 - Spark Processing

Presentation/Slides 3.2 - SQL Datasets and Files

Module 0 - Setting up an AWS Account

Module 0.1 - Creating an AWS Account

Module 0.2 - Creating a Working User

Module 0.3 - Creating a Billing Alert

About this course: Overview, Learning Outcomes, Who Should Enroll...

Curriculum Description

Instructor Bio - Rod Davison

Module 1 - Overview and Learning Objectives

Module 1.1 - AWS Tour

Module 1.2 - Infrastructure as a Service

Module 1.3 - Platform as a Service

Module 1.4 - Creating an EC2 VM

Module 1.5 - Reserved Pricing

Module 1.6 - Installing Software

Module 1.7 - Security Groups

Module 1.8 - Elastic IPs

Module 1.9 - AMIs

Module 1.10 - Formatting EBS

Module 1.11 - Sharing EBS Drives

Module 1.12 - Creating EBS Drives

Module 1.13 - Creating and Using Snapshots

Module 1.14 - Moving Resources

Module 1.15 - Lifecycle Management

Module 1.16 - S3 Buckets

Module 1.17 - S3 Storage Classes

Module 1.18 - IAM Policies

Module 1.19 - S3 Bucket Policies

Module 1.20 - Bucket Versioning

Module 1.21 - (Bonus) Linux ssh

Module 1.22 - (Bonus) Windows ssh

Module 1.23 - (Bonus) AWS CLI

Module 1.24 - (Bonus) AWS VPNs

Presentation/Slides 1.1 - AWS Introduction

Presentation/Slides 1.2 - Elastic Computing

Presentation/Slides 1.3 - EBS Elastic Storage

Presentation/Slides 1.4 - S3 Storage

Quiz - Overview

AWS Ecosystem Quiz

EC2 Quiz

EBS Quiz

S3 Quiz

Big Data EMR Quiz

EMR Configuration Quiz

Map Reduce, Pig and Hive

EMR Notebooks and Spark

RDDs, Dataframes and Datasets

Spark SQL and File Formats

Module 2 - Overview and Learning Objectives

Module 2.1 - EMR Configuration Basics

Module 2.2 - EMR Uniform Hardware

Module 2.3 - EMR Fleet Provisioning

Module 2.4 - AWS Cloud Services

Module 2.5 - AWS Step Function

Module 2.6 - EMR Hive Demo

Module 2.7 - EMR Running a Fleet Cluster

Module 2.8 - Batch Pig and Hive

Presentation/Slides 2.1 - Big Data and Hadoop

Presentation/Slides 2.2 - EMR Configuration

Presentation/Slides 2.3 - Map Reduce

Presentation/Slides 2.4 - Hive and Spark

References for Further Study

AWS Cost and Billing Management User Guide

AWS EC2 User Guide

AWS EBS User Guide

AWS EFS User Guide

AWS S3 Console User Guide

AWS EMR Documentation

Rod Davison

With over 40 years of experience spanning project management, artificial intelligence, and IT infrastructure, I bring a wealth of knowledge and insight to the educational realm. My expertise extends into social research, theoretical mathematics, and linguistics, which enriches my approach to teaching and course development. I am dedicated to crafting educational resources that address emerging technologies and pressing social issues, ensuring that learners gain practical and forward-thinking skills. My career has involved significant roles in advanced software design and development, research in artificial intelligence and non-linear systems, and corporate training across various domains. I have a strong track record in content development for technical courses, project management, and educational methodologies. My commitment is to support innovative and challenging projects, helping to improve quality and capability in both technical and soft skills. In addition to my technical and scientific endeavors, I have extensive experience in public speaking, narration, and multimedia presentations. I am passionate about making a positive impact through education, striving to support those working towards meaningful change in a rapidly evolving world.