Our trademarks include Starweaver®, Make genius happen™, Education you can bank on®, People are your most important assets!®, Body of Knowledge™, StarLabs™, LiveLabs™, Journeys™
© Starweaver Group, Inc. All Rights Reserved.
Big Data Fundamentals

This course helps you understand Elasticsearch as a data store and as a NoSQL database, as well as the Spark processing engine. The Big Data Hadoop and Spark Developer course is designed to provide in-depth knowledge of big data processing using Hadoop and Spark in the Amazon Elastic MapReduce (EMR) environment.
Rod Davison
Data Science | core | 12 hours 10 minutes

Overview

2K students*
96.9% recommend*

 This journey includes:

  • 12 hours of on-demand video
  • 10 modules
  • Core level
  • Direct access/chat with the instructor
  • 100% self-paced online
  • Many downloadable resources
  • Shareable certificate of completion
More and more businesses are realizing the power of data. They are taking on the challenge of analyzing big data to gain deeper insights, build a competitive advantage in their services and offerings, win more customers, and retain existing ones. This course helps you understand Elasticsearch as a data store and as a NoSQL database, as well as the Spark processing engine. The Big Data Hadoop and Spark Developer course is designed to provide in-depth knowledge of big data processing using Hadoop and Spark in the Amazon Elastic MapReduce (EMR) environment.

Skills You Will Gain

Big Data
Hadoop
Hive

Learning Outcomes (At the end of this program you will be able to)

  • Ingest data from different sources into Azure Data Lake using Azure Data Factory.
  • Create data transformation pipelines using Hive and Spark.
  • Process big data with Spark SQL in Scala, and deploy and run jobs on an Azure HDInsight Spark cluster.
  • Understand Elasticsearch as a data store.
  • Use Elasticsearch as a NoSQL database.
  • Describe the main components of the Hadoop ecosystem, such as Hadoop 3, YARN, Pig, and Hive.
  • Explain the functionality and architecture of the Hadoop Distributed File System (HDFS) and YARN resource management.
  • Have a clear understanding of cloud-based big data analytics tools.

Who Should Attend

  • Anyone interested in cloud technologies and cloud computing, and in Azure in particular
  • Data Analysts, Data Engineers, Data Scientists, Database Architects, Database Administrators
  • All project management and business analysis professionals

Frequently Asked Questions

How much do the courses at Starweaver cost?

We offer flexible payment options to make learning accessible for everyone. With our Pay-As-You-Go plan, you can pay for each course individually. Alternatively, our Subscription-Based plan provides you with unlimited access to all courses for a monthly or yearly fee.

Do you offer any certifications upon completion of a course at Starweaver?

Yes, we offer a certificate upon completion of each course to showcase your newly acquired skills and expertise.

Does Starweaver offer any free courses or trials?

No, we don't offer free courses, but we do offer a 5-day trial, available only on our subscription-based plans.

Are Starweaver's courses designed for beginners or advanced students?

Our courses are offered at three levels to cater to your learning needs: Core, Intermediate, and Advanced. You can choose the level that best suits your knowledge and skill set.

What payment options are available for Starweaver courses?

We accept various payment methods, such as major credit cards, PayPal, wire transfer, and company purchase orders. For more information about payments, contact customer support.

Do you offer refunds?

Yes, we do offer a 100% refund guarantee for our courses within a specified time frame. If you are not satisfied with the course, contact our customer support team to request a refund with your order details. Some restrictions may apply.

*Where courses have been offered multiple times, the “# Students” includes all students who have enrolled. The “%Recommended” shown is also based on this data.
Big Data Hadoop and Spark Developer in Amazon Elastic MapReduce

Part 1: Big Data Hadoop and Spark Developer in Amazon Elastic MapReduce

The Big Data Hadoop and Spark Developer course has been designed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies focused on AWS and Amazon EMR.

1. Welcome to the course!
2. Program Announcements
3. Labs
4. Quizzes
5. Module 0
6. Module 1
7. Module 2
8. Module 3
9. Recommended Further Readings

Announcement 001 - You MUST Create a FREE Amazon Web Services (AWS) account

Rod Davison

With over 40 years of experience spanning project management, artificial intelligence, and IT infrastructure, I bring a wealth of knowledge and insight to the educational realm. My expertise extends into social research, theoretical mathematics, and linguistics, which enriches my approach to teaching and course development. I am dedicated to crafting educational resources that address emerging technologies and pressing social issues, ensuring that learners gain practical and forward-thinking skills. My career has involved significant roles in advanced software design and development, research in artificial intelligence and non-linear systems, and corporate training across various domains. I have a strong track record in content development for technical courses, project management, and educational methodologies. My commitment is to support innovative and challenging projects, helping to improve quality and capability in both technical and soft skills. In addition to my technical and scientific endeavors, I have extensive experience in public speaking, narration, and multimedia presentations. I am passionate about making a positive impact through education, striving to support those working towards meaningful change in a rapidly evolving world.

About this course: Overview, Learning Outcomes, Who Should Enroll...

Curriculum Description

Instructor Bio - Rod Davison

Module 0 - Setting up an AWS Account

Module 0.1 - Creating an AWS Account

Module 0.2 - Creating a Working User

Module 0.3 - Creating a Billing Alert

Labs - Overview

Labs - Module 1

Labs - Module 2

Labs - Module 3

Module 2 - Overview and Learning Objectives

Module 2.1 - EMR Configuration Basics

Module 2.2 - EMR Uniform Hardware

Module 2.3 - EMR Fleet Provisioning

Module 2.4 - AWS Cloud Services

Module 2.5 - AWS Step Function

Module 2.6 - EMR Hive Demo

Module 2.7 - EMR Running a Fleet Cluster

Module 2.8 - Batch Pig and Hive

Presentation/Slides 2.1 - Big Data and Hadoop

Presentation/Slides 2.2 - EMR Configuration

Presentation/Slides 2.3 - Map Reduce

Presentation/Slides 2.4 - Hive and Spark

Module 3 - Overview and Learning Objectives

Module 3.1 - EMR Notebooks

Module 3.2 - EMR Terminated Cluster

Module 3.3 - Scala WordCount 1

Module 3.4 - Scala WordCount 2

Module 3.5 - Scala DAG

Module 3.6 - Python WordCount

Module 3.7 - DataFrames

Module 3.8 - Spark SQL

Module 3.9 - RDD Type Safety

Module 3.10 - Type Schema

Module 3.11 - RDD DF DS

Presentation/Slides 3.1 - Spark Processing

Presentation/Slides 3.2 - SQL Datasets and Files

Quiz - Overview

AWS Ecosystem Quiz

EC2 Quiz

EBS Quiz

S3 Quiz

Big Data EMR Quiz

EMR Configuration Quiz

Map Reduce, Pig and Hive

EMR Notebooks and Spark

RDDs, Dataframes and Datasets

Spark SQL and File Formats

Module 1 - Overview and Learning Objectives

Module 1.1 - AWS Tour

Module 1.2 - Infrastructure as a Service

Module 1.3 - Platform as a Service

Module 1.4 - Creating an EC2 VM

Module 1.5 - Reserved Pricing

Module 1.6 - Installing Software

Module 1.7 - Security Groups

Module 1.8 - Elastic IPs

Module 1.9 - AMIs

Module 1.10 - Formatting EBS

Module 1.11 - Sharing EBS Drives

Module 1.12 - Creating EBS Drives

Module 1.13 - Creating and Using Snapshots

Module 1.14 - Moving Resources

Module 1.15 - Lifecycle Management

Module 1.16 - S3 Buckets

Module 1.17 - S3 Storage Classes

Module 1.18 - IAM Policies

Module 1.19 - S3 Bucket Policies

Module 1.20 - Bucket Versioning

Module 1.21 - (Bonus) Linux ssh

Module 1.22 - (Bonus) Windows ssh

Module 1.23 - (Bonus) AWS CLI

Module 1.24 - (Bonus) AWS VPNs

Presentation/Slides 1.1 - AWS Introduction

Presentation/Slides 1.2 - Elastic Computing

Presentation/Slides 1.3 - EBS Elastic Storage

Presentation/Slides 1.4 - S3 Storage

References for Further Study

AWS Cost and Billing Management User Guide

AWS EC2 User Guide

AWS EBS User Guide

AWS EFS User Guide

AWS S3 Console User Guide

AWS EMR Documentation