Big Data For Architects

This course covers the basic Big Data tools and technologies and also compares them with one another in detail.

Data Science | core | 7 hours 30 minutes |   Published: Feb 2022

Overview

1.4K students*
94.6% recommend*

This course includes:

  • 7+ hours of on-demand video
  • 12 modules
  • Intermediate level
  • Direct access/chat with the instructor
  • 100% self-paced online
  • Many downloadable resources
  • Shareable certificate of completion

In the big data era, a wide spectrum of tools and technologies is available, such as HDFS, Sqoop, Hive, Impala, HBase, Spark, Kafka, NiFi, and many more. Choosing a few from so many can be puzzling, especially when you need to develop complete end-to-end pipelines, because your selection determines how efficiently you can handle big data. If these tools confuse you, “Big Data for Architects” is designed to address that problem. The course covers the basic Big Data tools and technologies and compares them with one another in detail. It starts by exploring the big data ecosystem, moves on to data pipelines, and then explores batch and stream pipelines. It equips data architects with the knowledge they need to develop well-designed, optimized data pipelines.

Skills You Will Gain

Big Data
HBase
HDFS
Hive
Kafka
NiFi
Spark
Sqoop

Learning Outcomes (At the end of this program you will be able to)

  • Gain a holistic understanding of the Big Data Ecosystem 
  • Understand Data Pipelines 
  • Develop Batch and Stream Pipelines 
  • Decide which Big Data technology to choose and when
  • Understand the thought process behind choosing Big Data ingestion, storage, processing, and analysis technologies

Prerequisites

  • Knowledge equivalent to Big Data Crash Course  
  • Basics of SQL and RDBMS 
  • Basic Unix/Linux commands such as mkdir, ls, and cat
  • Python/Java (not used extensively in the course) 
  • Credit card for setting up a GCP account (no charges will be deducted if you use the GCP trial version). You can perform all exercises of this course without incurring charges. Please refer to the “GCP Account Best Practices” section for more details.
  • Twitter Account 

Who Should Attend

  • Big Data Leads/Architects who want to enhance their Big Data knowledge  
  • Engineers who would like to transition into Big Data roles
  • Big Data Engineers planning to take certifications such as CCA175 and CCA159
  • Big Data Engineers who are looking for a promotion

Curriculum

1. Welcome to the course!

About this course: Overview, Learning Outcomes, Who Should Enroll...

2. Module 1: Course Overview

Segment - 01 - Course Structure and Approach

Segment - 02 - Pre-requisites

Segment - 03 - Course Audience

Segment - 04 - About Instructor

3. Module 2: Environment Setup

Segment - 05 - Google Cloud Account Setup

Segment - 06 - Creating a Dataproc Cluster

Segment - 07 - GCP Account Best Practices

Dataproc Cluster Installation

4. Module 3: Holistic View, Architectures and Pipelines

Segment - 08 - Big Data Logical Architecture

Segment - 09 - Evolution of Big Data Technologies

Segment - 10 - Key Big Data Architectures

Segment - 11 - Typical Big Data Batch Pipeline

Segment - 12 - Typical Big Data Streaming Pipeline

Segment - 13 - Bonus 1 - Another Example of Big Data Streaming Pipeline

Segment - 14 - Bonus 2 - Another Example of Big Data Streaming Pipeline

5. Module 4: Key Ingestion-Data Flow Frameworks

Segment - 15 - Factors to consider while comparing Ingestion frameworks

Segment - 16 - Kafka vs Flume

Segment - 17 - NiFi vs Kafka

Segment - 18 - Sqoop vs Flume

Segment - 19 - Sqoop vs Kafka Connect

Segment - 20 - Hands-on NiFi Installation

Segment - 21 - Hands-on Kafka Installation

Segment - 22 - Hands-on Kafka and NiFi Integration Background

Segment - 23 - Hands-on Kafka and NiFi Integration

6. Module 5: Key Storage Frameworks

Segment - 24 - Factors to consider while comparing Storage frameworks

Segment - 25 - HDFS vs HBase

Segment - 26 - HBase vs Kudu

Segment - 27 - HDFS vs Kudu

Segment - 28 - HBase vs Cassandra

7. Module 6: Data Formats

Segment - 29 - Text vs Binary

Segment - 30 - Interoperability

Segment - 31 - Row Oriented vs Column Oriented

Segment - 32 - Splittable Formats

Segment - 33 - Schema Evolution

Segment - 34 - Comparing Data Formats

Segment - 35 - Hands-on Sqoop Installation on Dataproc Cluster

Segment - 36 - Hands-on Big Data Batch Pipeline Using Avro Format

8. Module 7: Key Data Processing Frameworks

Segment - 37 - Factors to consider while comparing Processing frameworks

Segment - 38 - MR vs Spark Logical Architecture Perspective

Segment - 39 - MR vs Spark Performance Perspective

Segment - 40 - Spark vs Tez

Segment - 41 - Spark vs Flink

Segment - 42 - Kafka Streams vs Spark Streaming

Segment - 43 - Spark 2.x Streaming vs Spark 1.x Streaming

Segment - 44 - Spark Core vs Spark SQL

Segment - 45 - Hands-on Kafka & Spark Streaming Integration

9. Module 8: Key Data Analysis Frameworks

Segment - 46 - Factors to consider while comparing Analysis frameworks

Segment - 47 - Hive vs Impala

Segment - 48 - Hive vs Pig

Segment - 49 - Hive vs Spark SQL

Segment - 50 - Hive vs Hive LLAP vs Impala

Segment - 51 - Hive vs KSQL

Segment - 52 - KSQL vs ksqlDB

Segment - 53 - Hands-on KSQL

Segment - 54 - Hands-on Write to a Stream and Table using KSQL

Segment - 55 - Hands-on Streaming ETL Pipeline Background

Segment - 56 - Hands-on Build a Scalable ETL Pipeline with Kafka Connect - part 1

Segment - 57 - Hands-on Build a Scalable ETL Pipeline with Kafka Connect - part 2

10. Module 9: Delta Lake

Segment - 58 - Delta Architecture

Segment - 59 - Why Delta Lake?

Segment - 60 - Challenges with Data Lake

Segment - 61 - Delta Lake Demo

11. Module 10: Bonus

Segment - 62 - Solr vs ElasticSearch

Segment - 63 - Cloudera Search vs Solr

Segment - 64 - Oozie vs Airflow

Segment - 65 - KSQL vs KStreams

12. Module 11: Epilogue

Segment - 66 - Conclusion

Instructors

Bhavuk Chawla

With a distinguished career spanning decades in cutting-edge technologies such as Generative AI, Machine Learning, Cloud Computing, and Big Data Analytics, Bhavuk brings a wealth of hands-on experience and strategic insight to senior professionals seeking to elevate their skill sets. As an elite instructor on platforms like Pluralsight and through partnerships with industry giants including Google, Adobe, and Microsoft, he has empowered over 150,000 participants across the globe.

Recognized as Cloudera Instructor of the Year in 2016 and widely regarded as a Google Cloud and AI Evangelist, Bhavuk is committed to delivering transformative learning experiences tailored to diverse audiences—from CEOs to developers. His extensive background in technology consulting includes architecting scalable, cross-platform solutions that address the unique needs of global enterprises.

Currently serving as Head of Big Data Sciences & AI Practice and Co-Founder of several technology transformation ventures, Bhavuk has led high-impact training and consulting initiatives that drive innovation and operational excellence. His approach combines theoretical expertise with real-world problem-solving, gained through working closely with Fortune 500 companies.

Passionate about fostering a culture of continuous learning and knowledge sharing, Bhavuk ensures that every engagement—whether a training session or a strategic consultation—equips professionals with the tools they need to thrive in today’s fast-evolving technological landscape. Through comprehensive and impactful educational programs, he is dedicated to helping teams and individuals achieve lasting success.

Frequently Asked Questions

How much do the courses at Starweaver cost?

We offer flexible payment options to make learning accessible for everyone. With our Pay-As-You-Go plan, you can pay for each course individually. Alternatively, our Subscription-Based plan provides you with unlimited access to all courses for a monthly or yearly fee.

Do you offer any certifications upon completion of a course at Starweaver?

Yes, we offer a certificate upon completion of each course to showcase your newly acquired skills and expertise.

Does Starweaver offer any free courses or trials?

No, we don't offer any free courses, but we do offer a 5-day trial, available only on our subscription-based plans.

Are Starweaver's courses designed for beginners or advanced students?

Our courses are offered at three levels to cater to your learning needs: Core, Intermediate, and Advanced. You can choose the level that best suits your knowledge and skill set to enhance your learning experience.

What payment options are available for Starweaver courses?

We accept various payment methods such as major credit cards, PayPal, wire transfer, and company purchase orders. For more information about payments, contact customer support.

Do you offer refunds?

Yes, we do offer a 100% refund guarantee for our courses within a specified time frame. If you are not satisfied with the course, contact our customer support team to request a refund with your order details. Some restrictions may apply.

*Where courses have been offered multiple times, the “# Students” includes all students who have enrolled. The “% Recommend” shown is also based on this data.