Big Data For Architects
This course covers the basic Big Data tools and technologies and compares them with one another in detail.
Overview
This course includes:
- 7+ hours of on-demand video
- 12 modules
- Intermediate level
- Direct access/chat with the instructor
- 100% self-paced online
- Many downloadable resources
- Shareable certificate of completion
Skills You Will Gain
Learning Outcomes (At the end of this program you will be able to)
- Gain a holistic understanding of the Big Data Ecosystem
- Understand Data Pipelines
- Develop Batch and Stream Pipelines
- Decide which Big Data technology to choose, and when
- Understand the thought process behind choosing Big Data ingestion, storage, processing, and analysis technologies
Prerequisites
- Knowledge equivalent to the Big Data Crash Course
- Basics of SQL and RDBMS
- Basic Unix/Linux commands such as mkdir, ls, and cat
- Python/Java (not used extensively in the course)
- A credit card for setting up a GCP account (no charges are deducted if you use the GCP trial version). You can perform all exercises in this course without incurring charges. Please refer to the “GCP Account Best Practices” section for more details.
- Twitter Account
Who Should Attend
- Big Data Leads/Architects who want to enhance their Big Data knowledge
- Engineers who would like to transition into Big Data roles
- Big Data Engineers planning to take certifications such as CCA175 and CCA159
- Big Data Engineers who are looking for a promotion
Curriculum
1. Welcome to the course!
About this course: Overview, Learning Outcomes, Who Should Enroll...
2. Module 1: Course Overview
Segment - 01 - Course Structure and Approach
Segment - 02 - Prerequisites
Segment - 03 - Course Audience
Segment - 04 - About Instructor
3. Module 2: Environment Setup
Segment - 05 - Google Cloud Account Setup
Segment - 06 - Creating a Dataproc Cluster
Segment - 07 - GCP Account Best Practices
Dataproc Cluster Installation
4. Module 3: Holistic View, Architectures and Pipelines
Segment - 08 - Big Data Logical Architecture
Segment - 09 - Evolution of Big Data Technologies
Segment - 10 - Key Big Data Architectures
Segment - 11 - Typical Big Data Batch Pipeline
Segment - 12 - Typical Big Data Streaming Pipeline
Segment - 13 - Bonus 1 - Another Example of Big Data Streaming Pipeline
Segment - 14 - Bonus 2 - Another Example of Big Data Streaming Pipeline
5. Module 4: Key Ingestion-Data Flow Frameworks
Segment - 15 - Factors to consider while comparing Ingestion frameworks
Segment - 16 - Kafka vs Flume
Segment - 17 - NiFi vs Kafka
Segment - 18 - Sqoop vs Flume
Segment - 19 - Sqoop vs Kafka Connect
Segment - 20 - Hands-on NiFi Installation
Segment - 21 - Hands-on Kafka Installation
Segment - 22 - Hands-on Kafka and NiFi Integration Background
Segment - 23 - Hands-on Kafka and NiFi Integration
6. Module 5: Key Storage Frameworks
Segment - 24 - Factors to consider while comparing Storage frameworks
Segment - 25 - HDFS vs HBase
Segment - 26 - HBase vs Kudu
Segment - 27 - HDFS vs Kudu
Segment - 28 - HBase vs Cassandra
7. Module 6: Data Formats
Segment - 29 - Text vs Binary
Segment - 30 - Interoperability
Segment - 31 - Row Oriented vs Column Oriented
Segment - 32 - Splittable Formats
Segment - 33 - Schema Evolution
Segment - 34 - Comparing Data Formats
Segment - 35 - Hands-on Sqoop Installation on Dataproc Cluster
Segment - 36 - Hands-on Big Data Batch Pipeline Using Avro Format
8. Module 7: Key Data Processing Frameworks
Segment - 37 - Factors to consider while comparing Processing frameworks
Segment - 38 - MR vs Spark Logical Architecture Perspective
Segment - 39 - MR vs Spark Performance Perspective
Segment - 40 - Spark vs Tez
Segment - 41 - Spark vs Flink
Segment - 42 - Kafka Streams vs Spark Streaming
Segment - 43 - Spark 2.x Streaming vs Spark 1.x Streaming
Segment - 44 - Spark Core vs Spark SQL
Segment - 45 - Hands-on Kafka & Spark Streaming Integration
9. Module 8: Key Data Analysis Frameworks
Segment - 46 - Factors to consider while comparing Analysis frameworks
Segment - 47 - Hive vs Impala
Segment - 48 - Hive vs Pig
Segment - 49 - Hive vs Spark SQL
Segment - 50 - Hive vs Hive LLAP vs Impala
Segment - 51 - Hive vs KSQL
Segment - 52 - KSQL vs ksqlDB
Segment - 53 - Hands-on KSQL
Segment - 54 - Hands-on Write to a Stream and Table using KSQL
Segment - 55 - Hands-on Streaming ETL Pipeline Background
Segment - 56 - Hands-on Build a Scalable ETL Pipeline with Kafka Connect - part 1
Segment - 57 - Hands-on Build a Scalable ETL Pipeline with Kafka Connect - part 2
10. Module 9: Delta Lake
Segment - 58 - Delta Architecture
Segment - 59 - Why Delta Lake?
Segment - 60 - Challenges with Data Lake
Segment - 61 - Delta Lake Demo
11. Module 10: Bonus
Segment - 62 - Solr vs ElasticSearch
Segment - 63 - Cloudera Search vs Solr
Segment - 64 - Oozie vs Airflow
Segment - 65 - KSQL vs KStreams
12. Module 11: Epilogue
Segment - 66 - Conclusion
Instructors

Bhavuk Chawla
With a distinguished career spanning decades in cutting-edge technologies such as Generative AI, Machine Learning, Cloud Computing, and Big Data Analytics, Bhavuk brings a wealth of hands-on experience and strategic insight to senior professionals seeking to elevate their skill sets. As an elite instructor on platforms like Pluralsight and through partnerships with industry giants including Google, Adobe, and Microsoft, he has empowered over 150,000 participants across the globe.
Recognized as Cloudera Instructor of the Year in 2016 and widely regarded as a Google Cloud and AI Evangelist, Bhavuk is committed to delivering transformative learning experiences tailored to diverse audiences—from CEOs to developers. His extensive background in technology consulting includes architecting scalable, cross-platform solutions that address the unique needs of global enterprises.
Currently serving as Head of Big Data Sciences & AI Practice and Co-Founder of several technology transformation ventures, Bhavuk has led high-impact training and consulting initiatives that drive innovation and operational excellence. His approach combines theoretical expertise with real-world problem-solving, gained through working closely with Fortune 500 companies.
Passionate about fostering a culture of continuous learning and knowledge sharing, Bhavuk ensures that every engagement—whether a training session or a strategic consultation—equips professionals with the tools they need to thrive in today’s fast-evolving technological landscape. Through comprehensive and impactful educational programs, he is dedicated to helping teams and individuals achieve lasting success.
Frequently Asked Questions
How much do the courses at Starweaver cost?
We offer flexible payment options to make learning accessible for everyone. With our Pay-As-You-Go plan, you can pay for each course individually. Alternatively, our Subscription-Based plan provides you with unlimited access to all courses for a monthly or yearly fee.
Do you offer any certifications upon completion of a course at Starweaver?
Yes, we offer a certificate upon completion of a course to showcase your newly acquired skills and expertise.
Does Starweaver offer any free courses or trials?
No, we don't offer any free courses, but we do offer a 5-day trial on our subscription-based plans only.
Are Starweaver's courses designed for beginners or advanced students?
Our courses are offered at three levels to cater to your learning needs: Core, Intermediate, and Advanced. You can choose the level that best suits your knowledge and skill set.
What payment options are available for Starweaver courses?
We accept various payment methods, such as major credit cards, PayPal, wire transfer, and company purchase orders. For more information about payments, contact customer support.
Do you offer refunds?
Yes, we do offer a 100% refund guarantee for our courses within a specified time frame. If you are not satisfied with the course, contact our customer support team to request a refund with your order details. Some restrictions may apply.