Master Data Analysis with Python

Theodore Petrou
Data Science | Intermediate | 10 hours | Published: Sep 2022
In partnership with: Coursera

Overview

1.4K STUDENTS*
94.1% RECOMMEND*

This course includes:

  • 10+ hours of on-demand video  
  • Many downloadable resources 
  • Certificate of completion  
  • Direct access/chat with the instructor 
  • 100% self-paced online 

In this course, you will be introduced to the DataFrame and the Series, the two primary containers of data within pandas. You will learn the components of these objects, a few basic operations, and which methods of subset selection you should avoid.

You will then begin performing calculations on your data. You will start by learning how to operate on a single column of data, a pandas Series, and learn the difference between methods that aggregate (return a single value) and those that do not. After that, you'll learn how to operate on multiple columns at the same time by calling methods on a DataFrame, and how to change the direction of the operations from vertical to horizontal.

You'll also learn about the categorical data type, which is unique to pandas and can save a tremendous amount of memory.

Up to this point in the course, all operations are applied to the entire dataset. You will next learn how to apply operations to independent groups within your data instead of the whole, and how to display the results of grouping in a more human-readable way with pivot tables. Grouping data can be tricky in pandas and can be one of the slowest-performing operations, so you will learn best practices for optimizing performance along with the newest syntax available.
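
To make the description above concrete, here is a minimal sketch of the ideas it mentions: aggregating versus non-aggregating methods, changing the direction of an operation, the categorical data type, grouping, and pivot tables. The toy data and the column names (`dept`, `level`, `salary`, `bonus`) are hypothetical and are not taken from the course materials.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative assumptions.
df = pd.DataFrame({
    "dept": ["HR", "IT", "IT", "HR", "IT", "HR"],
    "level": ["junior", "junior", "senior", "senior", "junior", "senior"],
    "salary": [50, 70, 90, 60, 80, 70],
    "bonus": [5, 7, 9, 6, 8, 4],
})

salary = df["salary"]  # selecting one column gives a Series

# Aggregating method: collapses the Series to a single value.
total = salary.sum()

# Non-aggregating method: returns a Series of the same length.
running = salary.cumsum()

# DataFrame methods work down each column by default (vertical).
col_sums = df[["salary", "bonus"]].sum()

# axis="columns" changes the direction to across each row (horizontal).
row_sums = df[["salary", "bonus"]].sum(axis="columns")

# Categorical data: repeated strings are stored once, plus integer codes.
repeated = pd.Series(["HR", "IT"] * 10_000)
plain_bytes = repeated.memory_usage(deep=True)
cat_bytes = repeated.astype("category").memory_usage(deep=True)

# Grouping: apply an aggregation to each independent group.
mean_by_dept = df.groupby("dept")["salary"].mean()

# Pivot table: the same kind of grouping, laid out as a two-way table.
pivot = df.pivot_table(
    index="dept", columns="level", values="salary", aggfunc="mean"
)

print(total, mean_by_dept.to_dict())
print(pivot)
print(f"plain: {plain_bytes} bytes, categorical: {cat_bytes} bytes")
```

The exact memory figures depend on the pandas version and the string storage in use, so the sketch prints them rather than stating them.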

Skills You Will Gain

Data Analysis
Data Science
Pandas
Pandas DataFrame
Python

Learning Outcomes (At the end of this program you will be able to)

  • Describe the pandas DataFrame and Series 
  • Understand the different data types available within a DataFrame 
  • Access the DataFrame components – the index, columns, and values 
  • Set a meaningful index in a DataFrame 
  • Complete a five-step process for data exploration 
  • Select rows and columns simultaneously 
  • Filter for specific criteria using boolean selection 
  • Filter data more intuitively with the query method 
  • Select subsets of data from DataFrames with just the brackets, loc, and iloc 
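
As a rough illustration of the selection techniques named in these outcomes, the sketch below sets a meaningful index and then selects subsets with just the brackets, `loc`, `iloc`, boolean selection, and `query`. The DataFrame and its column names are hypothetical examples, not data from the course.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative assumptions.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cal", "Dee"],
    "dept": ["HR", "IT", "IT", "HR"],
    "salary": [50, 70, 90, 60],
}).set_index("name")  # use a meaningful index instead of 0, 1, 2, ...

# The three DataFrame components.
index, columns, values = df.index, df.columns, df.to_numpy()

# Just the brackets: a string selects one column as a Series,
# a list of strings selects several columns as a DataFrame.
salaries = df["salary"]
two_cols = df[["dept", "salary"]]

# loc selects rows and columns simultaneously by label.
# Label slices include the stop label.
by_label = df.loc["Ben":"Cal", ["salary"]]

# iloc selects by integer position. Position slices exclude the stop.
by_position = df.iloc[0:2, 0]

# Boolean selection: combine conditions with & and parentheses.
high_it = df[(df["dept"] == "IT") & (df["salary"] > 80)]

# The query method expresses the same filter as a string.
high_it_q = df.query("dept == 'IT' and salary > 80")

print(by_label)
print(high_it)
```

Both filters at the end return the same single row, which is why `query` is often described as the more readable option for multi-condition filters.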

Prerequisites

  • An understanding of the fundamentals of the Python programming language is required. 
  • No prior experience with pandas is needed. 

Who Should Attend

  • Anyone who wants to learn the fundamental concepts of programming. 
  • Anyone who wants to learn the Python programming language. 
  • Anyone who wants to brush up on their programming and Python skills. 

Frequently Asked Questions

How much do the courses at Starweaver cost?

We offer flexible payment options to make learning accessible for everyone. With our Pay-As-You-Go plan, you can pay for each course individually. Alternatively, our Subscription-Based plan provides you with unlimited access to all courses for a monthly or yearly fee.

Do you offer any certifications upon completion of a course at Starweaver?

Yes, we do offer a certification upon completion of our course to showcase your newly acquired skills and expertise.

Does Starweaver offer any free courses or trials?

No, we don't offer any free courses, but we do offer a 5-day trial on our subscription-based plans only.

Are Starweaver's courses designed for beginners or advanced students?

Our courses are designed at three levels to cater to your learning needs: Core, Intermediate, and Advanced. You can choose the level that best suits your knowledge and skill set to enhance your learning experience.

What payment options are available for Starweaver courses?

We accept various payment methods such as major credit cards, PayPal, wire transfer, and company purchase orders. For more information about payments, contact customer support.

Do you offer refunds?

Yes, we do offer a 100% refund guarantee for our courses within a specified time frame. If you are not satisfied with the course, contact our customer support team to request a refund with your order details. Some restrictions may apply.

*Where courses have been offered multiple times, the “# Students” includes all students who have enrolled. The “%Recommended” shown is also based on this data.
Theodore Petrou

Theodore Petrou is the founder of Dunder Data and a leading educator in the fields of data science and machine learning. With a deep passion for teaching and a strong command of Python programming, he has authored several influential books, including Master Data Analysis with Python, Master Machine Learning with Python, Master the Fundamentals of Python, and the popular Pandas Cookbook. Through Dunder Data, Theodore has delivered customized corporate training to renowned organizations such as Microsoft, NASA, and the Federal Reserve, equipping thousands of professionals with practical, hands-on expertise.

With a rich background that bridges both industry and academia, Theodore brings a wealth of experience to his teaching. His career has included roles as a Quantitative Developer, Lead Data Scientist, and Credit Risk Professional, where he solved complex problems in predictive modeling, financial analytics, and data strategy. Alongside developing innovative technical solutions, he has devoted much of his career to mentorship and education, fostering a deep and lasting understanding of data science principles.

Theodore’s teaching extends beyond corporate settings into the broader educational community through live courses and workshops. Known for his clear, engaging style, he breaks down sophisticated concepts in Python and data science to make them accessible and actionable for learners of all levels. His mission is to guide aspiring and experienced professionals alike through the intricacies of data analysis and machine learning with clarity, enthusiasm, and real-world relevance.

Module 1: Introduction to Pandas
Module 2: Selecting Subsets of Data
Module 3: Essential Series Commands
Module 4: Essential DataFrame Commands
Module 5: Data Types
Module 6: Grouping Data

Resources

Module 1.1 - What is Pandas?

Segment 1 - What is Pandas

Segment 2 - Which Version of Pandas to Use

Segment 3 - Pandas Examples

Module 1.2 - The DataFrame and Series

Segment 4 - Introduction to the DataFrame and Series

Segment 5 - DataFrame Components

Segment 6 - Selecting a Series

Segment 7 - Components of a Series

Segment 8 - Getting Help in a Jupyter Notebook

Segment 9 - Exercises

Module 1.3 - Data Types and Missing Values

Segment 10 - Introduction to Data Types and Missing Values

Segment 11 - Finding the Data Type of Each Column

Segment 12 - Getting More Metadata

Segment 13 - Exercises

Module 1.4 - Setting a Meaningful Index

Segment 14 - Setting an Index of a DataFrame

Segment 15 - Accessing the Index, Columns, and Data

Segment 16 - Accessing the Components of a Series

Segment 17 - The Default Index

Segment 18 - Setting an Index on Read

Segment 19 - Choosing a Good Index

Segment 20 - Exercises

Module 1.5 - Five-Step Process for Data Exploration

Segment 21 - Five-Step Process for Data Exploration

Module 4.1 - Introduction to DataFrames

Segment 99 - Introduction to DataFrames

Segment 100 - Arithmetic DataFrame Operations

Segment 102 - DataFrame Comparison Operators

Segment 103 - Overlap of DataFrame and Series Methods

Segment 104 - Data Dictionaries

Segment 105 - Exercises

Module 4.2 - Numeric DataFrame Methods

Segment 106 - Aggregation Methods

Segment 107 - Changing the Direction of the Operation

Segment 108 - Non-Aggregation Methods

Segment 109 - Summary Statistics for All Columns with the Describe Method

Segment 110 - Nuisance Columns

Segment 111 - Exercises

Module 4.3 - DataFrame Missing Value Methods

Segment 112 - The agg, idxmin, and idxmax Methods

Segment 113 - Dropping Rows and Columns with the dropna Method

Segment 114 - Filling missing values with the fillna Method

Segment 115 - The interpolate Method

Segment 116 - Exercises

Module 4.4 - DataFrame Sorting, Ranking and Uniqueness

Segment 117 - Sorting

Segment 118 - Ranking

Segment 119 - Uniqueness

Segment 120 - Finding the Maximum or Minimum of a Group

Segment 121 - The value_counts Method

Segment 122 - Exercises

Module 4.5 - DataFrame Structure Methods

Segment 123 - Adding a New Column to the DataFrame

Segment 124 - Copying the DataFrame

Segment 125 - Column and Row Dropping and Renaming

Segment 126 - Inserting Columns in the Middle of a DataFrame

Segment 127 - Getting the Integer Location with the Index get_loc Method

Segment 128 - The pop Method

Segment 129 - Exercises

Module 4.6 - More DataFrame Methods

Segment 130 - The isna and notna Methods

Segment 131 - Differencing methods diff and pct_change

Segment 132 - The Sample Method

Segment 133 - The nsmallest and nlargest methods

Segment 134 - The corr Method

Segment 135 - The replace Method

Segment 136 - Methods available only to Series and not DataFrames

Segment 137 - Exercises

Module 4.7 - Assigning Subsets of Data

Segment 138 - Setting New Data with loc

Segment 139 - Setting New Data with iloc

Segment 140 - Boolean Selection Assignment

Segment 141 - Improper Assignment

Segment 142 - Exercises

Module 2.1 - Selecting Subsets of Data from DataFrames with Just Brackets

Segment 22 - Introduction to Subset Selection

Segment 23 - Selecting with Just the Brackets

Segment 24 - Exercises

Module 2.2 - Selecting Subsets of Data from DataFrames with loc

Segment 25 - Simultaneous Row and Column Subset Selection

Segment 26 - Slice Notation with loc

Segment 27 - Other Subset Selections with loc

Segment 28 - Exercises

Module 2.3 - Selecting Subsets of Data with iloc

Segment 29 - Simultaneous Row and Column Subset Selection

Segment 30 - Exercises

Module 2.4 - Selecting Subsets of Data from a Series

Segment 31 - Selecting Subsets of Data from a Series

Segment 32 - Exercises

Module 2.5 - Boolean Selection Single Condition

Segment 33 - Boolean Selection Single Conditions

Segment 34 - Practical Boolean Selection

Segment 35 - Exercises

Module 2.6 - Boolean Selection Multiple Conditions

Segment 36 - Different Logical Operators for Boolean Series

Segment 37 - Inverting a Condition with the Not Operator

Segment 38 - Many Equality Conditions in a Single Column

Segment 39 - Exercises - Boolean Selection Multiple Conditions

Module 2.7 - Boolean Selection More

Segment 40 - Boolean Selection on a Series

Segment 41 - Simultaneous Boolean Selection of Rows and Column Labels with loc

Segment 42 - Column to Column Comparison

Segment 43 - Filter for Missing Values

Segment 44 - Exercises - Boolean Selection More

Module 2.8 - Filtering with the Query Method

Segment 45 - Introduction to the Query Method

Segment 46 - Column to Column Comparison with Query

Segment 48 - Arithmetic Operations within Query

Segment 49 - Reference Variable Names

Segment 50 - Selecting Columns with Query

Segment 51 - Summary of the Query Method

Segment 52 - Exercises

Module 2.9 - Miscellaneous Subset Selection

Segment 53 - Selecting a Column with Dot Notation

Segment 54 - Selecting Rows with Just the Brackets using Slice Notation

Segment 55 - Selecting a Single Cell with at and iat

Module 2.10 - Taking Certification Exam

Segment 56 - Going to Exam Website

Segment 57 - Completing the Exam

Segment 58 - Submitting the Exam

Module 5.1 - Integer, Float and Boolean Data Types

Segment 143 - Integer Data Type

Segment 144 - Changing Data Types with astype

Segment 145 - Unsigned Integers

Segment 146 - Nullable Integer Data Type

Segment 147 - Boolean Selection with Nullable Booleans

Segment 148 - Float Data Types

Segment 149 - Changing from Float to Int

Segment 150 - Pandas Nullable Float Data Type

Segment 151 - Boolean Data Type

Segment 152 - Nullable Boolean Data Type

Segment 153 - Different Syntax for Data Types

Segment 154 - Data Type Summary

Segment 155 - Exercises

Module 5.2 - Object, Categorical, and String Data Types

Segment 156 - Object Data Types

Segment 157 - Categorical Data Type

Segment 158 - Internal Storage of Categorical Data

Segment 159 - The cat Accessor

Segment 160 - Modifying Categories

Segment 161 - Massive Reduction in Memory Used

Segment 162 - Speeding Up Operations

Segment 163 - The str Accessor is Still Available

Segment 164 - Ordered Categories

Segment 165 - Integers can be Categories

Segment 166 - The New String Data Type

Segment 167 - Converting Strings to Numeric

Segment 168 - Exercises

Module 5.3 - Datetime, Timedelta, and Period Data Types

Segment 169 - The pandas datetime64 data type

Segment 170 - The pandas timedelta64 data type

Segment 171 - The pandas period data type

Segment 172 - Summary Table

Segment 173 - Exercises

Module 5.4 - DataFrame Data Type Conversion

Segment 174 - Discovering Strings in Numeric Columns

Segment 175 - Converting non-numeric values to missing

Segment 176 - The astype method for DataFrames

Segment 177 - Reading in data with known missing values

Segment 178 - More Data type Conversion with the Housing Dataset

Segment 179 - Exercises

Module 3.1 - Numeric Series Methods

Segment 59 - Numeric Series Methods

Segment 60 - Core Series Attributes

Segment 61 - Arithmetic Operators

Segment 62 - Comparison Operators

Segment 63 - Boolean and Bitwise Operators

Segment 64 - Aggregation Methods

Segment 65 - Non-Aggregation Methods

Segment 66 - Series Methods with a Non-Default Index

Segment 67 - Operations on a Boolean Series

Segment 68 - Exercises

Module 3.2 - Series Missing Value Methods

Segment 69 - The isna and notna Methods

Segment 70 - Dropping Missing Values with dropna

Segment 71 - Filling Missing Values with the fillna Method

Segment 72 - Filling Missing Values with interpolate

Segment 73 - Exercises

Segment 74 - Sorting the Value and the Index

Module 3.3 - Series Sorting, Ranking and Uniqueness

Segment 75 - Ranking

Segment 76 - Uniqueness

Segment 77 - Exercises

Module 3.4 - More Series Methods

Segment 78 - The agg, idxmin, idxmax, nsmallest, and nlargest Methods

Segment 79 - Differencing Methods diff and pct_change

Segment 80 - Randomly Sample a Series

Segment 81 - The replace Method

Segment 82 - Exercises

Module 3.5 - String Series Methods

Segment 83 - String Series Methods

Segment 84 - The value_counts Method

Segment 85 - The split String Method

Segment 86 - Special Methods Just for Object Columns

Segment 87 - More String-Only Methods

Segment 88 - The replace String Method

Segment 89 - Selecting Subsets with the Brackets

Segment 90 - Exercises

Module 3.6 - Datetime Series Methods

Segment 91 - Datetime Attributes

Segment 92 - Datetime Methods

Segment 93 - Format Time as a String with strftime

Segment 94 - Convert to Period

Segment 95 - Timedeltas

Segment 96 - Datetime Series Methods

Module 3.7 - Project - Testing Normality of Stock Market Returns

Segment 97 - Project - Testing Normality of Stock Market Returns

Segment 98 - Exercises

Module 6.1 - Grouping Aggregation Basics

Segment 180 - Grouping Aggregation Basics

Segment 181 - Grouping with the groupby Method

Segment 182 - Use String Names for Aggregation Functions

Segment 183 - Aligning the Dots when Method Chaining

Segment 184 - The Index When Grouping

Segment 185 - The GroupBy Object

Segment 186 - Exercises

Module 6.2 - Grouping and Aggregating Multiple Columns

Segment 187 - Grouping with Multiple Columns

Segment 188 - Aggregating Multiple Columns

Segment 189 - Getting the size of each group

Segment 190 - Exercises

Module 6.3 - Grouping with Pivot Tables

Segment 191 - Creating Pivot Tables with Pandas

Segment 192 - Where is the Pivoting

Segment 193 - Styling Pivot Tables

Segment 194 - Getting the Size of each Group

Segment 195 - Add Margins to Get Row and Column Totals

Segment 196 - Non-Standard Pivot Tables

Segment 197 - Exercises

Module 6.4 - Counting with Crosstabs

Segment 198 - Counting the Frequency with the crosstab Function

Segment 199 - Normalizing Other Aggregations

Segment 200 - crosstab is almost unnecessary in pandas

Segment 201 - Exercises

Module 6.5 - Alternative Groupby Syntax

Segment 202 - Alternative Groupby Syntax

Segment 203 - Exercises

Module 6.6 - Custom Aggregation

Segment 204 - Using a Custom Aggregation Function

Segment 205 - Custom aggregation functions must return a single value

Segment 206 - Find the mean salary for the five highest paid employees per department

Segment 207 - What percent of total salary do these five employees represent

Segment 208 - Using a custom aggregation function in a pivot table

Segment 209 - Percentage of employees by department with salaries greater than 100,000

Segment 210 - Optimizing a custom aggregation function

Segment 211 - Complete operations that are independent of the group outside of the custom function

Segment 212 - Exercises

Module 6.7 - Filter and Transform with Groupby

Segment 213 - The filter Method

Segment 214 - Viewing each Sub-DataFrame

Segment 215 - Summary of the GroupBy filter Method

Segment 216 - Finding actors that appear in at least 25 movies

Segment 217 - The groupby transform Method

Segment 218 - transform second use case - return a new value for each row in the group

Segment 219 - Find Difference from the Mean

Segment 220 - Transforming multiple columns

Segment 221 - Summary of the groupby transform method

Segment 222 - Exercises

Module 6.8 - More Groupby Methods

Segment 223 - Kinds of groupby attributes and methods

Segment 224 - head, tail, and nth groupby methods

Segment 225 - Groupby Methods Unique to Series

Segment 226 - Non-aggregating Methods

Module 6.9 - Binning Numeric Columns

Segment 227 - Exercises

Segment 228 - Binning with pd.cut

Segment 229 - Cut into a specific number of bins

Segment 230 - Quantile binning with pd.qcut

Module 6.10 - Miscellaneous Grouping Functionality

Segment 231 - Grouping with Bins

Segment 232 - Exercises

Segment 233 - Grouping by Columns not in the DataFrame

Segment 234 - Grouping Series and aggregating other columns

Segment 235 - Change the Direction of Grouping

Segment 236 - Exercises