This Applied Data Science with Python training course covers the theoretical and practical aspects of using Python for Data Science, Business Analytics, and Data Logistics, along with the related core concepts, terminology, and theory. This intensive course is supplemented by a variety of hands-on labs (listed at the bottom of this outline) that help attendees reinforce their understanding of the material.

Topics:

Applied Data Science and Business Analytics
Common Data Science algorithms for supervised and unsupervised machine learning
NumPy, pandas, Matplotlib, scikit-learn
Python REPLs
Jupyter notebooks
Data analytics life-cycle phases
Data repairing and normalizing
Data aggregation and grouping
Data visualization

Applied Data Science with Python

Price €1,250.00

Course Code

GTBDP1

Duration

2 Days

Course Fee

POA

Accreditation

N/A

Target Audience

Business Analysts, Developers, IT Architects, and Technical Managers

Attendee Requirements

Participants should have a working knowledge of Python (or a programming background and the ability to pick up Python’s syntax quickly) and be familiar with core statistical concepts such as variance and correlation.

Private Team Training is available for this course

Ways to Attend this Course

Private Training
Virtual Training

+353 1 402 9423

hello@guruteamirl.com

Course Outline

Chapter 1. Python for Data Science

Using Modules
Listing Methods in a Module
Creating Your Own Modules
List Comprehension
Dictionary Comprehension
String Comprehension
Python 2 vs Python 3
Sets (Python 3+)
Python Idioms
Python Data Science “Ecosystem”
NumPy
NumPy Arrays
NumPy Idioms
pandas
Data Wrangling with pandas' DataFrame
SciPy
scikit-learn
SciPy or scikit-learn?
Matplotlib
Python vs R
Python on Apache Spark
Python Dev Tools and REPLs
Anaconda
IPython
Visual Studio Code
Jupyter
Jupyter Basic Commands
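As a quick illustration of the comprehension topics in this chapter, the sketch below uses only the standard library; the city temperatures are invented for the example. Python has no dedicated "string comprehension" syntax, so the string case is shown with `str.join()` over a generator expression, which is an assumption about what that topic covers.

```python
# Hypothetical sample data (degrees Celsius), for illustration only
temps_c = {"Dublin": 15, "Cork": 20, "Galway": 10}

# List comprehension: filter and collect in one expression
warm_cities = [city for city, t in temps_c.items() if t >= 15]

# Dictionary comprehension: convert every value to Fahrenheit
temps_f = {city: t * 9 / 5 + 32 for city, t in temps_c.items()}

# Set comprehension: unique first letters of the city names
initials = {city[0] for city in temps_c}

# Building a string from a generator expression with str.join()
csv_line = ",".join(str(t) for t in temps_c.values())

print(warm_cities)  # ['Dublin', 'Cork']
print(temps_f)      # {'Dublin': 59.0, 'Cork': 68.0, 'Galway': 50.0}
print(csv_line)     # 15,20,10
```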

Chapter 2. Applied Data Science

What is Data Science?
Data Science Ecosystem
Data Mining vs. Data Science
Business Analytics vs. Data Science
Data Science, Machine Learning, AI?
Who is a Data Scientist?
Data Science Skill Sets Venn Diagram
Data Scientists at Work
Examples of Data Science Projects
An Example of a Data Product
Applied Data Science at Google
Data Science Gotchas

Chapter 3. Data Analytics Life-cycle Phases

Big Data Analytics Pipeline
Data Discovery Phase
Data Harvesting Phase
Data Priming Phase
Data Logistics and Data Governance
Exploratory Data Analysis
Model Planning Phase
Model Building Phase
Communicating the Results
Production Roll-out

Chapter 4. Repairing and Normalizing Data

Repairing and Normalizing Data
Dealing with Missing Data
Sample Data Set
Getting Info on Null Data
Dropping a Column
Interpolating Missing Data in pandas
Replacing the Missing Values with the Mean Value
Scaling (Normalizing) the Data
Data Preprocessing with scikit-learn
Scaling with the scale() Function
The MinMaxScaler Object
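Two of the repair steps in this chapter, mean imputation and min-max scaling, reduce to a few lines of arithmetic. The plain-Python sketch below (sample values invented, `None` standing in for a missing value) shows the computation that pandas `fillna()` with the column mean and scikit-learn's `MinMaxScaler` with its default 0 to 1 range perform; it illustrates the formulas and is not the course's lab code.

```python
# Hypothetical column with missing readings (None marks a missing value)
readings = [4.0, None, 10.0, 6.0, None, 0.0]

# Replace each missing value with the mean of the observed values
observed = [x for x in readings if x is not None]
mean_value = sum(observed) / len(observed)
filled = [mean_value if x is None else x for x in readings]

# Min-max scaling to [0, 1]: x' = (x - min) / (max - min)
# (assumes the column is not constant, i.e. hi > lo)
lo, hi = min(filled), max(filled)
scaled = [(x - lo) / (hi - lo) for x in filled]

print(filled)  # [4.0, 5.0, 10.0, 6.0, 5.0, 0.0]
print(scaled)  # [0.4, 0.5, 1.0, 0.6, 0.5, 0.0]
```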

Chapter 5. Computing Descriptive Statistics in Python

Descriptive Statistics
Non-uniformity of a Probability Distribution
Using NumPy for Calculating Descriptive Statistics Measures
Finding Min and Max in NumPy
Using pandas for Calculating Descriptive Statistics Measures
Correlation
Regression and Correlation
Covariance
Getting Pairwise Correlation and Covariance Measures
Finding Min and Max in pandas DataFrame

Chapter 6. Data Aggregation and Grouping

Data Aggregation and Grouping
Sample Data Set
The pandas.core.groupby.SeriesGroupBy Object
Grouping by Two or More Columns
Emulating SQL's WHERE Clause
Pivot Tables
Cross-Tabulation
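A group-by aggregation splits rows by a key, applies an aggregate to each group, and combines the results. The plain-Python sketch below (hypothetical sales records) mirrors what `df[df["units"] > 0].groupby("region")["units"].sum()` would compute in pandas, with the filter playing the role of SQL's WHERE clause, and shows that grouping by two columns amounts to using a tuple key. It is only meant to make the split-apply-combine idea concrete.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, units)
sales = [
    ("North", "A", 10),
    ("South", "A", 4),
    ("North", "B", 3),
    ("South", "B", 0),
    ("North", "A", 5),
]

# WHERE units > 0, then GROUP BY region, aggregating with SUM
units_by_region = defaultdict(int)
for region, _, units in sales:
    if units > 0:
        units_by_region[region] += units

# GROUP BY region, product: a tuple key holds both grouping columns
units_by_region_product = defaultdict(int)
for region, product, units in sales:
    units_by_region_product[(region, product)] += units

print(dict(units_by_region))  # {'North': 18, 'South': 4}
print(dict(units_by_region_product))
```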

Chapter 7. Data Visualization with matplotlib

Data Visualization
What is matplotlib?
Getting Started with matplotlib
The Plotting Window
The Figure Options
The matplotlib.pyplot.plot() Function
The matplotlib.pyplot.bar() Function
The matplotlib.pyplot.pie() Function
Subplots
Using the matplotlib.gridspec.GridSpec Object
The matplotlib.pyplot.subplot() Function
Hands-on Exercise
Figures
Saving Figures to File
Visualization with pandas
Working with matplotlib in Jupyter Notebooks

Chapter 8. Data Science and ML Algorithms in scikit-learn

Data Science, Machine Learning, AI?
Types of Machine Learning
Terminology: Features and Observations
Continuous and Categorical Features (Variables)
Terminology: Axis
The scikit-learn Package
scikit-learn Estimators
Models, Estimators, and Predictors
Common Distance Metrics
The Euclidean Metric
The LIBSVM Format
Scaling of the Features
The Curse of Dimensionality
Supervised vs Unsupervised Machine Learning
Supervised Machine Learning Algorithms
Unsupervised Machine Learning Algorithms
Choose the Right Algorithm
Life-cycles of Machine Learning Development
Data Split for Training and Test Data Sets
Data Splitting in scikit-learn
Hands-on Exercise
Classification Examples
Classifying with k-Nearest Neighbors (SL)
k-Nearest Neighbors Algorithm
The Error Rate
Hands-on Exercise
Dimensionality Reduction
The Advantages of Dimensionality Reduction
Principal Component Analysis (PCA)
Hands-on Exercise
Data Blending
Decision Trees (SL)
Decision Tree Terminology
Decision Tree Classification in the Context of Information Theory
Information Entropy Defined
The Shannon Entropy Formula
The Simplified Decision Tree Algorithm
Using Decision Trees
Random Forests
Support Vector Machines (SVM)
Naive Bayes Classifier (SL)
Naive Bayesian Probabilistic Model in a Nutshell
Bayes Formula
Classification of Documents with Naive Bayes
Unsupervised Learning Type: Clustering
Clustering Examples
k-Means Clustering (UL)
k-Means Clustering in a Nutshell
k-Means Characteristics
Regression Analysis
Simple Linear Regression Model
Linear vs Non-Linear Regression
Linear Regression Illustration
Major Underlying Assumptions for Regression Analysis
Least-Squares Method (LSM)
Locally Weighted Linear Regression
Regression Models in Excel
Multiple Regression Analysis
Logistic Regression
Regression vs Classification
Time-Series Analysis
Decomposing Time-Series
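Several items in Chapter 8 (the Euclidean metric, k-nearest neighbors, and the error rate) fit together in a few lines. The sketch below is a from-scratch, standard-library illustration with made-up data, not scikit-learn's `KNeighborsClassifier`. Vote ties are broken by `Counter.most_common()`, which lists equal counts in first-seen order, and the test set is hand-picked rather than produced by a random train/test split.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data with two classes
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
           (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]

# Hypothetical held-out observations with their true labels
test_X = [(1.2, 1.4), (6.8, 6.9), (3.5, 4.0)]
test_y = ["red", "blue", "blue"]

predictions = [knn_predict(train_X, train_y, q, k=3) for q in test_X]

# Error rate = fraction of test observations that were misclassified
error_rate = sum(p != t for p, t in zip(predictions, test_y)) / len(test_y)

print(predictions)  # ['red', 'blue', 'red']
print(error_rate)
```

Here the third test point lies nearer the red cluster, so it is misclassified and the error rate is one in three; choosing k and inspecting such errors is the kind of tuning the error-rate topic addresses.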

Lab Exercises

Lab 1 - Learning the Lab Environment

Lab 2 - Using Jupyter Notebook

Lab 3 - Repairing and Normalizing Data

Lab 4 - Computing Descriptive Statistics

Lab 5 - Data Grouping and Aggregation

Lab 6 - Data Visualization with matplotlib

Lab 7 - Data Splitting

Lab 8 - k-Nearest Neighbors Algorithm

Lab 9 - The k-means Algorithm

Lab 10 - The Random Forest Algorithm

Learning Path

Please contact us for suggestions.

Ways to Attend

Attend a public course if one is available. Please check our schedule, or register your interest in joining a course in your area.
Private onsite team training is also available; please contact us to discuss. We can customise this course to suit your business requirements.
