Objectives
This intensive training course covers the theoretical and technical aspects of Data Science and Business Analytics, from fundamental to advanced concepts and methods for deriving business insights from Big Data. Hands-on labs supplement the lectures and help attendees reinforce their theoretical knowledge of the material.
Topics
 NoSQL and Big Data Systems Overview
 Big Data Business Intelligence and Analytics
 Applied Data Science and Business Analytics
 Algorithms, Techniques and Common Analytical Methods
 Machine Learning
 Visualizing and Reporting Processed Results
 Data Analysis with R
Duration
3 Days
Target Audience
Business Analysts, IT Architects and Managers
Course Prerequisites
Participants should have a general knowledge of statistics and programming.
Suggested Follow-on Courses
There are a number of options. Please contact us for further information.
Course Content
Chapter 1. Defining Big Data
 Transforming Data into Business Information
 Quality of Data
 Gartner’s Definition of Big Data
 More Definitions of Big Data
 Processing Big Data
 Challenges Posed by Big Data
 The Cloud and Big Data
 The Business Value of Big Data
 Big Data: Hype or Reality?
 Big Data Quiz
 Big Data Quiz Answers
 Summary
Chapter 2. What is NoSQL?
 Limitations of Relational Databases
 Limitations of Relational Databases (Cont’d)
 Defining NoSQL
 What are NoSQL (Not Only SQL) Databases?
 The Past and Present of the NoSQL World
 NoSQL Database Properties
 NoSQL Benefits
 NoSQL Database Storage Types
 The CAP Theorem
 Mechanisms to Guarantee a Single CAP Property
 Limitations of NoSQL Databases
 Big Data Sharding
 Sharding Example
 Quiz
 Quiz Answers
 Summary
Chapter 3. NoSQL Systems Overview
 MongoDB
 MongoDB Features (Cont’d)
 MongoDB Operational Intelligence
 MongoDB Use Cases
 Amazon S3
 Amazon Storage SLAs
 Amazon Glacier
 Amazon S3 Security
 Data Lifecycle Management with Amazon S3
 Amazon S3 Cost Monitoring
 OpenStack
 Object Store (Swift)
 Components of Swift
 Google BigTable
 BigTable-based Applications
 BigTable Design
 Google Cloud Storage
 Hadoop
 Hadoop Clusters
 Hadoop’s Core Components
 Hadoop Distributed File System
 Accessing HDFS
 Communication inside HDFS
 HBase
 HBase Design
 HBase Design
 MemcacheDB
 Using MemcacheDB instead of memcached
 Apache Cassandra
 Apache Cassandra Design
 Cassandra’s Main Features and Qualities of Service
 Summary
Chapter 4. Big Data Business Intelligence and Analytics
 Traditional Business Intelligence and Analytics
 OLAP Tasks
 Data Mining Tasks
 Big Data / NoSQL Solutions
 NoSQL Data Querying and Processing
 MapReduce Defined
 MapReduce Explained
 Example of Map & Reduce Operations using JavaScript
 Hadoop
 Hadoop-based Systems for Data Analysis
 Hadoop’s MapReduce
 Hadoop’s Streaming MapReduce
 Streaming Use Cases
 Setting up Java Classpath for Streaming Support
 Making things simpler with Hadoop Pig Latin
 Pig Latin Script Example
 SQL Equivalent
 Amazon Elastic MapReduce
 Big Data with Google App Engine (GAE)
 GAE Dashboard
 Example of Google App Engine Java Datastore API
 MongoDB Data Model
 MongoDB Query Language (QL)
 The find and findOne Methods
 The find and findOne Methods
 A MongoDB QL Example
 What is Hive?
 Hive Architecture
 Interfacing with Hive
 Hive Data Definition Language
 Business Analytics with Hive
 The UnQL Specification
 Quiz
 Quiz Answers
 Summary
Chapter 5. Applied Data Science
 What is Data Science?
 Data Science Ecosystem
 Data Mining vs. Data Science
 Business Analytics vs. Data Science
 Who is a Data Scientist?
 Data Science Skill Sets Venn Diagram
 Data Scientists at Work
 Examples of Data Science Projects
 An Example of a Data Product
 Applied Data Science at Google
 Data Science Gotchas
 Summary
Chapter 6. Data Analytics Lifecycle Phases
 Big Data Analytics Pipeline
 Data Discovery Phase
 Data Harvesting Phase
 Data Priming Phase
 Model Planning Phase
 Model Building Phase
 Communicating the Results
 Production Rollout
 Summary
Chapter 7. Getting Started with R
 Introduction
 Positioning of R in the Data Science Arena
 R Integrated Development Environments
 Running R
 Ending the Current R Session
 Getting Help
 Getting System Information
 General Notes on R Commands and Statements
 R Data Structures
 R Objects and Workspace
 Assignment Operators
 Assignment Example
 Arithmetic Operators
 Logical Operators
 System Date and Time
 Operations
 User-defined Functions
 User-defined Function Example
 R Code Example
 Type Conversion (Coercion)
 Control Statements
 Conditional Execution
 Repetitive Execution
 Repetitive Execution
 Built-in Functions
 Reading Data from Files into Vectors
 Example of Reading Data from a File
 Writing Data to a File
 Example of Writing Data to a File
 Logical Vectors
 Character Vectors
 Matrix Data Structure
 Creating Matrices
 Working with Data Frames
 Matrices vs Data Frames
 A Data Frame Sample
 Accessing Data Cells
 Getting Info About a Data Frame
 Selecting Columns in Data Frames
 Selecting Rows in Data Frames
 Getting a Subset of a Data Frame
 Sorting (ordering) Data in Data Frames by Attribute(s)
 Applying Functions to Matrices and Data Frames
 Using the apply() Function
 Example of Using apply()
 Executing External R commands
 Listing Objects in Workspace
 Removing Objects in Workspace
 Saving Your Workspace
 Loading Your Workspace
 Getting and Setting the Working Directory
 Getting the List of Files in a Directory
 Diverting Output to a File
 Batch (Unattended) Processing
 Importing Data into R
 Exporting Data from R
 Standard R Packages
 Extending R
 CRAN Page
 Summary
Chapter 8. R Statistical Computing Features
 Statistical Computing Features
 Descriptive Statistics
 Basic Statistical Functions
 Examples of Using Basic Statistical Functions
 Nonuniformity of a Probability Distribution
 Writing Your Own skew and kurtosis Functions
 Generating Normally Distributed Random Numbers
 Generating Uniformly Distributed Random Numbers
 Using the summary() Function
 Math Functions Used in Data Analysis
 Examples of Using Math Functions
 Correlations
 Correlation Example
 Testing Correlation Coefficient for Significance
 The cor.test() Function
 The cor.test() Example
 Regression Analysis
 Types of Regression
 Simple Linear Regression Model
 Least-Squares Method (LSM)
 LSM Assumptions
 Fitting Linear Regression Models in R
 Example of Using lm()
 Confidence Intervals for Model Parameters
 Example of Using lm() with a Data Frame
 Regression Models in Excel
 Multiple Regression Analysis
 Finding the Best-Fitting Regression Model
 Comparing Regression Models
 Summary
Chapter 9. Data Science Algorithms and Analytical Methods
 Supervised vs Unsupervised Machine Learning
 Supervised Machine Learning Algorithms
 Unsupervised Machine Learning Algorithms
 Choose the Right Algorithm
 Lifecycles of Machine Learning Development
 Classifying with k-Nearest Neighbors (SL)
 k-Nearest Neighbors Algorithm
 k-Nearest Neighbors Algorithm
 The Error Rate
 Decision Trees (SL)
 Decision Tree Terminology
 Decision Trees in Pictures
 Decision Tree Classification in Context of Information Theory
 Information Entropy Defined
 The Shannon Entropy Formula
 The Simplified Decision Tree Algorithm
 Using Decision Trees
 Naive Bayes Classifier (SL)
 Naive Bayesian Probabilistic Model in a Nutshell
 Bayes Formula
 Classification of Documents with Naive Bayes
 Unsupervised Learning Type: Clustering
 K-Means Clustering (UL)
 K-Means Clustering in a Nutshell
 Regression Analysis
 Simple Linear Regression Model
 Linear vs Non-Linear Regression
 Linear Regression Illustration
 Major Underlying Assumptions for Regression Analysis
 Least-Squares Method (LSM)
 Locally Weighted Linear Regression
 Regression Models in Excel
 Multiple Regression Analysis
 Regression vs Classification
 Time-Series Analysis
 Decomposing Time-Series
 Monte Carlo Simulation (Method)
 Who Uses Monte Carlo Simulation?
 Monte Carlo Simulation in a Nutshell
 Monte Carlo Simulation Example
 Monte Carlo Simulation Example
 Summary
Chapter 10. Visualizing and Reporting Processed Results
 Data Visualization
 Data Visualization in R
 The ggplot2 Data Visualization Package
 Creating Bar Plots in R
 Creating Horizontal Bar Plots
 Using barplot() with Matrices
 Using barplot() with Matrices Example
 Customizing Plots
 Histograms in R
 Building Histograms with hist()
 Example of using hist()
 Pie Charts in R
 Examples of using pie()
 Generic XY Plotting
 Examples of the plot() function
 Dot Plots in R
 Saving Your Work
 Supported Export Options
 Plots in RStudio
 Saving a Plot as an Image
 The BIRT Project
 Visualization with D3 JavaScript Library
 Examples of D3 Visualization
 JavaFX
 Data Visualization with JavaFX
 Summary
Chapter 11. Apache Mahout
 What is Apache Mahout?
 Main Use Cases
 Supported Algorithms in Classification
 Supported Algorithms in Clustering
 The Stable Set of Algorithms
 Running Mahout on Amazon
 Summary
Chapter 12. Machine Learning with BigML
 What is BigML?
 How BigML Service Works
 Data Files
 Data Sets
 Data Sets Example
 Models
 Predictions
 The Prediction UI Form
 Text Analysis in BigML
 REST API
 Summary
Chapter 13. The Semantic Web
 Defining the Term “Semantic”
 Metadata in HTML Pages
 Defining the Semantic Web
 The Original Web Proposal
 W3C and the Semantic Web
 The Semantic Web as Web 3.0
 The Semantic Web Stack
 Ontology and OWL
 The Smart Data Continuum
 Resource Description Framework
 RDF Model
 An RDF Example
 SPARQL
 SPARQL Example
 Microformat
 Example of the hCard Microformat
 Summary
