This two-day Advanced Analytics for Structured Data using AWS course provides a technical introduction to understanding and creating digital data supply chains for advanced analytics with AWS.

Note: This is an independent presentation and is NOT an official Amazon Web Services Education Partner delivery.


What you'll learn

•    Navigate the AWS Console for key areas discussed in this class
•    Utilize AWS for data processing and data management
•    Describe patterns for handling structured data with AWS services
•    Understand the usage of Amazon Elastic MapReduce (EMR)
•    Understand the facilities provided by Elastic MapReduce (EMR)
•    Identify the facilities provided by Apache Airflow for workflow
•    Outline the facilities provided by Glue (Data Catalog)
•    Describe the facilities provided by Aurora MySQL
•    Define the facilities provided by S3 – Simple Storage Service
•    Understand the facilities provided by Informatica Cloud (ICS)
•    Identify the features and functions of AWS Lambda
•    Describe the features of Hive, HiveQL, and the Hive CLI
•    Discuss file formats used in Advanced Analytics
•    Understand how Amazon Athena is used across varied data sources

Advanced Analytics for Structured Data using AWS

Course Code

GTBD10

Duration

2 Days

Course Fee

€1,185.00

Accreditation

N/A

Target Audience

This is a general introductory course for anyone who wants a technical introduction to understanding and creating digital data supply chains for advanced analytics with AWS.

Attendee Requirements

  • A basic understanding of coding, the AWS console, and cloud computing is helpful.

Course Outline

Chapter 1. Advanced Analytics with AWS

•        What are advanced analytics?
•        Introduction to AWS services for Analytics
•        AWS Public Data Sets
•        Forces and Trends in Cloud Analytics
•        Data Storage Platforms
•        Data Lifecycle and Events
•        What is JSON?

Chapter 2. Elastic MapReduce

•        What is Amazon EMR?
•        Getting started with EMR
•        EMR planning
•        Running Hadoop Applications for data processing
•        Hive and EMR
•        Spark and EMR
•        Kinesis and EMR
•        ETL with EMR
•        AWS CLI and EMR
•        AWS Console Walkthrough: EMR

Chapter 3. AWS Glue

•        What is Glue?
•        How Glue works
•        AWS Glue Console
•        Getting started with Glue
•        Security management
•        Glue Data Catalog
•        Authoring with Glue
•        Auto-population and schema inference
•        Events and monitoring
•        Troubleshooting
•        ETL with Glue
•        Glue Application Programming Interface (API)
•        AWS Console Walkthrough: Glue

Chapter 4. Apache Airflow

•        What is Apache Airflow?
•        Introduction to Apache Airflow components
•        Visualizing DAGs
•        Authoring DAGs
•        Performance Insights
•        Performance Graphs
•        Airflow Features
•        Use Cases
•        Workflow Table Stakes
•        Incubation of Airflow
•        Airflow at Work
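
As a taste of the DAG concept covered in this chapter, the sketch below orders a small set of tasks so that every task runs after its dependencies. It is illustrative only: it uses Python's standard `graphlib` module (Python 3.9+) rather than Airflow itself, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(dag):
    """Return one valid run order for the tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is exactly what makes a graph *not* a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


print(execution_order(pipeline))
# ['extract', 'transform', 'load', 'report']
```

A workflow scheduler applies the same idea at scale: dependencies are declared once, and the run order is derived from them.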

Chapter 5. Amazon Aurora

•        What is Amazon RDS?
•        Introduction to Aurora
•        MySQL and Aurora compatibility
•        Service-oriented Architecture and RDS
•        Data replication
•        Fully managed
•        Shared accountability
•        Data encryption at rest and in motion
•        Aurora as a meta store
•        AWS Console Walkthrough: Aurora

Chapter 6. Introduction to Informatica Cloud (ICS)

•        What is Informatica Cloud?
•        Integration Platform as a Service
•        Cloud-native migration and ICS
•        Use cases for Informatica Cloud
•        Cloud Connectors
•        ICS Connectors
•        Informatica Cloud Options
•        Citizen developers and ICS
•        Secure Agent
•        Cloud Integration Hub
•        ICS Console Walkthrough

Chapter 7. S3 – Simple Storage Service

•        What is S3?
•        Introduction to S3
•        Storage
•        Replication
•        CAP Theorem
•        Data Consistency
•        Buckets
•        Amazon Resource Name (ARN)
•        Resource Sharing
•        Versioning
•        Lifecycle
•        Security in S3
•        Use cases for S3
•        AWS Console Walkthrough: S3
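
To illustrate the Amazon Resource Name (ARN) topic above, here is a minimal sketch that splits an ARN into its parts, assuming the general `arn:partition:service:region:account-id:resource` layout. The helper name `parse_arn` and the bucket name are hypothetical. Note that S3 bucket ARNs leave the region and account fields empty.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its five fields (hypothetical helper, not an AWS SDK call)."""
    # Split on the first five colons only, because the resource part
    # may itself contain colons.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


obj = parse_arn("arn:aws:s3:::example-bucket/reports/2024.csv")
print(obj.service)   # s3
print(obj.resource)  # example-bucket/reports/2024.csv
```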

Chapter 8. AWS Lambda

•        What is Lambda?
•        Introduction to Serverless Computing
•        What can you do with Lambda?
•        Lambda services
•        Triggering for digital data supply chain
•        Data processing with Lambda and Glue
•        Managed analytics pipeline with Lambda
•        AWS Console Walkthrough: Lambda
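
The triggering topic above can be sketched as a small Python Lambda handler that reacts to an S3 event notification. This is a minimal sketch, not course material: it assumes the standard S3 event layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`), and the bucket and key in the sample event are made up.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and key of every object named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    return {"count": len(objects), "objects": objects}


# Local smoke test with a hand-written (hypothetical) event.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "raw/sales+2024.csv"}}}
    ]
}
print(lambda_handler(sample_event, None))
# {'count': 1, 'objects': [{'bucket': 'example-bucket', 'key': 'raw/sales 2024.csv'}]}
```

In a real pipeline the handler would go on to start the next stage, for example a Glue job, instead of just returning the list.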

Chapter 9. Hive

•        What is Hive?
•        Hive's value proposition
•        Hive's Main Sub-Systems
•        Hive Features
•        The "Classic" Hive Architecture
•        The New Hive Architecture
•        HiveQL
•        Where are the Hive tables located?
•        Hive Command-line Interface (CLI)
•        The Beeline Command Shell
•        Differences and considerations for Hive on Amazon EMR
•        Configuring an External Metastore for Hive
•        Use the Hive JDBC Driver
•        Hive release history
•        Hive Walkthrough

Chapter 10. Hive CLI

•        Hive Command-line Interface (CLI)
•        The Hive Interactive Shell
•        Running Host OS Commands from the Hive Shell
•        Interfacing with HDFS from the Hive Shell
•        Hive in Unattended Mode
•        The Hive CLI Integration with the OS Shell
•        Executing HiveQL Scripts
•        Comments in Hive Scripts
•        Variables and Properties in Hive CLI
•        Setting Properties in CLI
•        Example of Setting Properties in CLI
•        Hive Namespaces
•        Using the SET Command
•        Setting Properties in the Shell
•        Setting Properties for the New Shell Session
•        Setting Alternative Hive Execution Engines
•        The Beeline Shell
•        Connecting to the Hive Server in Beeline
•        Beeline Command Switches
•        Beeline Internal Commands

Chapter 11. Hive DDL

•        Hive Data Definition Language
•        Creating Databases in Hive
•        Using Databases
•        Creating Tables in Hive
•        Supported Data Type Categories
•        Common Numeric Types
•        String and Date / Time Types
•        Miscellaneous Types
•        Example of the CREATE TABLE Statement
•        Working with Complex Types
•        Table Partitioning
•        Table Partitioning on Multiple Columns
•        Viewing Table Partitions
•        Row Format
•        Data Serializers / Deserializers
•        File Format Storage
•        File Compression
•        More on File Formats
•        The EXTERNAL DDL Parameter
•        Example of Using EXTERNAL
•        Creating an Empty Table
•        Dropping a Table
•        Table / Partition(s) Truncation
•        Alter Table/Partition/Column
•        Views
•        Create View Statement
•        Why Use Views?
•        Restricting Amount of Viewable Data
•        Examples of Restricting Amount of Viewable Data
•        Creating and Dropping Indexes
•        Describing Data
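
To give a feel for the table partitioning topics above: Hive stores each partition of a table in its own subdirectory named `column=value`, nested in the order the partition columns are declared. The sketch below builds such a path. The helper name and the `sales` table are hypothetical, and `/user/hive/warehouse` is assumed as the warehouse location.

```python
from pathlib import PurePosixPath


def partition_path(table_location, partition_spec):
    """Build the directory for one partition (hypothetical helper).

    partition_spec maps partition column names to values, in the
    order the columns were declared in PARTITIONED BY.
    """
    path = PurePosixPath(table_location)
    for column, value in partition_spec.items():
        path = path / f"{column}={value}"
    return str(path)


print(partition_path("/user/hive/warehouse/sales", {"year": 2024, "month": "01"}))
# /user/hive/warehouse/sales/year=2024/month=01
```

A query that filters on `year` and `month` only needs to read the matching directories, which is the main reason to partition large tables.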

Chapter 13. Hive DML

•        Hive Data Manipulation Language (DML)
•        Using the LOAD DATA statement
•        Example of Loading Data into a Hive Table
•        Loading Data with the INSERT Statement
•        Appending and Replacing Data with the INSERT Statement
•        Examples of Using the INSERT Statement
•        Multi Table Inserts
•        Multi Table Inserts Syntax
•        Multi Table Inserts Example

Chapter 14. Amazon Athena

•        What is Amazon Athena?
•        Athena in context
•        Athena Policy
•        Athena Data Sources
•        Connectivity
•        Getting started with Athena

Chapter 15. High Performance File System Formats

•        Why file systems for Advanced Analytics?
•        Columnar Data Storage
•        Introduction to ORC
•        Introduction to Parquet
•        Creating ORC and Parquet from CSV with Hive
•        Converting Text to ORC Data Format
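
The idea behind columnar storage can be sketched in a few lines of plain Python. This is a conceptual illustration only, not how ORC or Parquet are implemented: it pivots row-oriented records into one list per column, so that an aggregate over a single column touches only that column's values. The sample records are made up, and every row is assumed to have the same fields.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Row-oriented data, as it would appear in a CSV file.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 20.5},
    {"id": 3, "region": "EU", "amount": 5.0},
]

columns = to_columns(rows)
print(columns["region"])       # ['EU', 'US', 'EU']
print(sum(columns["amount"]))  # 35.5
```

Real columnar formats add per-column compression, encoding and statistics on top of this layout.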

Chapter 16. Introduction to Monitoring in AWS

•        Evolution of monitoring in AWS Cloud
•        What is CloudWatch?
•        What is CloudTrail?
•        What is AWS Config?
•        Event-driven models
•        Notifications driving events
•        Serverless computing
•        Introduction to Lambda

Lab Exercises

    Lab 1. Learning the AWS Management Console
    Lab 2. Managing Keys for Secure Connection
    Lab 3. Using S3 Through Management Console
    Lab 4. Managing IAM Users
    Lab 5. Getting Started with the EC2 Service
    Lab 6. Using AWS Lambda
    Lab 7. Using S3 and Aurora MySQL in AWS Lambda

 


Ways to Attend
  • Attend a public course, if there is one available. Please check our schedule, or register your interest in joining a course in your area.
  • Private onsite team training is also available; please contact us to discuss. We can customise this course to suit your business requirements.
