BIG DATA/HADOOP

Big Data draws a lot of attention nowadays. The decreasing cost of disk storage has made it practical to retain data that would have been thrown away only a few years ago. This development, along with new data processing techniques and the increasing compute power readily available to organizations, provides the opportunity to find insights and hidden relationships in structured and unstructured content alike. Equipped with this newly acquired knowledge, organizations are much better positioned to make the informed tactical and strategic decisions critical in today’s very competitive world.

Hortonworks Hadoop

We are delighted to announce our new training partnership with Hortonworks University. As their Authorised Training Partner in Ireland, we now offer a range of Hortonworks Hadoop private training courses, both onsite and offsite. Some examples of available courses are:

HDP OVERVIEW: APACHE HADOOP ESSENTIALS – GTHDP15 – 1 day


This course provides a technical overview of Apache Hadoop. It includes high-level information about concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course provides an optional primer for those who plan to attend a hands-on, instructor-led course. (There are no hands-on labs for this course).

Course Objectives

  • Describe what makes data “Big Data”
  • List data types stored and analyzed in Hadoop
  • Describe how Big Data and Hadoop fit into your current infrastructure and environment
  • Describe the fundamentals of:
    • the Hadoop Distributed File System (HDFS)
    • YARN
    • MapReduce
    • Hadoop frameworks (Pig, Hive, HCatalog, Storm, Solr, Spark, HBase, Oozie, Ambari, ZooKeeper, Sqoop, Flume, and Falcon)
  • Recognize use cases for Hadoop
  • Describe the business value of Hadoop
  • Describe new technologies like Tez and the Knox Gateway
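As a taste of the MapReduce fundamentals the course introduces, here is a toy, single-process sketch of the programming model (illustrative only; real MapReduce jobs run distributed across a YARN cluster):

```python
# Toy sketch of the MapReduce model: map emits (word, 1) pairs,
# reduce sums the counts per word after a sort/shuffle by key.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # MapReduce frameworks sort and shuffle by key before reducing
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        yield word, sum(count for _, count in group)

lines = ["big data big insights", "big wins"]
print(dict(reduce_phase(map_phase(lines))))
# {'big': 3, 'data': 1, 'insights': 1, 'wins': 1}
```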

Certification

Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions.

Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP OPERATIONS: INSTALL AND MANAGE WITH APACHE AMBARI – GTHDP01 – 4 days

This course is designed for administrators who will be managing the Hortonworks Data Platform (HDP) 2.2. It covers installation, configuration, maintenance, security and performance topics.

Course Objectives

  • Describe various tools and frameworks in the Hadoop 2.x ecosystem
  • Understand support for various types of cluster deployments
  • Understand storage, network, processing, and memory needs for a Hadoop cluster
  • Understand provisioning and post deployment requirements
  • Describe Ambari Stacks, Views, and Blueprints
  • Install and configure an HDP 2.2 cluster using Ambari
  • Understand the Hadoop Distributed File System (HDFS)
  • Describe how files are written to and stored in HDFS
  • Explain Heterogeneous Storage support for HDFS
  • Use HDFS commands
  • Perform a file system check using the command line
  • Mount HDFS to a local file system using the NFS Gateway
  • Understand and configure YARN on a cluster
  • Configure and troubleshoot MapReduce jobs
  • Understand how to utilize Capacity Scheduler
  • Utilize cgroups and node labels
  • Understand how Slider, Kafka, Storm and Spark run on YARN
  • Use WebHDFS to access HDFS over HTTP
  • Understand how to optimize and configure Hive
  • Use Sqoop to transfer data between Hadoop and a relational database
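The WebHDFS objective above refers to HDFS's REST interface. As a flavor of the material, the sketch below builds a WebHDFS URL; the hostname is hypothetical, and 50070 is the traditional NameNode HTTP port in HDP 2.x deployments:

```python
# Sketch of how WebHDFS exposes HDFS over HTTP. The host below is
# hypothetical; port 50070 is the usual HDP 2.x NameNode HTTP port.
def webhdfs_url(host, port, path, op):
    """Build a WebHDFS REST URL, e.g. for LISTSTATUS or OPEN operations."""
    return "http://{0}:{1}/webhdfs/v1{2}?op={3}".format(host, port, path, op)

url = webhdfs_url("namenode.example.com", 50070, "/user/hdfs", "LISTSTATUS")
print(url)
# On a live cluster one would issue an HTTP GET against this URL
# (e.g. with urllib) and receive a JSON directory listing back.
```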

 

Format

50% Lecture/Discussion
50% Hands-on Labs

Certification

Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

 

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP OPERATIONS: MIGRATING TO THE HORTONWORKS DATA PLATFORM – GTHDP08 – 2 days


This course is designed for administrators who are familiar with administering other Hadoop distributions and are migrating to the Hortonworks Data Platform (HDP). It covers installation, configuration, maintenance, security and performance topics.

Course Objectives

  • Install and configure an HDP 2.x cluster
  • Use Ambari to monitor and manage a cluster
  • Mount HDFS to a local filesystem using the NFS Gateway
  • Configure Hive for Tez
  • Use Ambari to configure the schedulers of the ResourceManager
  • Commission and decommission worker nodes using Ambari
  • Use Falcon to define and process data pipelines
  • Take snapshots using the HDFS snapshot feature
  • Implement and configure NameNode HA using Ambari
  • Secure an HDP cluster using Ambari
  • Set up a Knox Gateway

 

Format

50% Lecture/Discussion
50% Hands-on Labs

 

Certification

Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

 

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP OPERATIONS: HADOOP ADMINISTRATION 1 – GTHDP10 – 4 days


This course is designed for administrators who will be managing the Hortonworks Data Platform (HDP) 2.3 with Ambari. It covers installation, configuration, and other typical cluster maintenance tasks.

Course Objectives

  • Summarize an enterprise environment, including Big Data, Hadoop, and the Hortonworks Data Platform (HDP)
  • Install HDP
  • Manage Ambari Users and Groups
  • Manage Hadoop Services
  • Use HDFS Storage
  • Manage HDFS Storage
  • Configure HDFS Storage
  • Configure HDFS Transparent Data Encryption
  • Configure the YARN Resource Manager
  • Submit YARN Jobs
  • Configure the YARN Capacity Scheduler
  • Add and Remove Cluster Nodes
  • Configure HDFS and YARN Rack Awareness
  • Configure HDFS and YARN High Availability
  • Monitor a Cluster
  • Protect a Cluster with Backups

Format

60% Lecture/Discussion

40% Hands-on Labs

Certification

Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP OPERATIONS: HADOOP ADMINISTRATION 2 – GTHDP11 – 3 days


This course is designed for experienced administrators who manage Hortonworks Data Platform (HDP) 2.3 clusters with Ambari. It covers upgrades, configuration, application management, and other common tasks.

Course Objectives

  • Execute automated installation of and upgrades to HDP clusters
  • Configure HDFS for NFS integration and centralized caching
  • Control application behavior using node labels
  • Deploy applications using Slider
  • Understand how to configure HDP for optimum Hive performance
  • Understand how to manage HDP data compression
  • Integrate Ambari with an existing LDAP environment to manage users and groups
  • Configure high availability for Hive and Oozie
  • Ingest SQL tables and log files into HDFS
  • Support scalable and automated HDP application best practices
  • Configure automated HDP data replication

Format

60% Lecture/Discussion, 40% Hands-on Labs

Certification

Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions.  Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP OPERATIONS: HORTONWORKS DATA FLOW – GTHDP13 – 3 days


This course is designed for Data Stewards and Data Flow Managers who want to automate the flow of data between systems. Topics include: an introduction to NiFi; installing and configuring NiFi; a detailed explanation of the NiFi user interface, its components, and the elements associated with each; building a dataflow; the NiFi Expression Language; NiFi clustering; data provenance; security around NiFi; monitoring tools; and HDF best practices.

Course Objectives

  • Describe HDF, Apache NiFi, and their use cases
  • Describe the NiFi architecture
  • Understand NiFi features and characteristics
  • Understand the system requirements for running NiFi
  • Install and configure NiFi
  • Understand the NiFi user interface in depth
  • Build a DataFlow using NiFi
  • Understand a Processor and its elements
  • Understand a Connection and its elements
  • Understand a Process Group and its elements
  • Understand a Remote Process Group and its elements
  • Learn how to optimize a DataFlow
  • Learn how to use the NiFi Expression Language
  • Learn about attributes and templates in NiFi
  • Understand NiFi clustering concepts
  • Explain data provenance in NiFi
  • Learn how to secure NiFi
  • Learn how to effectively monitor NiFi
  • Learn about HDF best practices

Format
50% Lecture/Discussion
50% Hands-on Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP OPERATIONS: SECURITY – GTHDP14 – 3 days


This course is designed for experienced administrators who will be implementing secure Hadoop clusters using authentication, authorization, auditing and data protection strategies and tools.

Course Objectives

  • Describe the 5 pillars of a secure environment
  • List the reasons why a secure environment is needed
  • Describe how security is integrated within Hadoop
  • Choose which security tool is best for specific use cases
  • List security prerequisites
  • Configure Ambari security
  • Set up Ambari Views for controlled access
  • Describe Kerberos use and architecture
  • Install Kerberos
  • Configure Ambari for Kerberos
  • Configure Hadoop for Kerberos
  • Enable Kerberos
  • Install and configure Apache Knox
  • Install and configure Apache Ranger
  • Install and configure Ranger Key Management Services
  • Use Ranger to assure secure data access
  • Describe available partner security solutions

Format
50% Lecture/Discussion
50% Hands-on Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP OPERATIONS: APACHE HBASE ADVANCED MANAGEMENT – GTHDP12 – 4 days


This course is designed for administrators who will be installing, configuring and managing HBase clusters. It covers installation with Ambari, configuration, security and troubleshooting HBase implementations. The course includes an end-of-course project in which students work together to design and implement an HBase schema.

Course Objectives
  • Hadoop Primer
    • Hadoop, Hortonworks, and Big Data
    • HDFS and YARN
  • Discussion: Running Applications in the Cloud
  • Apache HBase Overview
  • Provisioning the Cluster
  • Using the HBase Shell
  • Ingesting Data
  • Operational Management
  • Backup and Recovery
  • Security
  • Monitoring HBase and Diagnosing Problems
  • Maintenance
  • Troubleshooting

Format
50% Lecture/Discussion
50% Hands-on Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP ANALYST: DATA SCIENCE – GTHDP04 – 4 days


This course is designed for students who want to become familiar with the processes and practice of data science, including machine learning and natural language processing. Included are: tools and programming languages (Python, IPython, Mahout, Pig, NumPy, Pandas, SciPy, Scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.

Course Objectives

  • Recognize use cases for data science
  • Describe the architecture of Hadoop and YARN
  • Describe supervised and unsupervised learning differences
  • List the six machine learning tasks
  • Use Mahout to run a machine learning algorithm on Hadoop
  • Use Pig to transform and prepare data on Hadoop
  • Write a Python script
  • Use NumPy to analyze big data
  • Use the data structure classes in the pandas library
  • Write a Python script that invokes SciPy machine learning
  • Describe options for running Python code on a Hadoop cluster
  • Write a Pig User-Defined Function in Python
  • Use Pig streaming on Hadoop with a Python script
  • Write a Python script that invokes scikit-learn
  • Use the k-nearest neighbor algorithm to predict values
  • Run a machine learning algorithm on a distributed data set
  • Describe use cases for Natural Language Processing (NLP)
  • Perform sentence segmentation on a large body of text
  • Perform part-of-speech tagging
  • Use the Natural Language Toolkit (NLTK)
  • Describe the components of a Spark application
  • Write a Spark application in Python
  • Run machine learning algorithms using Spark MLlib
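As a taste of the k-nearest-neighbor objective above, here is a minimal, pure-Python sketch of the idea (in class, students use scikit-learn and Spark MLlib rather than hand-rolled code; the data points below are made up for illustration):

```python
# Minimal k-nearest-neighbor prediction: find the k training points
# closest to the query and return the majority label among them.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label) pairs; Euclidean distance."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((8, 8), "b"), ((9, 8), "b")]
print(knn_predict(train, (1.5, 1.5)))  # "a": the 3 nearest points are all "a"
```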

 

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

 

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Courses are available for developers, data analysts and administrators. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP ANALYST: APACHE HBASE ESSENTIALS – GTHDP09 – 2 days


This course is designed for big data analysts who want to use the HBase NoSQL database which runs on top of HDFS to provide real-time read/write access to sparse datasets. Topics include HBase architecture, services, installation and schema design.

Course Objectives

  • How HBase integrates with Hadoop and HDFS
  • Architectural components and core concepts of HBase
  • HBase functionality
  • Installing and configuring HBase
  • HBase schema design
  • Importing and exporting data
  • Backup and recovery
  • Monitoring and managing HBase
  • How HBase integrates with Apache ZooKeeper
  • HBase services and data operations
  • Optimizing HBase Access

 

Format

50% Lecture/Discussion
50% Hands-on Labs

Certification

Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP DEVELOPER: APACHE PIG AND HIVE – GTHDP02 – 4 days


This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include: Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition and using Pig and Hive to perform data analytics on Big Data. Labs are executed on a 7-node HDP cluster.

Course Objectives

  • Describe Hadoop, YARN and use cases for Hadoop
  • Describe Hadoop ecosystem tools and frameworks
  • Describe the HDFS architecture
  • Use the Hadoop client to input data into HDFS
  • Transfer data between Hadoop and a relational database
  • Explain YARN and MapReduce architectures
  • Run a MapReduce job on YARN
  • Use Pig to explore and transform data in HDFS
  • Use Hive to explore and analyze data sets
  • Understand how Hive tables are defined and implemented
  • Use the new Hive windowing functions
  • Explain and use the various Hive file formats
  • Create and populate a Hive table that uses ORC file formats
  • Use Hive to run SQL-like queries to perform data analysis
  • Use Hive to join datasets using a variety of techniques, including Map-side joins and Sort-Merge-Bucket joins
  • Write efficient Hive queries
  • Create ngrams and context ngrams using Hive
  • Perform data analytics like quantiles and page rank on Big Data using the DataFu Pig library
  • Explain the uses and purpose of HCatalog
  • Use HCatalog with Pig and Hive
  • Define a workflow using Oozie
  • Schedule a recurring workflow using the Oozie Coordinator
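The n-gram objectives above refer to Hive's n-gram frequency functions. The toy function below simply illustrates what an n-gram is, in plain Python (the course itself uses Hive, not Python, for this):

```python
# What an n-gram is: every contiguous run of n tokens in a sequence.
# Hive's ngrams() UDF computes the most frequent of these at scale.
def ngrams(tokens, n):
    """Return the list of contiguous n-token sequences."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams("big data on hadoop".split(), 2))
# [('big', 'data'), ('data', 'on'), ('on', 'hadoop')]
```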

 

Format
50% Lecture/Discussion
50% Hands-on Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP DEVELOPER: JAVA – GTHDP03 – 4 days


This advanced course provides Java programmers a deep-dive into Hadoop application development. Students will learn how to design and develop efficient and effective MapReduce applications for Hadoop using the Hortonworks Data Platform, including how to implement combiners, partitioners, secondary sorts, custom input and output formats, joining large datasets, unit testing, and developing UDFs for Pig and Hive. Labs are run on a 7-node HDP 2.1 cluster running in a virtual machine that students can keep for use after the training.

Course Objectives

  • Describe Hadoop 2 and the Hadoop Distributed File System
  • Describe the YARN framework
  • Develop and run a Java MapReduce application on YARN
  • Use combiners and in-map aggregation
  • Write a custom partitioner to avoid data skew on reducers
  • Perform a secondary sort
  • Recognize use cases for built-in input and output formats
  • Write a custom MapReduce input and output format
  • Optimize a MapReduce job
  • Configure MapReduce to optimize mappers and reducers
  • Develop a custom RawComparator class
  • Distribute files as LocalResources
  • Describe and perform join techniques in Hadoop
  • Perform unit tests using the UnitMR API
  • Describe the basic architecture of HBase
  • Write an HBase MapReduce application
  • List use cases for Pig and Hive
  • Write a simple Pig script to explore and transform big data
  • Write a Pig UDF (User-Defined Function) in Java
  • Write a Hive UDF in Java
  • Use JobControl class to create a MapReduce workflow
  • Use Oozie to define and schedule workflows

 

Format
50% Lecture/Discussion
50% Hands-on Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP DEVELOPER: WINDOWS – GTHDP05 – 4 days


This course is designed for developers who create applications and analyze Big Data in Apache Hadoop on Windows using Pig and Hive. Topics include: Hadoop, YARN, the Hadoop Distributed File System (HDFS), MapReduce, Sqoop and the HiveODBC Driver.

Course Objectives

  • Describe Hadoop and YARN
  • Describe the Hadoop ecosystem
  • List components and deployment options for HDP on Windows
  • Describe the HDFS architecture
  • Use the Hadoop client to input data into HDFS
  • Transfer data between Hadoop and Microsoft SQL Server
  • Describe the MapReduce and YARN architecture
  • Run a MapReduce job on YARN
  • Write a Pig script
  • Define advanced Pig relations
  • Use Pig to apply structure to unstructured Big Data
  • Invoke a Pig User-Defined Function
  • Use Pig to organize and analyze Big Data
  • Describe how Hive tables are defined and implemented
  • Use Hive windowing functions
  • Define and use Hive file formats
  • Create Hive tables that use the ORC file format
  • Use Hive to run SQL-like queries to perform data analysis
  • Use Hive to join datasets
  • Create ngrams and context ngrams using Hive
  • Perform data analytics
  • Use HCatalog with Pig and Hive
  • Install and configure HiveODBC Driver for Windows
  • Import data from Hadoop into Microsoft Excel
  • Define a workflow using Oozie

 

Format

50% Lecture/Discussion
50% Hands-on Labs

 

Certification

Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

 

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP DEVELOPER: CUSTOM YARN APPLICATIONS – GTHDP06 – 2 days


This course is designed for developers who want to create custom YARN applications for Apache Hadoop. It will include: the YARN architecture, YARN development steps, writing a YARN client and ApplicationMaster, and launching Containers. The course uses Eclipse and Gradle connected remotely to a 7-node HDP cluster running in a virtual machine.

Course Objectives

  • Describe the YARN architecture
  • Describe the YARN application lifecycle
  • Write a YARN client application
  • Run a YARN application on a Hadoop cluster
  • Monitor the status of a running YARN application
  • View the aggregated logs of a YARN application
  • Configure a ContainerLaunchContext
  • Use a LocalResource to share application files across a cluster
  • Write a YARN ApplicationMaster
  • Describe the differences between synchronous and asynchronous ApplicationMasters
  • Allocate Containers in a cluster
  • Launch Containers on NodeManagers
  • Write a custom Container to perform specific business logic
  • Explain the job schedulers of the ResourceManager
  • Define queues for the Capacity Scheduler
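As a flavor of the Capacity Scheduler material, queues are defined in capacity-scheduler.xml. The excerpt below is a hypothetical example (queue names and percentages are illustrative; sibling capacities must sum to 100):

```xml
<!-- Hypothetical capacity-scheduler.xml excerpt: two queues under root -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,analytics</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.capacity</name>
  <value>40</value>
</property>
```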

Format

50% Lecture/Discussion
50% Hands-on Labs

Certification

Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

HDP DEVELOPER: STORM AND TRIDENT FUNDAMENTALS – GTHDP07 – 2 days


This course provides a technical introduction to the fundamentals of Apache Storm and Trident that includes the concepts, terminology, architecture, installation, operation, and management of Storm and Trident. Simple Storm and Trident code excerpts are provided throughout the course. The course also includes an introduction to, and code samples for, Apache Kafka. Apache Kafka is a messaging system that is commonly used in concert with Storm and Trident.

 

Format

Self-paced, online exploration or
Instructor led exploration and discussion

 

Certification

Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

More information…

INTRODUCTION TO BIG DATA AND NOSQL – GTBD1 – 1 day

 

We live in the information age where business success is grounded on the ability of organizations to convert raw data coming from various sources into high-grade business information.

Many organizations are overwhelmed by the sheer volume of information they have to process in order to stay competitive. Traditional database systems may become prohibitively expensive to handle the exponential growth of data volumes, or may be found unsuitable for the job. At this point, the data gets mystically morphed into Big Data.

This course provides an introduction to Big Data as well as NoSQL (Not Only SQL) database systems. The fundamental concepts of and ideas behind Big Data / NoSQL technologies are methodically explored and many buzzwords demystified. The course is supplemented by hands-on labs that help attendees reinforce their theoretical knowledge of the subject.

More information…

HADOOP FOR SYSTEM ADMINISTRATORS – GTBD2 – 3 days

This course covers the essentials of deploying and managing an Apache™ Hadoop® cluster. The course is lab intensive with each participant creating their own Hadoop cluster using either the CDH (Cloudera’s Distribution, including Apache Hadoop) or Hortonworks Data Platform stacks. Core Hadoop services are explored in depth with emphasis on troubleshooting and recovering from common cluster failures. The fundamentals of related services such as Ambari, Zookeeper, Pig, Hive, HBase, Sqoop, Flume, and Oozie are also covered. The course is approximately 60% lecture and 40% labs.

Supported Distributions:

Red Hat Enterprise Linux 6

More information…

HADOOP PROGRAMMING – GTBD3 – 4 days

The success of many organizations depends on their ability to derive business insights from the massive amounts of raw data coming from various sources.

Apache Hadoop is a proven production-ready platform for large-scale data processing that meets most demanding technical and business requirements.

This intensive training course provides theoretical and technical aspects of programming using Hadoop-centric systems such as Pig, Hive, Sqoop, Impala and HBase. The course is supplemented by hands-on labs that help attendees reinforce their theoretical knowledge of the learned material.

Topics

  • Hadoop Ecosystem Overview
  • MapReduce
  • Pig Scripting Platform
  • Apache Hive
  • Apache Sqoop
  • Cloudera Impala
  • Apache HBase

More information…

BIG DATA FOUNDATION – GTBD4 – 2 days

This is a foundation-level course designed to provide you with an understanding of Big Data and the potential sources of Big Data that can be used for solving real business problems, along with an overview of Data Mining and the tools used in it.

This is a fundamental course with practical exercises designed to provide you with some degree of hands-on experience in using two of the most popular technologies in Big Data processing – Hadoop and MongoDB. You will get the opportunity to practice installing these two technologies through our Work-Labs. The course exposes you to real-life Big Data technologies with the purpose of obtaining results from real datasets from Twitter.

After completing the course, you will be equipped not only with fundamental Big Data knowledge, but will also be introduced to a working development environment containing Hadoop and MongoDB, installed by yourself. This practical knowledge can be used as a starting point in the organizational Big Data journey.

Learning Objectives:

Individuals certified at this level will have demonstrated their understanding of:

  • Big Data fundamentals
  • Big Data technologies
  • Big Data governance
  • Available Sources of Big Data
  • Data Mining, its concepts and some of the tools used for Data Mining
  • Hadoop, including its concepts, how to install and configure it, the concepts behind MapReduce, and how Hadoop can be used in real life scenarios
  • MongoDB, including its concepts, how to install and configure it, the concepts behind document databases and how MongoDB can be used in real life scenarios

Benefits of taking this course:

Participants in this course will obtain the following benefits:

  • Detailed understanding of Big Data and Data Mining concepts.
  • Ability to identify and obtain relevant datasets when looking at a business problem.
  • Ability to install and manage Big Data processing environments based on Hadoop or MongoDB at a departmental level.

Examination

Exam Format: closed book, paper based. Participants may bring paper-based dictionaries. No electronic devices are permitted.

Questions: 40 multiple choice questions

Passing Score: 65%

Exam Duration: 60 minutes, with 15 minutes' extra time for non-native English speakers

Proctoring: Web proctoring

Accreditation: Cloud Credential Council

 

More information…

APPLIED DATA SCIENCE AND BIG DATA ANALYTICS BOOT CAMP FOR BUSINESS ANALYSTS – GTBD5 – 3 days

Business success in the information age is predicated on the ability of organizations to convert massive amounts of raw data from various sources into high-grade business information. Many organizations are overwhelmed by the sheer volume of information they must process in order to stay competitive. Traditional database systems may be either prohibitively expensive to scale with the exponential growth of data volumes or simply unsuitable for the job. Data Science and Big Data Analytics form an emerging discipline that helps you get a handle on the situation and capitalize on the wealth of information assets within your organization.

This intensive training course covers the theoretical and technical aspects of Data Science and Business Analytics, including fundamental and advanced concepts and methods of deriving business insights from Big Data. It is supplemented by hands-on labs that help attendees reinforce their theoretical knowledge of the material.

Topics

  • NoSQL and Big Data Systems Overview
  • Big Data Business Intelligence and Analytics
  • Applied Data Science and Business Analytics
  • Algorithms, Techniques and Common Analytical Methods
  • Machine Learning
  • Visualizing and Reporting Processed Results
  • Data Analysis with R

More information…

SECURING HADOOP WITH KERBEROS – GTBD6 – 4 days

Course Description

This course covers Kerberos concepts, components, installation, configuration, and troubleshooting. Realms are tested by Kerberizing NFS and SSH services. The core components of Hadoop (HDFS and MapReduce) are reviewed with emphasis on the security model, and a simple Hadoop cluster is installed and configured. The cluster is then integrated with the Kerberos realm and configured to run in secure mode.
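By way of illustration, configuring hosts to join a realm of the kind built in this course centres on a small client configuration file. The sketch below shows the typical shape of `/etc/krb5.conf`; the realm name and host names are placeholders, not values from the course materials:

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```

Every host in the cluster – including each Hadoop node – points at the same KDC this way, which is what allows services such as NFS, SSH, and later HDFS to authenticate users against a single realm.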

Objectives:

  • Understand Kerberos operation.
  • Install and configure a Kerberos realm.
  • Secure local SSH and NFS services.
  • Understand core Hadoop services.
  • Install and configure a Hadoop cluster.
  • Configure a cluster to operate in secure mode.

Supported Distributions:

Red Hat Enterprise Linux 6

Course Prerequisites

This course assumes familiarity with Linux and core system administration skills. Participants should be comfortable working from the shell, editing files, managing local services, and using SSH. Familiarity with Hadoop is beneficial but not essential.

More information…

APPLIED DATA SCIENCE AND BIG DATA ANALYTICS – GTBD7 – 4 days

Course Description
This intensive training course covers the theoretical and technical aspects of Data Science and Business Analytics, including fundamental and advanced concepts and methods of deriving business insights from “big” and/or “small” data. It is supplemented by hands-on labs that help attendees reinforce their theoretical knowledge of the material.

Topics

  • Applied Data Science and Business Analytics
  • Algorithms, Techniques and Common Analytical Methods
  • Machine Learning Introduction
  • Visualizing and Reporting Processed Results
  • The R Programming Language
  • Data Analysis with R
  • Elements of Functional Programming
  • Apache Spark Introduction
  • Spark SQL
  • ETL with Spark
  • MLlib Machine Learning Library
  • Graph Processing with GraphX
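The functional-programming and ETL ideas in the topic list can be previewed in plain Python. Spark applies the same map/filter/reduce pattern to distributed datasets; the toy pipeline below is an illustration of the style, not Spark's actual API:

```python
from functools import reduce

# A toy extract-transform-load pipeline in the functional style
# that Spark generalizes to cluster-scale datasets.
raw = ["  alice,34 ", "bob,29", "  carol,41"]         # extract: raw lines
records = [line.strip().split(",") for line in raw]   # transform: parse CSV
ages = [int(age) for _name, age in records]           # transform: project a field
total_age = reduce(lambda a, b: a + b, ages)          # aggregate

print(total_age / len(ages))
```

Because each step is a pure transformation of the previous collection, the pipeline is easy to reason about and, in Spark, to parallelize and recompute on failure.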

Audience

Data Scientists, Software Developers, IT Architects, and Technical Managers

Prerequisite

Participants should have general knowledge of statistics and programming.

Duration

4 Days

BIG DATA TRAINING: DATA SCIENCE FOR SOLUTION ARCHITECTS – GTBD8 – 4 days

Course Description

This training course helps Solution Architects and other IT practitioners understand the value proposition, methodology, and techniques of the emerging discipline of Data Science. The class also introduces students to a number of production-ready technologies and capabilities that enable enterprises to build cost-efficient Big Data processing solutions.

Objectives

This intensive training course covers the theoretical and technical aspects of Data Science and Business Analytics, including fundamental and advanced concepts and methods of deriving business insights from raw data using cost-effective data processing solutions. It is supplemented by hands-on labs that help attendees reinforce their theoretical knowledge of the material.

Topics

  • Applied Data Science and Business Analytics
  • Algorithms, Techniques and Common Analytical Methods
  • NoSQL and Big Data Systems Overview
  • MapReduce
  • Big Data Business Intelligence and Analytics
  • Visualizing and Reporting Processed Results
  • Data Analysis with R
  • Hadoop Programming Ecosystem

Target Audience

Enterprise Architects, Solution Architects, Information Technology Architects, Business Analysts, Senior Developers, and Team Leads

Course Prerequisites

Participants should have general knowledge of statistics and programming.

More information…

BIG DATA AND ANALYTICS FOR BUSINESS USERS – GTBD9 – 1 day

Course Description

Data is one of the most valuable assets that your organization possesses.  Every day you are creating more data and potentially passing up opportunities to harvest that data and use it to accelerate the achievement of your organization’s strategic objectives.  Big Data and Analytics represent an emerging trend around harvesting, analyzing, and capitalizing on the wealth of data that is within the grasp of your enterprise.

This one-day primer introduces Cloud Computing, Big Data, and the emerging discipline of Data Analytics. Attention is given to the three V's of Big Data (Volume, Velocity, and Variety) as well as the fourth V, Value. You'll learn about these critical elements and the powerful value proposition these capabilities provide, and about the processes, tools, and personnel needed to take advantage of this sea change in information management. This essential course will equip you to better understand your customers and deliver more value today.

Topics

  • Cloud Computing Basics
  • Introduction to Big Data
  • Understanding Data Analytics
  • Understanding Predictive Analytics
  • Basics of Analytical Modeling
  • Unpacking the Value, Volume, Velocity, and Variety
  • Organizational Considerations
  • Recommended Next Steps
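As a small taste of what "Basics of Analytical Modeling" and predictive analytics mean in practice, the sketch below fits a straight line to illustrative data by ordinary least squares and uses it to predict an unseen value. The data and variable names are invented for this example:

```python
# Fit y = slope * x + intercept by ordinary least squares.
xs = [1, 2, 3, 4, 5]    # e.g. marketing spend (illustrative data)
ys = [3, 5, 7, 9, 11]   # e.g. resulting sales

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def predict(x):
    """Apply the fitted model to a new observation."""
    return slope * x + intercept

print(predict(6))
```

Real analytical models are richer than a single straight line, but the workflow is the same: fit a model to historical data, validate it, then use it to predict outcomes the business has not yet observed.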

Target Audience

Managers, Analysts, Architects, and Team Leads

Prerequisites

There are no prerequisites for this course. If you have any queries or are in doubt about your suitability, please contact us and we will assist you.

More information…