To stay competitive, organizations have started adopting new approaches to data processing and analysis. For example, data scientists are turning to Apache Spark to process massive amounts of data, using its distributed compute capability and its built-in machine learning library.
This intensive Apache Spark training course provides an overview of data science algorithms as well as the theoretical and technical aspects of using the Apache Spark platform for Machine Learning. The course is supplemented by a variety of hands-on labs that help attendees reinforce what they have learned.
Our Machine Learning Course, Machine Learning with Apache Spark Training, covers the following topics:
- Machine learning algorithms
- Introduction to functional programming
- Introduction to Apache Spark
- The Spark Shell
- The Spark Machine Learning Library
- Text mining
This Apache Spark training course has 3 hands-on labs, which are outlined below. The labs cover the spark-submit tool as well as the Apache Spark Shell. The labs allow you to practice the following skills:
Lab 1 - Using the spark-submit Tool
Spark offers developers two ways of running their applications:
- Using the spark-submit tool
- Using the Spark Shell
In this lab, we will review what is involved in using the spark-submit tool.
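As a rough illustration of what a submitted application can look like, here is a minimal sketch of a word-count script written in Python. The file name `word_count.py`, the input file, and the `local[2]` master setting are assumptions made for this example and are not taken from the lab materials. Unlike a Spark Shell session, a submitted application has to create its own Spark session.

```python
import sys


def tokenize(line):
    """Split a line of text into lowercase words."""
    return line.lower().split()


def main(input_path):
    # Imported here so that tokenize() can be tried without a Spark installation.
    from pyspark.sql import SparkSession

    # A submitted application creates (and later stops) its own session.
    spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()
    lines = spark.sparkContext.textFile(input_path)
    counts = (lines.flatMap(tokenize)
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    # Print a small sample of (word, count) pairs.
    for word, count in counts.take(10):
        print(word, count)
    spark.stop()


# Hypothetical invocation from the command line (file names are assumptions):
#   spark-submit --master local[2] word_count.py input.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

The Spark-specific work is kept inside `main()`, so the script does nothing unless it is given an input path as its single argument.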
Lab 2 - The Apache Spark Shell
Spark's interactive development environment is provided by the Spark Shell (also known as a REPL: Read/Eval/Print Loop tool), which is available for Scala and Python developers (Java is not yet supported).
The lab instructions below apply to the Scala version of the Spark Shell.
Lab 3 - Using Random Forests for Classification with Spark MLlib
In this lab, we will learn how to use the Random Forests implementation from Spark's Machine Learning library, MLlib, to perform object classification.
The Random Forests algorithm is regarded as one of the most successful supervised learning algorithms and can be used for both classification and regression.
In our work we will use the Python version of the library, which provides an API similar to those implemented in Scala and Java.
We will also use the spark-submit tool to submit the application from the command line rather than typing commands into the Spark Shell.
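The general shape of such an application might look like the sketch below, which uses the RDD-based `pyspark.mllib` API. The input format (one `label,feature1,feature2,...` record per line), the file names, the two-class setup, and all parameter values are assumptions chosen for illustration; the actual lab may use different data and settings.

```python
import sys


def parse_line(line):
    """Parse 'label,f1,f2,...' into a (label, [features]) pair of floats."""
    values = [float(v) for v in line.split(",")]
    return values[0], values[1:]


def main(data_path):
    # Imported here so that parse_line() can be tried without a Spark installation.
    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import RandomForest

    sc = SparkContext(appName="RandomForestSketch")
    points = (sc.textFile(data_path)
                .map(parse_line)
                .map(lambda lf: LabeledPoint(lf[0], lf[1])))
    # Hold out 30% of the records for evaluation (split ratio is an assumption).
    train, test = points.randomSplit([0.7, 0.3], seed=42)

    # Assumes binary labels (0.0 / 1.0) and no categorical features.
    model = RandomForest.trainClassifier(
        train, numClasses=2, categoricalFeaturesInfo={},
        numTrees=10, featureSubsetStrategy="auto",
        impurity="gini", maxDepth=4, maxBins=32, seed=42)

    # Predict on the held-out features and compare with the true labels.
    predictions = model.predict(test.map(lambda p: p.features))
    labels_and_preds = test.map(lambda p: p.label).zip(predictions)
    errors = labels_and_preds.filter(lambda lp: lp[0] != lp[1]).count()
    print("Test error:", errors / float(max(test.count(), 1)))
    sc.stop()


# Hypothetical invocation from the command line (file names are assumptions):
#   spark-submit --master local[2] random_forest_sketch.py data.csv
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Regression follows the same pattern with `RandomForest.trainRegressor`, which takes a variance-based impurity measure and no `numClasses` argument.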
This Web Age Spark class can be delivered in a traditional classroom-style format. It can also be delivered in a synchronous instructor-led format.