This lab introduces Apache Spark and shows how to create a Spark cluster with Azure HDInsight. HDInsight is a key analytics component in the Cortana Intelligence Suite, and Spark on HDInsight enhances a traditional Hadoop cluster with in-memory processing and other capabilities. It offers convenient scaling, data processing, and querying capabilities that can be used directly or by other technologies in Cortana Intelligence.
In this lab, you will create a Spark cluster using HDInsight. You will learn how to upload files to Azure Blob storage using a command-line utility, then analyze the data in a Jupyter notebook using a combination of PySpark and SQL.
Download the lab documentation and you’ll learn about:
BlueGranite developed this material in conjunction with Microsoft, and the lab and its contents are the property of Microsoft. Microsoft assumes no legal obligation for the quality or performance of the lab material. Fill out the form below to get access to the step-by-step lab document. Click here for additional hands-on labs.