Using Apache Spark on Azure HDInsight


Please note: this lab content is in the process of being updated and may contain out-of-date information. If you have any questions, please do not hesitate to contact us.

This lab provides an introduction to Apache Spark and creating a Spark cluster with Azure HDInsight. HDInsight is a key analytics component in the Cortana Intelligence Suite, and Spark on HDInsight enhances a traditional Hadoop cluster with in-memory processing and other capabilities. It offers convenient scaling, data processing, and querying capabilities that can be leveraged directly or by other technologies in Cortana Intelligence.

In this lab, you will create a Spark cluster using HDInsight. You will discover how to upload files to Azure Blob storage using a command-line utility, then analyze the data in a Jupyter notebook using a combination of PySpark and SQL.

Download the lab documentation and you’ll learn about:

  • Creating an Apache Spark cluster using the Azure Portal
  • Uploading files to Azure Blob storage using AzCopy
  • Creating a Jupyter notebook
  • Querying and transforming data in DataFrames and Tables
  • Storing data in a Hive table using Spark
  • Deleting a Spark cluster using the Azure Portal
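
To give a feel for the notebook portion of the lab, here is a minimal PySpark sketch of the read, query, and store steps listed above. It is not taken from the lab document. It assumes a Spark 2.x or later cluster where the Jupyter PySpark kernel provides a `spark` session, and that the uploaded file is a CSV with a header row. The function names, view name, and table name are hypothetical and chosen only for illustration.

```python
def wasb_uri(container, account, path, secure=True):
    """Build the URI that Spark on HDInsight uses to address a blob.

    The container, account, and path values are placeholders supplied
    by the caller; nothing here is specific to the lab's data set.
    """
    scheme = "wasbs" if secure else "wasb"
    return "{0}://{1}@{2}.blob.core.windows.net/{3}".format(
        scheme, container, account, path.lstrip("/"))


def load_query_store(spark, csv_uri, view_name, table_name):
    """Read a CSV from blob storage, query it with SQL, save a Hive table.

    Assumption: `spark` is a SparkSession (Spark 2.x or later), such as
    the one predefined in an HDInsight PySpark notebook.
    """
    # Load the file into a DataFrame, letting Spark infer column types.
    df = spark.read.csv(csv_uri, header=True, inferSchema=True)

    # Register the DataFrame so it can be queried with SQL.
    df.createOrReplaceTempView(view_name)

    # Illustrative query only: take a sample of rows from the view.
    result = spark.sql("SELECT * FROM {0} LIMIT 100".format(view_name))

    # Persist the result as a table, replacing any existing one.
    result.write.mode("overwrite").saveAsTable(table_name)
    return result
```

In a notebook cell this might be called as `load_query_store(spark, wasb_uri("labdata", "mystorageacct", "flights.csv"), "flights", "flights_sample")`, where every name in the call is a placeholder.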

BlueGranite developed this material in conjunction with Microsoft, and the lab and its contents are the property of Microsoft. Microsoft assumes no legal obligation for the quality or performance of the lab material. Fill out the form below to get access to the step-by-step lab document. Click here for additional hands-on labs.

Download our Spark with Azure HDInsight Lab