Provision Managed Hadoop Clusters with Azure HDInsight

Please note: this lab content is being updated and may contain out-of-date information. If you have any questions, please do not hesitate to contact us.

This lab provides an introduction to processing data with Apache Hadoop in Azure using an HDInsight cluster. HDInsight is a key analytics component in the Cortana Intelligence Suite that allows users to perform distributed computing across a Hadoop cluster as a service. It offers convenient scaling, data processing, and querying capabilities that can be leveraged directly or by other technologies in Cortana Intelligence.

In this lab, you will create a Hadoop cluster using the HDInsight service. You will explore how to manage files in an Azure storage container using the Storage Explorer application, then process a sample data file using Hive. You will run Hive queries by connecting directly to the cluster and using a query editor in Ambari, and you will also connect to your cluster with PowerShell to execute a local Hive script.

Download the lab documentation and you’ll learn about:

  • Creating an HDInsight cluster using the Azure Portal
  • Uploading files to Azure blob storage using Azure Storage Explorer
  • Connecting to Ambari to manage your cluster
  • Creating a Hive database and table
  • Querying a Hive table that contains data originally from a CSV
  • Storing data in ORC format
  • Installing Azure PowerShell and connecting to an HDInsight cluster
  • Processing a Hive job using Azure PowerShell
  • Deleting an HDInsight cluster using Azure PowerShell or the Azure Portal

BlueGranite developed this material in conjunction with Microsoft, and the lab and its contents are the property of Microsoft. Microsoft assumes no legal obligation for the quality or performance of the lab material. Fill out the form below to get access to the step-by-step lab document. Click here for additional hands-on labs.

Download our Azure HDInsight Lab