BUSINESS IMPACT

Feb 18, 2014

Bring Your Hadoop Implementation into the Cloud with Microsoft HDInsight

Josh Fennessy Posted by Josh Fennessy

When HDInsight was released for General Availability in October 2013, BlueGranite jumped on the opportunity to understand how the new product is best positioned in the Modern Data Warehouse landscape. Big Data with Hadoop technologies are an important part of the data analytics ecosystem and HDInsight with Windows Azure offers some intriguing benefits.describe the image

Are you considering a Hadoop implementation? Have you considered HDInsight? Here are a list of benefits that we’ve found HDInsight offers over a standard Hadoop on-premise implementation.

Simplified Implementation

A standard Hadoop implementation requires a large number of steps and series of complicated configuration items to complete before the Hadoop clusters are fully functional.  HDInsight implementations are typically functional in under 30 minutes and only require a series of simple configuration steps to complete.  This decrease in infrastructure configuration time results in solutions that are developed with a quicker time to market and with reduced capital.

Additionally, as usage of the Hadoop environment increases, scaling the environment out to include more processing power is very simple with Windows Azure and HDInsight. Clusters can include as few as 4 data nodes and as many as 20 data nodes. Implementations with more than 20 data nodes can also be accomplished upon special request. The overall number of data nodes is elastic and can be modified as your processing needs change throughout the month.

HDInsight includes the following Apache Hadoop modules by default, but additional modules can be added using standard deployment practices outlined by Microsoft and HortonWorks.

  • Apache Hadoop
  • Apache Pig
  • Apache Hive
  • Sqoop
  • Oozie
  • Ambari

This greatly simplified implementation process and flexibility of cluster size and configuration offers an attractive method of deploying Hadoop across the enterprise.

Encapsulated Security

HDInsight simplifies and augments the Hadoop security model in several important ways.  First, by limiting Remote Desktop Access to the head node of the cluster, the environment is more secure and less likely to be compromised by those wanting to gain access to the Windows environment.

Second, the use of Access Control Lists can be easily implemented to ensure that ANY network traffic is only allowed between specific and trusted IP address ranges.

Finally, certificate based authentication ensures proof of account ownership, and reduces the risk of spoofed logins.

Redundant Storage and Fault-tolerant Systems

HDInsight with Windows Azure is tightly coupled with Windows Azure Storage. This coupling ensures a high performing connection to the cloud based storage. Additionally, the cloud based storage offers redundant, geo-spatial replication, so data stored in Windows Azure Storage is highly available, fault tolerant, and also ready for Massively Parallel Processing (MPP) architecture.

The HDInsight cluster itself is built upon the Infrastructure-as-a-Service backbone of Windows Azure. These virtual machines that are created are flexible and portable. In the unlikely event of a virtual machine host becoming operational, those machines are instantly moved to another host either in the same data center, or in some cases a completely separate data center. This virtual machine migration is fast and automatic, with little to no impact on users’ ability to access the HDInsight environment.

Encapsulated Development Environments

In addition to the Windows Azure based implementations, HDInsight is also available as a single on-premise node emulator designed to be used for Development and Testing purposes.  This emulator is installed using the Web Platform Installer technology and installation is complete in as little as 20 minutes.  With the ability to use HDInsight in a local fashion, Big Data developers can easily create new solutions without impacting production systems, or requiring costly redundant environments in Windows Azure.

Nearly all of the features of HDInsight are available with the HDInsight Emulator, with the only differences being some slight variation in how PowerShell interacts with the environment. The changes to the interaction are minor and can easily be incorporated into a solution deployment scenario.

Gather Insights through Full Integration with Microsoft BI Tools

HDInsight seamlessly integrated with the full suite of Microsoft BI tools you are already familiar with, such as PowerPivot, Power View, Power Query and Power Map.  Now you can analyze both your structured and Hadoop data easily for greater insight.

Conclusion

HDInsight offers data professionals the ability to easily develop, deploy, and secure Hadoop solutions to many areas of the enterprise. With the ability to maintain an elastic and scalable environment, the benefits of HDInsight clearly offer a strong case for its usage and implementation.

Do you have questions about how HDInsight can be a part of your Hadoop implementation? Contact us today to find out!

AdvancedAnalyticsWorkshop
Josh Fennessy

About The Author

Josh Fennessy

Josh is a Solution Architect at BlueGranite. Josh is passionate about enabling information workers to become data leaders. His passions in the data space include: Modern Data Warehousing, unstructured analytics, distributed computing, and NoSQL database solutions.

Latest Posts

New Call-to-action