
Nov 16, 2017

Microsoft Azure & Databricks = Cloud-Scale Spark Power

Posted by Colby Ford


Microsoft and Databricks recently announced a partnership that will soon deliver a cloud-based, managed Spark service on Azure. Select customers currently have access to a "private preview" of the service. Over the next few weeks, a "gated public preview" will open to around 150 clients, and in January 2018 the service will be available for everyone to try. Full details about the partnership and the platform's features have not yet been released, but here is how Azure Databricks will likely enhance your Big Data capabilities in the cloud.


What is Databricks?

Databricks is a company founded by the team that originally created Apache Spark at UC Berkeley. It has built a Unified Analytics Platform that aims to be a single system for everything from analytics workflows to Spark integration to security.

Databricks touts several benefits of its Unified Analytics Platform, such as:

  • UNIFY ANALYTICS WITH APACHE SPARK - Eliminate the need for disparate tools.
  • STREAMLINE ANALYTIC WORKFLOWS - Reduce deployment time to minutes.
  • INCREASE PRODUCTIVITY OF DATA SCIENCE TEAMS - With Databricks, they’ll be 5x more productive.
  • REDUCE RISK - Enable innovation with out-of-the-box enterprise security and compliance.

[Source]

Databricks also claims a 5x performance gain for Apache® Spark™ on its platform over the open-source version.

Databricks' Feature Comparison page lists quite a few features that will likely make it into the Azure version in the near future:

CLOUD OPTIMIZATION:

  • Tuned Apache® Spark™ clusters
  • High availability for Spark Streaming
  • Built-in file system

COST MANAGEMENT:

  • Autoscaling Apache Spark clusters
  • Multi-user cluster sharing

BUILT-IN EXPLORATION TOOLS:

  • Notebooks with real-time collaboration + revision history
  • Publish notebooks as production dashboards

BUILT-IN PRODUCTION TOOLS:

  • Spark job monitoring alerts
  • One-click deployment from notebooks to Spark Jobs
  • APIs to build workflows in notebooks

SECURITY:

  • Access control for clusters and notebooks
  • Permission-based job and workflow execution
  • Authenticated SQL server

Expect a Familiar Azure Experience

Azure already has a managed Hadoop™ offering known as HDInsight. You can spin up an HDInsight cluster configured to your specifications from the Azure Portal and pay only for the time the cluster is running. Support for HDInsight is provided by the Microsoft Azure support team.

Azure Databricks should offer a very similar experience: spin up an Azure Databricks cluster directly from the Portal, and Azure will handle the setup work for you. No licensing is required beyond your Azure subscription.

Microsoft and Databricks will also provide a unified support system. Because the service runs within Azure, you will go through Microsoft for support, which will be fully integrated with the Databricks expert support team.

Want to learn more about how you can take advantage of this exciting announcement at your organization? Contact BlueGranite!


About The Author

Colby Ford

Colby is a Data Scientist at BlueGranite. With a background in mathematics, statistics, and computational biology, he combines this expertise to bring Data Science to everyone. He uses R and Python and puts Machine Learning to work to gain insight from data. Outside of BlueGranite, Colby is an avid pianist and genomics researcher. Check out Colby’s website at www.colbyford.com.
