
May 03, 2018

Azure SQL Data Warehouse and Azure Databricks: Now Even Better Together

Posted by Mike Cornell

Microsoft recently announced the general availability of its Compute Optimized Gen2 tier for Azure SQL Data Warehouse (Azure SQL DW). This new tier brings with it more compute, concurrency, and availability for the cloud data warehousing service.

The Azure Databricks team also recently released the Azure SQL Data Warehouse connector (SQL DW connector) for Apache Spark. This connector enables even deeper integration between Azure SQL DW and Azure Databricks, the unified analytics platform built on Apache Spark. Together, these improvements make the two Azure services an even stronger combination for modern big data and AI platforms in Azure.

Azure SQL DW Compute Optimized Gen2

The announcement of the Gen2 tier for Azure SQL DW came with three major highlights:

  1. More processing power: Gen2 offers up to a 5X performance improvement over Gen1 by caching frequently used data on local storage close to the compute resources. It does this without sacrificing the separation of compute and storage, a key differentiator for Azure SQL DW that allows the service to be “paused” and “resumed” as needed.
  2. More concurrent users: Gen1 supports at most 32 concurrent queries. For organizations with many BI and reporting users, this means the number of active queries must be strictly monitored and managed. Gen2 supports up to 128 concurrent queries. This 4X improvement takes a ton of pressure off managing concurrent workloads.
  3. More availability: With the announcement of Gen2, Azure SQL DW is now available in 33 different Azure regions, making it the most widely available cloud data warehouse platform to date.

Azure SQL DW Connector for Apache Spark

The Azure SQL DW connector for Apache Spark allows services like Azure Databricks to interact much more effectively with Azure SQL DW. With this connector, Azure Databricks can both read massive amounts of data from and load massive amounts of data into Azure SQL DW using PolyBase, staging the data in Azure Blob Storage along the way. After loading data, it can also trigger additional processing in Azure SQL DW directly from Azure Databricks. This enables end-to-end, secure big data ETL scenarios in which Apache Spark in Azure Databricks prepares data and loads it into Azure SQL DW.
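
As a rough illustration of the write path described above, the Python (PySpark) sketch below builds the options for the connector, whose format name is com.databricks.spark.sqldw, and hands them to a DataFrame writer. It assumes a Databricks notebook where the Spark session can already reach the Blob Storage account used for staging. The server, storage account, table, and stored procedure names (example-server, examplestorage, dbo.SalesStaging, dbo.usp_MergeSales) are hypothetical placeholders, and sqldw_write_options and load_to_sqldw are illustrative helpers, not part of any library.

```python
# Sketch: load a Spark DataFrame into Azure SQL DW through the Databricks
# SQL DW connector (PolyBase under the hood), then run follow-up SQL.
# Helper names and all connection values are hypothetical placeholders.

SQLDW_FORMAT = "com.databricks.spark.sqldw"


def sqldw_write_options(jdbc_url, temp_dir, table, post_actions=None):
    """Return connector options for a PolyBase-backed write to `table`."""
    options = {
        "url": jdbc_url,      # JDBC URL of the SQL DW database
        "tempDir": temp_dir,  # wasbs:// staging folder that PolyBase loads from
        # Pass the storage credentials Spark already uses on to SQL DW.
        "forwardSparkAzureStorageCredentials": "true",
        "dbTable": table,     # target table in SQL DW
    }
    if post_actions:
        # SQL DW runs these ";"-separated statements after a successful write.
        options["postActions"] = ";".join(post_actions)
    return options


def load_to_sqldw(df, options, mode="append"):
    """Write a PySpark DataFrame to SQL DW using the given connector options."""
    df.write.format(SQLDW_FORMAT).options(**options).mode(mode).save()


# Hypothetical values; in practice keep credentials in a secret store.
write_opts = sqldw_write_options(
    jdbc_url="jdbc:sqlserver://example-server.database.windows.net:1433;database=exampledw",
    temp_dir="wasbs://staging@examplestorage.blob.core.windows.net/tmp",
    table="dbo.SalesStaging",
    post_actions=["EXEC dbo.usp_MergeSales"],  # hypothetical stored procedure
)
# In a Databricks notebook: load_to_sqldw(sales_df, write_opts)
```

Staging through Blob Storage is what lets PolyBase pull the data into SQL DW in bulk rather than pushing rows one at a time over JDBC, and postActions is one way to "fire off" the follow-up processing mentioned above.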

Better Together

With the recent updates to Azure SQL DW and Azure Databricks, the two services fit together better than ever in a modern big data analytics and AI platform. Tighter, more efficient integration and the ability to operate at previously unavailable compute scales make the combined use cases even stronger. Below are a few common ways to use Azure Databricks and Azure SQL DW together.

Batch or Streaming ETL

Use Azure Databricks for processing batch and streaming data before loading it into Azure SQL DW for further processing and analysis.
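
For the streaming case, here is a minimal sketch, assuming a Databricks runtime whose SQL DW connector accepts Structured Streaming writes (check the runtime version you use). A streaming DataFrame is cleaned and then written with writeStream, where checkpointLocation lets Spark track progress across restarts. The amount column, the paths, the table, and the helper functions are all hypothetical.

```python
# Sketch: clean a streaming DataFrame and stream it into Azure SQL DW.
# Assumes a Databricks runtime whose SQL DW connector supports Structured
# Streaming. Column names, paths, and the table name are hypothetical.

SQLDW_FORMAT = "com.databricks.spark.sqldw"


def streaming_sqldw_options(jdbc_url, temp_dir, table, checkpoint_dir):
    """Connector options plus the checkpoint a streaming sink requires."""
    return {
        "url": jdbc_url,
        "tempDir": temp_dir,  # Blob Storage staging area used by PolyBase
        "forwardSparkAzureStorageCredentials": "true",
        "dbTable": table,
        "checkpointLocation": checkpoint_dir,  # stream progress, for restarts
    }


def clean_events(events_df):
    """Drop rows with no amount and stamp each row with a load time."""
    from pyspark.sql import functions as F  # needs PySpark (e.g. Databricks)

    return (
        events_df
        .filter(F.col("amount").isNotNull())
        .withColumn("load_ts", F.current_timestamp())
    )


def start_sqldw_stream(events_df, options):
    """Start the streaming load and return its StreamingQuery handle."""
    return (
        clean_events(events_df)
        .writeStream.format(SQLDW_FORMAT)
        .options(**options)
        .outputMode("append")
        .start()
    )
```

For a batch load, the same clean_events output can go through df.write with the checkpoint option left out, so one transformation serves both paths.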

Reference Data Lookup

Use Azure Databricks to mash up reference data from Azure SQL DW with data from other sources.
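
A lookup like this can be sketched in PySpark by reading a query result from SQL DW through the connector's query option and joining it to other data. The query, table, and column names (dbo.DimProduct, ProductKey) and the helpers sqldw_read_options and enrich_with_reference are hypothetical; the broadcast hint assumes the reference table is small.

```python
# Sketch: pull a reference table out of Azure SQL DW and join it to other data.
# The query, column names, and helper names are hypothetical.

SQLDW_FORMAT = "com.databricks.spark.sqldw"


def sqldw_read_options(jdbc_url, temp_dir, query):
    """Connector options for reading the result of a SQL query from SQL DW."""
    return {
        "url": jdbc_url,
        "tempDir": temp_dir,  # PolyBase unloads the query result here first
        "forwardSparkAzureStorageCredentials": "true",
        "query": query,       # use "dbTable" instead to read a whole table
    }


def enrich_with_reference(spark, events_df, options, key):
    """Left-join events to reference rows read from SQL DW on column `key`."""
    from pyspark.sql.functions import broadcast  # needs PySpark

    ref_df = spark.read.format(SQLDW_FORMAT).options(**options).load()
    # Broadcasting suits a small lookup table; drop the hint for large ones.
    return events_df.join(broadcast(ref_df), on=key, how="left")


read_opts = sqldw_read_options(
    jdbc_url="jdbc:sqlserver://example-server.database.windows.net:1433;database=exampledw",
    temp_dir="wasbs://staging@examplestorage.blob.core.windows.net/tmp",
    query="SELECT ProductKey, ProductName, Category FROM dbo.DimProduct",
)
# In a notebook: enriched = enrich_with_reference(spark, clicks_df, read_opts, "ProductKey")
```

Pushing the column selection into the query keeps the PolyBase unload small, which matters more than the join itself when the warehouse table is wide.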

Machine Learning

Train machine learning models in Azure Databricks and send predictions into Azure SQL DW for further processing and analysis.
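
One way this could look in PySpark is sketched below: a Spark ML pipeline is trained in Databricks, used to score new rows, and only the ID and prediction columns are written to SQL DW. LogisticRegression is an arbitrary stand-in for whatever model fits the problem; the table name, column names, and helper functions are hypothetical.

```python
# Sketch: train a Spark ML model in Databricks and load its predictions into
# Azure SQL DW. Column names, the model choice, and the target table are
# hypothetical; any Spark ML estimator could take LogisticRegression's place.

SQLDW_FORMAT = "com.databricks.spark.sqldw"


def prediction_load_options(jdbc_url, temp_dir, table):
    """Connector options for writing scored rows to `table` in SQL DW."""
    return {
        "url": jdbc_url,
        "tempDir": temp_dir,  # Blob Storage staging area used by PolyBase
        "forwardSparkAzureStorageCredentials": "true",
        "dbTable": table,
    }


def train_and_load(train_df, score_df, feature_cols, label_col, id_col, options):
    """Fit a classifier, score new rows, and overwrite the predictions table."""
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    pipeline = Pipeline(stages=[
        VectorAssembler(inputCols=feature_cols, outputCol="features"),
        LogisticRegression(featuresCol="features", labelCol=label_col),
    ])
    model = pipeline.fit(train_df)
    # Keep plain columns only; Spark ML vector columns such as "probability"
    # do not map to a SQL column type.
    predictions = model.transform(score_df).select(id_col, "prediction")
    predictions.write.format(SQLDW_FORMAT).options(**options).mode("overwrite").save()
    return model
```

Writing only keys and scores keeps the warehouse side simple: SQL DW can then join the predictions back to its own tables for reporting.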

If you have any questions about the recent announcements for the Azure SQL DW Compute Optimized Gen2 tier, about Azure Databricks, or about how these services can help you do more with your data, please reach out to BlueGranite today. In addition to our custom analytics solutions, our offerings include Azure SQL DW Training and a Databricks Workshop, each designed to maximize your organization’s data capabilities.

Want to learn more about Databricks’ vast potential? Read our recent review here.

About The Author

Mike Cornell

Mike Cornell is a Solution Architect at BlueGranite who is passionate about helping clients solve business problems of varying size and complexity using data and analytics. Mike's specializations include big data platforms, cloud data platforms, advanced analytics, and data visualization and exploration. His technology interests include the Azure Data Platform, Hadoop Data Platform, Spark, R and Python for data analysis, Power BI, and SQL Server. Check out Mike's blog at http://www.datamic.net.
