Sep 19, 2017

Python – SQL Server 2017’s Hidden Swiss Army Tool

Steve Cardella Posted by Steve Cardella

When new versions of a software product are announced, there’s always a grab bag of new features. Some of those features turn out to be a godsend for users everywhere, but many of them are only useful for specific applications, or, worse, are worth little more than a bullet point on marketing materials. One of the new features of SQL Server 2017 is the addition of Python support to its Machine Learning Services. Is it a boon or a bust? Read on to find out.

Python SQL 2017.png

Earlier this year, Python joined R as a supported language in Microsoft’s Machine Learning Services module. Python support will be available on every edition of SQL Server 2017, including the free Express Edition. Guido van Rossum created the straightforward, easy-to-learn Python language in 1991. It has a wide variety of library packages to handle all sorts of tasks, including a significant number of actively maintained data science-specific libraries. In fact, it now rivals R in data science.

Like R, its intended use is in data science workloads. When embedded in SQL Server, Python scripts can directly use the SQL Server engine to access and store datasets. The increase in storage efficiency is a huge boon when working with large swaths of data. Data scientists already specializing in Python now need not learn R, with its relatively steeper learning curve. Like R integration, Python scripts can be embedded within scripts and stored procedures, giving an easy transition to production use of data models and the like.

However, Python was developed as a general-purpose language. It is no one-trick pony. An enterprising developer can use its features to encapsulate heavier logic or specialized features that previously required deploying a common language runtime (CLR) stored procedure. Migrating that functionality between servers or environments is simplified too, as it’s all contained within a SQL script. From data-cleansing functions to web service access, the world is your oyster. This is where Python really comes into its own. Here are a few examples to whet your appetite:

  • Email handling – You can ingest emails, perform sentiment analysis, store the contents in SQL Server, and even reply within the context of a stored procedure or SQL Agent job.
  • Real-time currency conversion – Retrieve real-time currency exchange rates from a webservice and use those rates to convert from the base currency.
  • Data Quality tools – Easily create text formatting, duplicate detection and fuzzy matching algorithms to improve data quality.
  • Use specialized hardware from within SQL Server – Feed data directly from SQL Server to GPUs for heavy-duty floating-point computation and efficiently store the results back in SQL Server. This can be especially useful in a data science context.

Now that Python is part of SQL Server 2017, it can be used in nearly any situation. On-premises, it can be run independently or embedded within SQL Server. In a hybrid environment, PolyBase or stretch databases can be added to the mix to store data both locally and in Azure. In the cloud, it can run against Hadoop or Spark clusters.

Now that SQL Server supports both R and Python, it begs the question, “Which is better?” There have been endless discussions about that very topic within the data science community. Proponents of R cite its development by statisticians and laser-focus on data analysis. It often can accomplish common analytical tasks with few lines of code. Critics of R suggest that it is harder to learn and less capable for data preparation tasks. Proponents of Python praise its ease in learning and debugging. Its general capabilities make it a flexible jack-of-all-trades, letting it handle all aspects of the task. Critics suggest that its general outlook makes achieving some goals less efficient.

The real answer to the question “which is better” is the language you’re comfortable with, that gets the job done. And, now that Python support is on par with R support in SQL Server, there’s no penalty or benefit in choosing one or the other.

Have a question about advanced analytics or want to learn more about how BlueGranite can help your organization? Contact us!

Predicting Customer Churn
Steve Cardella

About The Author

Steve Cardella

Steve is a Solution Consultant at BlueGranite. Certified in Microsoft SQL Server, he has over 10 years of experience in IT and programming. His areas of expertise include ETL, custom integrations, and reporting and data analysis services. Steve received his Bachelors of Science from Franciscan University and enjoys working with data surrounding retail, marketing, nonprofit, and manufacturing industries.