Overview of Features for Data & Analytics
SQL Server 2016 is a sizeable release with some compelling reasons to consider upgrading in order to bring additional value to your business intelligence, analytics, or data warehousing environment. In this recap, we highlight not just ‘What’ each item is, but also ‘Why’ it is relevant and useful.
The individual capabilities discussed below are categorized into the following four areas:
1. Analytics Enablement
2. Performance Improvements
3. Security Enhancements
4. System Management Efficiences
It’s no secret that the ability to effectively analyze data – and take appropriate action – can be a major factor in competitive ability, operational efficiency, and cost savings. Database platforms and analytical toolsets are evolving to cope with vastly larger data volumes, across an array of data formats, delivered in an increasingly short window of time. Following are a few new features for delivery of analytical solutions.
What? SQL Server R Services, which includes the Advanced Analytics Extensions and Revolution R components, provides an enterprise platform for usage of the widely adopted R programming language for statistical analysis, data science, and data visualization. R Services allows a data scientist to push the computations to be performed where the data resides, an improvement over pulling the data locally. In-memory components within the SQL Server engine are utilized for performance and scalability when executing R algorithms. T-SQL stored procedures can be invoked to return R predictions, scores, forecasts, plots, etc. Familiar tools like SQL Server Management Studio (SSMS) and SQL Server Data Tools (SSDT) can be used to validate, operationalize, and manage solutions created with R. Additionally, security can be managed in the same manner as other SQL Server artifacts.
Why? Leveraging R solutions within an existing SQL Server environment (on-premises if desired) opens up enormous possibilities for analysis of the past, present, and future. Data scientists and system administrators can collaborate with the data + analytical engine + analytics library all centralized in SQL Server. R Services is intended for enterprise level deployments, thus improving upon open-source R related to memory limits, parallelism of computations, and data movement requirements.
What? With more companies taking advantage of Hadoop frameworks, access to distributed datasets can represent a challenge for those unfamiliar with Hadoop clusters, Linux, MapReduce, and Java. PolyBase eases this pain point by utilizing the familiarity of the T-SQL language to access data stored in SQL Server, Hadoop (Hortonworks or Cloudera), and Azure Blob Storage via a single T-SQL query – effectively joining structured and semi-structured data together in the result set. In addition to ad hoc querying capabilities, PolyBase can also be used for data import and export operations.
Why? Employees’ existing knowledge of T-SQL can be leveraged for interacting with semi-structured data residing outside of SQL Server, which reduces the learning curve associated with Hadoop implementations. The ability to use T-SQL also opens up the use of other familiar tools such as SQL Server Management Studio, SQL Server Data Tools, or your favorite BI tool such as Power BI or Reporting Services. The flexibility to marry up relational and non-relational data in an ad hoc manner facilitates agility, which in turn offers time to learn and explore the data before deciding which portions may benefit from additional data integration processes.
What? Consistent with modern web standards, SQL Server Reporting Services (SSRS) 2016 introduces a newly redesigned portal. The SSRS portal has been promoted to become the central point of access for on-premises delivery of various report types, which is a clear move away from using SharePoint for deployment of BI assets. The SSRS portal will support 3 of the 4 report types: Paginated Reports (created with SSRS), Mobile Reports and KPIs (produced via the Mobile Report Publisher, formerly Datazen), and Interactive Reports (generated in Power BI Desktop). The Power BI portal will support all 4 report types, including Analytical Reports and Charts (created from Excel). A single mobile application will support consumption of all 4 report types. Numerous other enhancements, such as flexible parameter design and easier printing setup, are also included in the 2016 release.
Why? The new SSRS portal based on HTML5 will allow for more seamless delivery across various browsers and mobile devices. For organizations who wish to deploy reports on-premises, the SSRS web portal will be the central point for delivery. For companies who are prepared to use a cloud service, the Power BI Service is another alternative for hybrid delivery. The existence of the 4 new report choices are intended to be thought of similarly to tools like Excel, Word, and PowerPoint: select the best tool based on what it is you are trying to achieve. The overall intent is to decouple the decision on report type from where it’s going to be deployed.
What? Operational analytics, introduced in SQL Server 2016, offers the ability to run real-time operational analytics directly on a source transactional system. Combining workloads on a single server is possible because the analytics workload will execute against a ‘columnstore’ copy of the data, while the operational OLTP workload continues to run against the original ‘rowstore’ tables. Though the intended impact on the source system is intended to be minimal, we recommend using this functionality judiciously, thoroughly testing the impact of comingling workloads, and using a readable secondary server if available.
Why? Running analytical workloads on an OLTP system can be beneficial when data is needed from a source SQL Server to satisfy real-time operational reporting needs. This may be useful functionality if the desire is to avoid introducing processes to relocate the data on another server, the data is needed near real-time, and the reporting requirements can be satisfied without integrating it with other data first. This functionality does not typically eliminate the need for a fully integrated data warehouse with historical data; rather, its purpose is to focus on what is happening right now.
What? SQL Server 2016 will initially support basic importing, exporting, parsing, and querying of JSON documents utilizing T-SQL extensions. When imported into SQL Server, JSON-structured data will utilize standard textual data types (varchar and nvarchar), which is intended to reduce complexity and improve compatibility.
With the challenge of increasing data volumes and evolving architectures, performance remains one of the most challenging and essential aspects of any system.
What? The new Query Store saves queries, plans, and runtime statistics over time so performance can be monitored. It can be thought of as a flight recorder for the database which provides database administrators a built-in tool to identify what changed, observe query plan variations, identify most popular queries, locate outliers, and view resources such as CPU, I/O, and memory. Four built-in dashboards are delivered with the Query Store, each of which can be customized.
Why? The Query Store will provide historical metrics for insight on usage patterns and pinpointing problematic queries without reliance on what’s still present in the plan cache, or when SQL Server was last restarted, or what baselines the system administrator has previously gathered. Traditionally performance tuning has been considered as much art as science, so the Query Store promises to reduce the learning curve in the area of performance monitoring and tuning.
What? SQL Server 2016 brings a wealth of changes for SSAS Tabular models. For models which utilize the in-memory engine, the new parallel processing capability will speed up data refresh operations. For models which operate in DirectQuery mode, there are numerous enhancements including query generation improvements, MDX support, new data sources, as well as the ability to create calculated columns and security in DirectQuery. Also noteworthy are DAX improvements (known as SuperDAX) which process certain calculations far more efficiently and quickly. The underlying structure of Tabular models is now JSON-based, which will bring parity with the Power BI APIs, speed up metadata changes, and allow for developer efficiencies with scripting operations via a new Tabular Model Scripting Language.
Why? When we choose to introduce a semantic layer like Analysis Services into an environment, speed is an important factor: when querying the model, when processing the model, and when designing the model. This release has improvements in each of these areas.
What? The new row-level security (RLS) functionality permits users to view only the rows of data they are approved to view (for instance, salespersons who can view only their own customers). Because RLS is implemented in the database layer, it can be taken advantage of by other downstream systems in certain cases.
Why? To date, introducing row-level security usually involved multiple layers: customizing a solution in the database for ‘power users’ who are permitted to query relational data directly, and/or implementing roles in an Analysis Services semantic layer, and/or implementing security in a front-end application. The RLS functionality has potential to reduce the need for some types of customized solutions in the database layer.
What? Always Encrypted is technology to encrypt certain columns which contain sensitive data, such as Social Security numbers and credit card numbers, while the data is stored in SQL Server (at rest) as well as when the data is being transmitted (in motion). The requisite ADO.NET driver handles the essential role of encrypting the data for specified columns before it is stored in SQL Server, as well as the decryption by approved applications. Note that certain columns are not good candidates for encryption, such as columns utilized in joins, ordering, or grouping operations.
Why? This encryption technique is different because the data is encrypted before it gets into SQL Server which provides a barrier between those who manage the data from those who are permitted to view it. Put another way, system administrators, consultants, and vendors don’t necessarily need to view the data in order to perform their role effectively.
What? Dynamic Data Masking obfuscates sensitive data as it is being presented to the user. For example, personally identifiable data such as an employee birthdate could be masked as 01/01/2000. This differs from encryption (discussed just above) in that the underlying data in SQL Server is preserved, meaning that the data masking is handled at query time. Data masking is best suited to certain types of data such as salary, birthdate, e-mail addresses; it should not be used as the only security solution for extremely sensitive data such as credit card numbers.
Why? Data masking can be useful in a great many situations. For instance if you are working with a vendor, or perhaps certain types of personnel need access to some data but not in its entirety. It can also be utilized to avoid having personally identifiable data flowing into development and test environments which represents a security threat. Contrary to encryption, the underlying data can indeed be viewed by system administrators or privileged users via queries on the database, making data masking a lighter weight method than encryption for safeguarding of data.
What? Temporal tables are designed to track history of changes to data over time. Recording of the historical rows when a change occurs, as well as the start/end date, is handled entirely by the system via two tables behind the scenes: a current data table and its companion historical data table. Note that temporal tables do not entirely replace the need for type 2 historical dimension tracking in an integrated data warehouse or analytical system.
Why? System versioning of changes can be extremely useful for determining data values at a particular point in time which is relevant to auditing, legal, and compliance purposes, among others. Having this functionality built in to SQL Server reduces the need to design custom processes. It may also reduce the need for generating point-in-time backups of data.
What? Stretch Database is a hybrid feature that ‘stretches’ data from on-premises SQL Server tables into an Azure SQL Database for archival purposes. Data which is seldom accessed (often called cold data) can be stored in Azure which frees up resources in an on-premises data center. There are no changes in how user queries are constructed: the system will transparently combine the results from local data and remote data during query processing.
Why? Archival strategies are important for a number of reasons, including performance, maintenance, service level agreements, auditing, and legal purposes. Stretch Database can be a cost-effective solution for organizations with requirements to retain historical data for extended lengths of time. Since the cold data is always online and available to users when they need it, that makes Stretch Database a practical alternative for data archival.
What? Master Data Services (MDS) has been enhanced for performance in both the web-based interface as well as the Excel add-in. Additionally, there are improvements related to management of security and administrator accounts.
Why? Master Data Services is an extremely useful tool for managing reference data and hierarchies not found in another source system – consistent master data can significantly enhance the value of reporting and analytics. Since the ideal usage involves data stewards who are from the relevant business functional areas, improvements to performance can only help the chance of a master data system to be adopted and be considered successful.
What? Integration Services in SQL Server 2016 re-introduces the ability to deploy one modified package at a time, rather than requiring the entire project to be deployed each time. This change will permit the BI team to take advantage of all of features the project deployment model offers, and still retain the flexibility of promoting individual SSIS packages as they change.
Why? Although this may initially appear to be a trivial improvement, many BI developers do not have a traditional custom development background which means the concept of full project deployments is unfamiliar. The most common workarounds have been techniques utilizing source control processes and branches. From a system maintenance and change control perspective, the re-introduction of incremental package deployments will be a refreshing improvement for many SSIS developers who work in a multi-developer environment.
What? The interface to install SQL Server 2016 includes options to configure the database server based on best practices. Specifically, this includes aligning the number of TempDB files to the number of logical processors, as well as building in certain settings which previously were controlled by the administrator via trace flags.
Why? Previously, to comply with certain standard best practices, the SQL Server configuration needed to be altered after the initial setup. These types of improvements to the product setup are particularly helpful for less experienced or ‘accidental’ database administrators. It reduces some of the risk that a new system will encounter performance issues.
What? The SSDT toolset has been unified into one tool, thus simplifying the process of getting a BI developer’s machine set up.
Why? Previously, figuring out which ‘flavor’ of SSDT was needed for various purposes was a cause of major confusion, resulting in time not well-spent.