Data Lakes in a Modern Data Architecture

eBOOK | June 2018

The ability to capture and analyze practically any type of data has emerged as a critical business capability.

Traditional data warehousing and analytical systems can be complex and slow to adapt. Introducing a data lake to modernize your data architecture can be an effective way to continue leveraging existing investments, begin collecting new types of valuable data, and ultimately obtain insights faster. Leaders are looking for proven techniques to deliver accurate information in a timely and cost-effective manner. While a data lake is not the answer to everything, it can bring significant value to the business if implemented effectively for the right use cases.

Implementing a modern data architecture

What does it mean to implement a modern data architecture?

Like many other technology initiatives, it really depends on the implementation objectives. The following characteristics are most commonly associated with a modern data architecture:

  • Data originating from internal systems and cloud-based systems, as well as external data provided by partners and third parties
  • Acquisition of data via near real-time data streams in addition to batch loads
  • Delivery of analytics to traditional platforms such as data marts and semantic layers, as well as specialty databases such as graph or mapping databases
  • Analytics use cases ranging from operational and corporate BI to advanced analytics and data science
  • Support for the needs of all types of users, ranging from casual consumers to data analysts to data scientists

Business Needs Driving Data Architectures to Evolve & Adapt

Data Holds the Key

Today’s business leaders understand that data holds the key to making educated and supportable decisions. Traditional data warehousing and business intelligence approaches have been challenged as being too slow to respond. Reducing the time to value is a primary objective of a modern data architecture.

With all the media hype around data lakes and big data, it can be difficult to understand how, and even if, a data lake solution makes sense for your analytics needs. Some people believe that implementing a data lake means throwing away their investment in a data warehouse. This perception either sends them down the wrong path or causes them to sideline big data and data lakes as a future project.

The good news?

At BlueGranite, we believe that a data lake does not replace a company’s existing investment in its data warehouse. In fact, they complement each other very nicely. With a modern data architecture, organizations can continue to leverage their existing investments, begin collecting data they have been ignoring or discarding, and ultimately enable analysts to obtain insights faster.

Principles of a Modern Data Architecture

Big data technologies, such as a data lake, support and enhance modern analytics, but they do not necessarily replace traditional systems.

[Figure: Data Lakes in a Modern Data Architecture, diagram 1]

Multi-Platform Architectures Have Become the Norm

Within a modern data architecture, any type of data can be acquired and stored. Some implementers elect to accumulate and centralize *all* data within a data lake. Though this “everything in the data lake” approach is architecturally simple and certainly may provide significant value, the trade-off is that relational data sources become “derelationalized” in the process. Conversely, a multi-platform architecture (depicted above) focuses on best-fit engineering: the most effective technology is chosen based on the data itself.

Data Integration and Data Virtualization are Both Prevalent

Many IT professionals have become less willing to take on data integration – that is, the requirement to physically move data before it can be used or analyzed. In reality, a lot of data integration still occurs, but it is more thoughtful and purposeful. Data virtualization and logical data warehouse tactics, such as federated queries across multiple data stores, are ways to “query data where it lives” without implementing a full-fledged data virtualization platform.

Data Analysis Capabilities are Flexible

A key tenet of the modern data architecture is that it is flexible. Having the ability to access the data very early in the data lifecycle, before it has been curated or refined for broad use, offers significant flexibility. Because of the challenges associated with analyzing raw data, analysis of data in place to determine its value (schema-on-read) is typically handled by a highly proficient data analyst or data scientist.
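
To make schema-on-read concrete, here is a minimal sketch in Python. It assumes raw events were landed in the lake exactly as they arrived, one JSON document per line. The sample records, field names, and the `read_with_schema` helper are all hypothetical and exist only for illustration. The point is that structure is imposed when the data is read, not when it is written.

```python
import json

# Hypothetical raw events, stored as-is (one JSON document per line).
# No schema was enforced at write time, so fields and types vary by record.
RAW_EVENTS = """\
{"device": "pump-1", "temp_c": "71.5", "ts": "2018-06-01T10:00:00"}
{"device": "pump-2", "temp_c": 68, "ts": "2018-06-01T10:00:05", "firmware": "2.1"}
{"device": "pump-1", "ts": "2018-06-01T10:00:10"}
"""


def read_with_schema(raw_text, schema):
    """Apply a schema while reading (hypothetical helper).

    Keeps only the fields named in `schema`, coerces each value with the
    given type, and fills missing fields with None.
    """
    rows = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two analysts can project different schemas onto the same raw data.
temperature_view = read_with_schema(RAW_EVENTS, {"device": str, "temp_c": float})
firmware_view = read_with_schema(RAW_EVENTS, {"device": str, "firmware": str})
```

The same raw file yields a temperature view and a firmware view without being rewritten. The flexibility comes at a cost: the reader has to deal with missing fields and inconsistent types, which is one reason this kind of analysis tends to fall to more proficient analysts.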

The Architecture is Constantly and Iteratively Changing

Early exploration efforts to analyze data in the data lake impact the shape of solutions which are released for broader consumption. Raw data becomes progressively more refined as use cases are determined. Access to data becomes progressively less restricted as curated, user-friendly data structures are created. Sandbox or proof-of-concept solutions can become operationalized for broader consumption and/or improve existing solutions.

The Data Lake and the Data Warehouse Work in Tandem

As shown in the diagram above, both the data lake and the data warehouse are central players in the data storage area. Each is equally important, with complementary roles to play.

Considerations for a Successful Data Lake in the Cloud

Cloud-based services, such as Microsoft Azure, have become the most common choice for new data lake deployments. Cloud service providers allow organizations to avoid the cost and hassle of managing an on-premises data center by moving storage, compute, and networking to hosted solutions.


The following are some specific considerations when planning a data lake deployment on a cloud service:

  • Type of storage: A data lake is a conceptual data architecture, and not a specific technology. The technical implementation can vary, which means different types of storage can be utilized, which translates into varying features.
  • Single vs. multiple data lakes: Organizations have long sought “one version of the truth.” One data lake that contains all organizational data is sometimes a goal, particularly if an objective is integrating older, legacy systems with newer types of data. In reality, many organizations end up with multiple data lakes or document stores, either unintentionally or intentionally (to segregate business units, for instance).
  • Security capabilities: Different technology platforms implement security differently. A service such as Azure Data Lake Store implements hierarchical security based on access control lists, whereas Azure Blob Storage implements key-based security. These capabilities are continually evolving in the cloud, so be sure to verify them frequently.
  • Data cataloging, metadata, and tagging: A data catalog is a key data discovery component for authors of self-service analytical solutions. A well-designed data catalog acts not only as a data dictionary, but also assists with data previews, data profiling, and data access. Clear documentation, metadata, tagging, and data classification capabilities are vital for users to be able to effectively use data in a data lake.
  • Data warehouse integration: For most companies, a data lake vs. a data warehouse is not an either/or decision. Rather, a blended approach is typically most effective. Most commonly we see two types of data lake integration with the data warehouse:
    • Processes which pick up and move the data from the data lake to the data warehouse (or vice versa).
    • The ability to issue a federated query that returns data from both your data lake and your data warehouse via a single query.
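
The federated-query style of integration can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: two separate SQLite files stand in for the data warehouse and the data lake, and the table names, customer names, and the `run_federated_query` function are hypothetical. A real deployment would rely on the federated query capability of whichever platform is chosen.

```python
import os
import sqlite3
import tempfile


def run_federated_query():
    """Join data held in two separate stores with one SQL statement."""
    with tempfile.TemporaryDirectory() as workdir:
        warehouse_path = os.path.join(workdir, "warehouse.db")
        lake_path = os.path.join(workdir, "lake.db")

        # Stand-in "warehouse": a curated, relational customer dimension.
        wh = sqlite3.connect(warehouse_path)
        wh.execute(
            "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT)"
        )
        wh.executemany(
            "INSERT INTO dim_customer VALUES (?, ?)",
            [(1, "Acme"), (2, "Globex")],
        )
        wh.commit()
        wh.close()

        # Stand-in "lake": lightly structured clickstream events.
        lake = sqlite3.connect(lake_path)
        lake.execute("CREATE TABLE click_events (customer_id INTEGER, page TEXT)")
        lake.executemany(
            "INSERT INTO click_events VALUES (?, ?)",
            [(1, "/pricing"), (1, "/docs"), (2, "/pricing")],
        )
        lake.commit()
        lake.close()

        # One connection, two stores: attach the lake next to the warehouse
        # and join across them without copying data between the files.
        conn = sqlite3.connect(warehouse_path)
        conn.execute("ATTACH DATABASE ? AS lake", (lake_path,))
        rows = conn.execute(
            """
            SELECT c.name, COUNT(*) AS clicks
            FROM dim_customer AS c
            JOIN lake.click_events AS e ON e.customer_id = c.customer_id
            GROUP BY c.name
            ORDER BY c.name
            """
        ).fetchall()
        conn.close()
        return rows


federated_rows = run_federated_query()
```

Neither store is moved or reloaded; the join happens at query time. That is the practical difference from the first integration type, where a scheduled process physically copies data from one platform to the other.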

Tips for getting started with a data lake

  • Confirm a data lake really is the best choice
  • Start with a small, practical project
  • Address ‘readiness’ considerations
  • Use a POC to reduce risk in technology selection
  • Don’t shortchange planning
  • Implement the right level of discipline

We hope you have found this preview of the eBook useful. If you are looking to bring in new approaches, combined with proven techniques, to support decision making at all levels of your organization, download the full PDF version today!