Multi-Platform Architectures Have Become the Norm
Within a modern data architecture, any type of data can be acquired and stored. Some implementers elect to accumulate and centralize *all* data within a data lake. Though this “everything in the data lake” approach is architecturally simple and can provide significant value, the trade-off is that relational data sources become “derelationalized” in the process. Conversely, a multi-platform architecture (depicted above) focuses on best-fit engineering, in which the characteristics of the data itself determine the most effective technology for storing it.
Data Integration and Data Virtualization Are Both Prevalent
Many IT professionals have become less willing to take on data integration – that is, the requirement to physically move data before it can be used or analyzed. In reality, a lot of data integration still occurs, but it is more thoughtful and purposeful. Data virtualization and logical data warehouse tactics, such as federated queries across multiple data stores, are ways to “query data where it lives” without implementing a full-fledged data virtualization platform.
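To make “query data where it lives” concrete, here is a minimal sketch of a federated query. It assumes two separate SQLite databases stand in for independent data stores; the names sales_store, crm_store, orders, and customers are hypothetical and chosen only for illustration. A real logical data warehouse would span different engines, but the idea is the same: one query joins data that stays in its original store.

```python
import sqlite3


def build_federated_connection() -> sqlite3.Connection:
    """Attach two independent in-memory databases to one connection.

    Each attached database plays the role of a separate data store
    (hypothetical names: sales_store and crm_store).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS sales_store")
    conn.execute("ATTACH DATABASE ':memory:' AS crm_store")

    # Sample data lives in its own store; nothing is copied between them.
    conn.execute(
        "CREATE TABLE sales_store.orders (customer_id INTEGER, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE crm_store.customers (customer_id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO sales_store.orders VALUES (?, ?)",
        [(1, 120.0), (2, 35.5), (1, 80.0)],
    )
    conn.executemany(
        "INSERT INTO crm_store.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.commit()
    return conn


def revenue_by_customer(conn: sqlite3.Connection) -> list:
    """Join across both stores in a single query, without moving data first."""
    query = """
        SELECT c.name, SUM(o.amount) AS revenue
        FROM sales_store.orders AS o
        JOIN crm_store.customers AS c
          ON c.customer_id = o.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_federated_connection()
    for name, revenue in revenue_by_customer(connection):
        print(name, revenue)
```

The join is resolved at query time, so neither table has to be physically integrated into the other store beforehand.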
Data Analysis Capabilities Are Flexible
A key tenet of the modern data architecture is that it is flexible. Having the ability to access the data very early in the data lifecycle, before it has been curated or refined for broad use, offers significant flexibility. Because of the challenges associated with analyzing raw data, analysis of data in place to determine its value (schema-on-read) is typically handled by a highly proficient data analyst or data scientist.
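As an illustration of schema-on-read, the sketch below keeps raw records exactly as they arrived and applies a schema only at the moment of analysis. The sample records, the field names, and the read_with_schema helper are all hypothetical; this is one simple way to express the idea, not a prescribed implementation.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# Raw data as landed in the lake: inconsistent fields, strings for numbers,
# and even a malformed line. Nothing was validated at write time.
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.90", "ts": "2024-01-05"}',
    '{"user": "ben", "amount": "5", "channel": "web"}',
    "not valid json",
]

# A schema here is just a mapping of field name to a casting function.
Schema = Dict[str, Callable[[Any], Any]]


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Any]]:
    """Apply a schema while reading; the stored data is never modified."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that cannot be parsed
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        row: Dict[str, Any] = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None  # value present but not castable
        yield row


if __name__ == "__main__":
    # The analyst decides which fields and types matter for this question.
    spend_schema: Schema = {"user": str, "amount": float}
    for row in read_with_schema(RAW_EVENTS, spend_schema):
        print(row)
```

A different question can be asked of the same raw lines simply by passing a different schema, which is where the flexibility comes from. It also shows why this work tends to fall to experienced analysts: handling missing fields and malformed records is left to the reader of the data.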
The Architecture Is Constantly and Iteratively Changing
Early exploration efforts to analyze data in the data lake influence the shape of the solutions that are later released for broader consumption. Raw data becomes progressively more refined as use cases are determined. Access to data becomes progressively less restricted as curated, user-friendly data structures are created. Sandbox or proof-of-concept solutions can be operationalized for broader consumption, used to improve existing solutions, or both.
The Data Lake and the Data Warehouse Work in Tandem
As shown in the diagram above, both the data lake and the data warehouse are central players in the data storage area. Each is equally important, and the two play complementary roles.