I’ve recently devoted a lot of thought to the role structure plays in data analytics. Not just in the immediate sense that comes with being on a specific project, but in a more holistic “Why do we do what we do?” sense. Structure is a foundational element of much of what we, as designers and builders of analytical systems and solutions, do – but as with anything that becomes a matter of course, it can be easy to lose sight of the “Why?” of it all.
Modern tools and technologies make it easier than ever to go and get data wherever it resides and begin gleaning insight from it – a fact which might require us to reevaluate some of the industry’s longstanding rules. If we don’t strictly need to face the often-difficult task of imposing structure on our data in order to analyze it, then what do we gain by doing so?
This question led me to re-examine the reasons we devote time and energy to structuring data for analytics in the first place. I soon realized that exploring those reasons is a daunting task – and that unpacking them properly takes more than a single blog entry. To explore the broader relationship between data structure and analytics, I’ll be using this multipart series to take a closer look at some individual use cases I’ve encountered. While this first entry prefaces the larger subject, more specific exploration and examples will follow. My hope is that this series will shed some light on practices that, while venerable by the fast-paced standards of today’s technology, remain integral to our efforts.
Fundamentally, analytics is about achieving understanding. I’ve long been fascinated by etymology – or the origins of words. Studying a word’s roots can uncover concepts buried by modern context, conveying new insight. According to the Online Etymology Dictionary, “analytic” as an adjective comes to us modern English speakers mostly unchanged from both the Greek “analytikos” and the Medieval Latin “analyticus” – derived from its separate parts of “ana-”, meaning “up, back, throughout”, and “lysis”, meaning “a loosening or unfastening”. In other words, it is the creation and separation of an idea or concept from the actual thing or event we seek to understand, all for the sake of conveying greater meaning to our oft-fallible brains. It shares its prefix with the word “analogy” (from “ana-” and “logos”), and the two concepts are close in spirit: an analogy uses related concepts and illustrations for the sake of understanding. This is at once both obvious and confounding – isn’t the best, most precise way to get to the truth of the matter to use as little translation as possible?
To address this, consider the sciences (bear with me here, I promise it will make sense in the end) – all of which are predicated on the precise structure of mathematics. In a way, we can see each science as a layer of abstraction, an analog, built atop the foundational concept that is mathematics. Math itself is based on shared conclusions concerning what we observe to be true. If I have an apple, and you give me an apple, then I now have two apples, and so on. Now, imagine trying to describe the migratory patterns of geese in purely mathematical terms – as only a series of numbers in black ink on a white page. It can be done, and the result would be extremely precise. However, it would not be a good way to impart general understanding of the phenomenon, especially to an audience not already familiar with the subject matter.
In the case of migrating geese, we can better convey understanding in a visually oriented way. Instead of showcasing the lowest level of detail, the mathematical formula at the base of the migratory behavior, we can use maps and diagrams that layer on top of the other structured analogs, or the other levels of detail, to represent it in a way the human mind can easily grasp. The visual representation we show sits atop the field of biology in describing the geese’s behavior. Biology is underpinned by chemistry, which is a set of molecular and atomic reactions described by physics. Physics, in turn, is built atop mathematical formulae. By using this series of logical structures, one built upon another, we can easily communicate a concept without bogging our audience down in thousands of pages of pure numbers. Enter modern data analytics and its inherent need for structure.
The Structure of Data
Source data is often organized for purposes other than conveying understanding. This can make it difficult to impose a structure like the one detailed above. The challenge usually lies in how the source data is written, stored, or generated. For example, highly normalized database schemas are put in place to make writing data fast and consistent, rather than to ensure that the data can be readily read and understood. All the same, if a modern and powerful tool like Power BI can connect to a highly normalized transactional data structure, restructure and aggregate it behind the scenes, and still present that data graphically – why should we take additional time and effort to impose manual structure?
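To make the normalized case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, table names, and sample rows are hypothetical and exist only for illustration; they are not drawn from any real system. The point is simply that a single business question, revenue by region, has to be reassembled from four separate tables before it can be read.

```python
import sqlite3

# Hypothetical, highly normalized transactional schema (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        region      TEXT
    );
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        category   TEXT,
        unit_price REAL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id)
    );
    CREATE TABLE order_lines (
        order_id   INTEGER REFERENCES orders(order_id),
        product_id INTEGER REFERENCES products(product_id),
        quantity   INTEGER
    );

    -- Made-up sample rows.
    INSERT INTO customers   VALUES (1, 'East'), (2, 'West');
    INSERT INTO products    VALUES (10, 'Widgets', 2.5), (11, 'Gadgets', 10.0);
    INSERT INTO orders      VALUES (100, 1), (101, 2), (102, 1);
    INSERT INTO order_lines VALUES (100, 10, 4), (100, 11, 1),
                                   (101, 11, 2), (102, 10, 2);
""")

# One question, "revenue by region", needs three joins and an aggregate.
revenue_by_region = conn.execute("""
    SELECT c.region, SUM(ol.quantity * p.unit_price) AS revenue
    FROM order_lines AS ol
    JOIN orders    AS o ON o.order_id    = ol.order_id
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN products  AS p ON p.product_id  = ol.product_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(revenue_by_region)  # [('East', 25.0), ('West', 20.0)]
```

Each table here is easy to write to, but none of them answers the question on its own. A modern tool can generate joins like these for us behind the scenes, which brings us back to the question of why we should do that work by hand.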
The answer lies in the fact that this automatic restructuring by a tool has limits, and that there are other capabilities that we gain, as well as missteps we avoid, when we deliberately impose structure on our data. These additional capabilities and avoidable pitfalls will be outlined on a case-by-case basis within this series. Stay tuned.