The Myth of the Magical Data Lake

Since their inception, IT organizations have fought valiantly to “put an end to data silos.” While the battle cry was universally accepted, the tactics for achieving this feat have varied. About 10 years ago, the term Data Lake was coined to describe a unified repository for all corporate digital data. The idea naturally sounds perfect to anyone looking at implementing machine learning, AI, or big data analytics. As with many great ideas, the challenge is in the execution. After all, if all of your corporate data lives in a single place, don’t you have everything you need to naturally integrate your systems? (Hint: no.)

According to Wikipedia (the indisputable single source of truth on the internet), a Data Lake can range from data dumped in its natural, unprocessed form to data that has been somewhat categorized and structured. While that might sound like a great idea, consider what happens if one source speaks Polish and another French, while one talks about fashion and the other discusses auto mechanics. Yes, you have a bunch of information, all in the same place, but suddenly it is all out of context!

Enter the Data Warehouse. For the sake of the analogy, we are now going to take all of the content, translate it into Latin (who doesn’t love Latin?), sterilize it, and neatly file it away in a data store for running analytics. Surely now we have all of the necessary components for cross-functional integrations to break down those nasty data silos that were the original problem. Well… no. Data Warehouses aren’t designed to be all-inclusive, particularly of unstructured data, and the information still loses context through the sterilization and transfer process.

So, the latest term floating around, with a highly negative connotation, is the Data Swamp. Information Age warns that without proper data hygiene, governance, or control of the “unstructured data content,” your data lake will become contaminated by essentially unusable data that propagates endlessly without benefit to the company.

I object to the word “Swamp” being used in such a negative manner, as one of my favorite local parks is a beautiful and flourishing wetland that is anything but dead, decaying, or useless. The Conestee Preserve is a beautiful ecosystem where streams meander in and out of various marshlands, forming a flourishing habitat for beavers, turtles, and deer. The National Audubon Society has designated the park as an “Important Bird Area of Global Significance.” Thirteen miles of trails and boardwalks ensure that even humans can explore the preserve without impacting the flow of this beautiful marshy environment.


The “Natural Ecosystem” analogy best describes the Aras integration philosophy. Every company has an IT Ecosystem. How healthy is yours? Rather than attempting to force your data to perform “unnatural acts,” our open and transparent philosophy enables data to flow unencumbered where you need it. Learn more about Aras’ Integration capabilities.