The idea of a data lake is to have a single store for all data in the enterprise, ranging from raw data (an exact copy of source-system data) to transformed data that is used for purposes such as reporting, visualization, analytics, and machine learning.

The data lake includes structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, and newer formats such as JSON), unstructured data (emails, documents, PDFs), and even binary data such as images, audio, and video, thus creating a centralized data store that accommodates all forms of data.
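
As a rough illustration of the categories listed above, the sketch below sorts incoming files into those groups and builds a storage key for an untouched raw copy. The extension table, the `raw/` prefix, and the function names `classify` and `raw_zone_key` are assumptions made for this example, not part of any particular data lake product. Structured data from relational databases is left out because it usually arrives as table extracts rather than as files identified by extension.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the categories named in the text.
CATEGORY_BY_EXTENSION = {
    ".csv": "semi-structured",
    ".log": "semi-structured",
    ".xml": "semi-structured",
    ".json": "semi-structured",
    ".eml": "unstructured",
    ".docx": "unstructured",
    ".pdf": "unstructured",
    ".png": "binary",
    ".mp3": "binary",
    ".mp4": "binary",
}


def classify(path: str) -> str:
    """Return the category for a file path, or "unknown" if unmapped."""
    suffix = PurePosixPath(path).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unknown")


def raw_zone_key(source_system: str, path: str) -> str:
    """Build a storage key for the exact, untransformed copy of a file.

    The "raw/<source>/<category>/<name>" layout is an assumed convention.
    """
    name = PurePosixPath(path).name
    return f"raw/{source_system}/{classify(path)}/{name}"
```

Calling `raw_zone_key("crm", "inbox/contract.pdf")` would place the file under the unstructured group for the hypothetical `crm` source, while transformed copies would live under a separate prefix of the same store.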



« Data lake »


A quote saved on June 30, 2016.

#unstructured-data

