What is a data lake?

A data lake keeps raw data in a central repository until it's needed. Data warehouses are relatively static and store data in a predefined, hierarchical structure, while a data lake suits more "fluid" use cases. Data lakes can be essential for organizations that want to explore machine learning problems and scale their answers effectively.

Data lakes store data in its rawest form and serve multiple stakeholders. Unlike data warehouses, they can also be used to package data so that non-technical people can consume it.

Data Lake vs. Data Warehouse

Data warehouses are a straightforward solution: you generally collect data from a source with ETL (extract, transform, load) and load it in. Then you access that data with BI and analytics tools.
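
A minimal sketch of that flow in Python, using the standard-library sqlite3 module as a stand-in for a warehouse. The table, columns, and sample rows are hypothetical; the point is that the schema exists before the data is loaded (schema on write), and the BI step is a SQL query over the structured table.

```python
import sqlite3

# Hypothetical source records, e.g. rows exported from an operational system.
source_rows = [
    {"order_id": "1", "region": "east", "amount": "120.50"},
    {"order_id": "2", "region": "west", "amount": "80.00"},
    {"order_id": "3", "region": "east", "amount": "45.25"},
]

def transform(row):
    """Transform step: coerce each raw row into the warehouse's fixed schema."""
    return (int(row["order_id"]), row["region"].upper(), float(row["amount"]))

conn = sqlite3.connect(":memory:")
# Schema on write: the table structure is defined before any data arrives.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)
# Load step: only rows that fit the schema make it into the table.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [transform(r) for r in source_rows])
conn.commit()

# BI-style query against the loaded, structured table.
revenue_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region").fetchall()
)
```

Because the transformation happens before loading, any new source or new field means changing the schema and the ETL job first, which is why adding new data to a warehouse tends to be slower.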

Data lakes ingest data in batches or as streams, and you can then choose whether or not to process or transform that data. A major advantage of the data lake architecture is access to the raw data, which lets every kind of user process and transform the data for their specific use case.
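
The contrast with the warehouse can be sketched in a few lines of Python, assuming raw events arrive as JSON lines and a temporary directory stands in for object storage. The file name, fields, and helper functions are hypothetical. Records land exactly as received, and each consumer applies its own schema when reading (schema on read).

```python
import json
import tempfile
from pathlib import Path

# Hypothetical raw events arriving from a batch export or a stream.
raw_events = [
    '{"user": "a", "event": "click", "ts": "2024-01-01T10:00:00"}',
    '{"user": "b", "event": "purchase", "ts": "2024-01-01T10:05:00", "amount": 30}',
    '{"user": "a", "event": "purchase", "ts": "2024-01-02T09:00:00", "amount": 12.5, "coupon": "X1"}',
]

lake = Path(tempfile.mkdtemp())  # stand-in for object storage such as a bucket

def ingest(lines, path):
    """Land records exactly as received; no schema is enforced on write."""
    path.write_text("\n".join(lines) + "\n")

def read_with_schema(path, fields):
    """Schema on read: each consumer picks the fields it needs at query time."""
    for line in path.read_text().splitlines():
        record = json.loads(line)
        # Missing fields become None instead of rejecting the record.
        yield {f: record.get(f) for f in fields}

events_file = lake / "events.jsonl"
ingest(raw_events, events_file)

# An analyst only needs purchases and their amounts...
purchases = [
    r for r in read_with_schema(events_file, ["user", "event", "amount"]) if r["event"] == "purchase"
]
# ...while a data scientist also wants the coupon field that only some records carry.
coupon_view = list(read_with_schema(events_file, ["user", "event", "coupon"]))
```

Nothing about the raw file had to change for the second consumer to use a field the first one ignored.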


Data Lake

Complementary to DW
Schema on Read
Fast Ingestion of New Data
Advanced Analytics + BI
Data at a low level of granularity and detail
Loosely defined SLA
Flexibility in tooling

Data Warehouse

Data lake can be a source for the EDW
Schema on Write
Structured data only
Time consuming for new data
BI use cases
Data at summary/aggregated level
Tight SLAs
Limited flexibility in tooling (SQL)

Data Lake Features

Data lakes have many features distinguishing them from data warehouses and other types of data storage.

Insights generated instantly

Multiple views of your data. Get well-structured data ready to generate reports and KPIs, or keep it in metadata form, connected with external data, so your research team can produce advanced analytics.
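
As a rough illustration, here is one raw dataset serving two views: a summarized KPI view for reports and a detail-level view enriched with external data for research. The transactions, the ZIP-code lookup, and both view functions are hypothetical examples, not a description of any specific product.

```python
from collections import defaultdict

# Hypothetical raw transactions kept in the lake at full granularity.
transactions = [
    {"customer": "c1", "zip": "10001", "amount": 50.0},
    {"customer": "c2", "zip": "94105", "amount": 20.0},
    {"customer": "c1", "zip": "10001", "amount": 30.0},
]
# Hypothetical external dataset, e.g. demographic data keyed by ZIP code.
external_by_zip = {"10001": {"median_income": 72000}, "94105": {"median_income": 150000}}

def kpi_view(rows):
    """Well-structured summary for reports: total revenue per customer."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

def research_view(rows, external):
    """Detail-level records enriched with external data for advanced analytics."""
    return [{**r, **external.get(r["zip"], {})} for r in rows]

report = kpi_view(transactions)
enriched = research_view(transactions, external_by_zip)
```

Both views are derived on demand, so the raw transactions stay untouched and new views can be added later.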

Flexibility of the types of data you can use

Expands the dataset for analysis beyond the traditional internal data held in ERP, CRM and supply chain management (SCM) systems.

Easily scales as your business grows

Full-featured data lake for the rapid ingestion, consolidation, cleansing, unification and sharing of all available internal and external data sources.
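
A simplified sketch of the cleansing and unification steps, assuming two hypothetical internal extracts (a CRM and an ERP system) that share an email field. Matching on a normalized email is an illustrative choice; real entity resolution usually needs more robust matching.

```python
# Hypothetical extracts from two internal systems describing overlapping customers.
crm = [
    {"email": " Ana@Example.com ", "name": "Ana Lima", "segment": "retail"},
    {"email": "bo@example.com", "name": "Bo Chen", "segment": "smb"},
]
erp = [
    {"email": "ana@example.com", "lifetime_value": 1200.0},
    {"email": "cy@example.com", "lifetime_value": 300.0},
]

def clean_email(value):
    """Cleansing step: normalize the join key so the same customer matches across systems."""
    return value.strip().lower()

def unify(*sources):
    """Consolidate records from every source into one profile per customer."""
    profiles = {}
    for source in sources:
        for record in source:
            key = clean_email(record["email"])
            profile = profiles.setdefault(key, {"email": key})
            # Later sources overwrite conflicting fields from earlier ones.
            profile.update({k: v for k, v in record.items() if k != "email"})
    return profiles

customers = unify(crm, erp)
```

Here the order of the sources acts as a simple precedence rule; a production pipeline would typically make conflict resolution explicit per field.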

Full Data Governance

Uses algorithms to monitor data and metadata for quality assurance.
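
One simple form of automated quality monitoring is rule-based validation with an alert threshold. The sketch below assumes a hypothetical rule set and a 5% failure threshold; it illustrates the general idea only and is not the specific algorithms referred to above.

```python
# Hypothetical rule set: each column maps to a predicate that a valid value must satisfy.
RULES = {
    "customer_id": lambda v: isinstance(v, str) and v != "",
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}

def quality_report(rows, rules, max_failure_rate=0.05):
    """Return the failure rate per column and the columns that exceed the threshold."""
    rates = {}
    for column, is_valid in rules.items():
        failures = sum(1 for r in rows if not is_valid(r.get(column)))
        rates[column] = failures / len(rows) if rows else 0.0
    flagged = sorted(c for c, rate in rates.items() if rate > max_failure_rate)
    return rates, flagged

batch = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "", "amount": 5.0},    # fails the customer_id rule
    {"customer_id": "c3", "amount": -2.0}, # fails the amount rule
    {"customer_id": "c4", "amount": 7.5},
]
rates, flagged = quality_report(batch, RULES)
```

Running a report like this on every ingested batch, and storing the rates as metadata, gives a history that can reveal when a source starts degrading.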

Generate machine learning models quickly

Integrates both new and existing data sources to boost predictive models.

Real-time availability of all your data

Direct access to all unified customer data via API, the R6 dashboard and third-party systems (Salesforce, Odoo, etc.), for easier sharing across systems and teams.

Relativity6 builds world-class machine learning models on top of world-class data lakes.
Learn about our work in churn modeling and prediction for some of the best-known brands in financial services, retail and other industries.