Creating a Data Lakehouse with Azure Synapse Analytics (Part 1 of 5)

1. Introduction
This is part one of a five-part series developed by the Data Analytics Team at Allgeier Schweiz. The original implementation was part of a workshop at the 2023 Swiss Data Science Conference.
You can also find the series in the Allgeier Schweiz GitHub repository.
This series shows how to create a Data Lakehouse using Microsoft Azure resources and connect it to Microsoft Power BI for reporting.
- Part 1 will provide the readers with an overview of what a Data Lakehouse is, how it works, what the challenges are, and when it makes sense to implement this architecture.
- Part 2 will show the readers the preparations required to create the Azure resources for the Data Lakehouse.
- Part 3 will show the readers how to start Azure Synapse Analytics, import the Data Flow pipelines, and trigger them manually. These pipelines transform the data, move it through the medallion layers (bronze, silver, and gold), and sink it into the Delta format.
- Part 4 will show the readers how to create Synapse Notebooks in Azure Synapse Analytics and set up the Lake Databases as well as the associated Delta Tables. The reader will also use SQL code to query the Delta Table audit logs and run time travel, version restore, Z-Ordering, and vacuum commands (a short sketch of these commands follows this list).
- Part 5 will show the readers how to connect the Delta Tables in the Synapse Analytics Gold Lake Database with Power BI using the Azure Synapse Serverless Endpoint.
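To give a feel for the Part 4 exercises, the sketch below shows what these Delta Lake maintenance commands typically look like in Spark SQL. The table name sales_gold and the column customer_id are hypothetical placeholders, and the exact syntax available depends on the Delta Lake version running in your Synapse Spark pool.

```sql
-- Hypothetical Delta Table "sales_gold"; adjust names to your own Lake Database.

-- Inspect the audit log: every write to a Delta Table is recorded as a version
DESCRIBE HISTORY sales_gold;

-- Time travel: query the table as it looked at an earlier version
SELECT * FROM sales_gold VERSION AS OF 3;

-- Restore the table to that earlier version
RESTORE TABLE sales_gold TO VERSION AS OF 3;

-- Z-Ordering: co-locate related rows in fewer files to speed up selective queries
OPTIMIZE sales_gold ZORDER BY (customer_id);

-- Vacuum: delete data files no longer referenced by the transaction log
VACUUM sales_gold RETAIN 168 HOURS;
```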
2. What is a Data Lakehouse?
A Data Lakehouse is a data management architecture that combines the key advantages of Data Lakes and Data Warehouses into one.
The Data Lakehouse is made up of a Data Lake, which stores the data in a format optimized for direct access, and Serverless Pools, which allow queries to run directly on the Data Lake.
What makes the Data Lakehouse so special is that the Data Lake essentially acts as a Data Warehouse. The Serverless Pools, used to query the data, add an on-demand SQL layer on top of the Data Lake, allowing for large-scale data and…
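As an illustration of that on-demand SQL layer, a Synapse serverless SQL pool can read Delta files in the Data Lake directly with OPENROWSET. This is a minimal sketch; the storage account name and the gold/sales path are hypothetical placeholders for your own lake layout.

```sql
-- Minimal sketch: query a Delta folder in the Data Lake from a serverless SQL pool.
-- <storage-account> and the "gold/sales" path are placeholders.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/gold/sales/',
    FORMAT = 'DELTA'
) AS sales;
```

No cluster needs to be provisioned beforehand: the serverless pool reads the Delta transaction log and data files in place, and you pay per query rather than for standing compute.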