Traditionally, large-scale data analytics solutions revolved around data warehouses, where data is stored in relational tables and queried using SQL. The rise of “big data,” characterized by high volumes, variety, and velocity, coupled with affordable storage and cloud-scale computing, gave birth to an alternative: the data lake. Data lakes store data as files without imposing a fixed schema for storage.
Today, data professionals are increasingly seeking the best of both worlds, combining the strengths of data lakes and data warehouses into a data lakehouse. A data lakehouse stores data as files in a data lake but applies a relational schema as a metadata layer, enabling queries using traditional SQL.
In Microsoft Fabric, a lakehouse offers highly scalable file storage in a OneLake store (built on Azure Data Lake Store Gen2), coupled with a metastore for relational objects like tables and views, based on the open-source Delta Lake table format. Delta Lake empowers you to define table schemas within your lakehouse, facilitating SQL queries.
Video: Microsoft Fabric Lakehouse from Setup to Insights
Create a Workspace
- Navigate to the Microsoft Fabric home page at https://app.fabric.microsoft.com/home?experience=fabric. Select Synapse Data Engineering.
- In the left menu bar, select Workspaces (🗇 icon).
- Create a new workspace with your desired name. In the Advanced section, choose a licensing mode that includes Fabric capacity (Trial, Premium, or Fabric).
- Your new workspace should open, initially empty.
Create a Lakehouse
- In the Synapse Data Engineering home page, create a new Lakehouse with a name of your choice.
After a brief moment, your new lakehouse will be created.
- View the Lakehouse: Observe the Lakehouse explorer pane on the left, which allows you to browse tables and files.
- The Tables folder contains tables you can query using SQL. These tables are based on the open-source Delta Lake format, commonly used in Apache Spark.
- The Files folder holds data files in the OneLake storage not associated with managed delta tables. You can also create shortcuts here to reference external data.
Currently, the lakehouse is empty.
Upload a File
Fabric offers various ways to load data, including pipelines and dataflows. For small amounts of data, uploading files is a simple method.
- Download the sales.csv file from https://raw.githubusercontent.com/MicrosoftLearning/dp-data/main/sales.csv.
- In your lakehouse, create a new subfolder named data within the Files folder.
- In the data folder's menu, select Upload -> Upload file, then upload the sales.csv file.
- Verify the upload by selecting the Files/data folder.
- Select the sales.csv file to preview its contents.
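To get a feel for what the file preview shows, here is a minimal Python sketch that parses a couple of rows shaped like sales.csv. The sample values are illustrative, not real data from the file; the Item, Quantity, UnitPrice, SalesOrderNumber, and SalesOrderLineNumber columns appear later in this walkthrough, but the exact row contents here are assumptions.

```python
import csv
import io

# Illustrative rows in the same shape as sales.csv (values are made up).
sample = """SalesOrderNumber,SalesOrderLineNumber,Item,Quantity,UnitPrice
SO43701,1,Mountain-100 Silver,1,3399.99
SO43704,1,Road-150 Red,1,3578.27
"""

# Preview the file the way the lakehouse preview does: header plus rows.
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
print(reader.fieldnames)
for row in rows:
    print(row["SalesOrderNumber"], row["Item"], row["Quantity"])
```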
Explore Shortcuts
Shortcuts allow you to integrate externally stored data without copying it, preventing data inconsistency.
- In the Files folder’s menu, select New shortcut.
- View the available data source types. Close the dialog without creating a shortcut.
Load File Data into a Table
While data professionals can work directly with files using Apache Spark, loading data into a table enables SQL queries.
- On the Home page, select the Files/data folder to see the sales.csv file.
- In the file's menu, select Load to Tables.
- Set the table name to sales and confirm the load operation.
- Wait for the table creation and loading. (Refresh the Tables folder if needed.)
- In the Lakehouse explorer, select the sales table to view the data.
- Select View files in the table's menu to see the underlying Parquet format files and the _delta_log subfolder, which tracks table transactions.
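The _delta_log subfolder is what turns a pile of Parquet files into a table: each commit file holds one JSON action per line. The sketch below replays a minimal, made-up commit to show the idea; the file paths and sizes are invented, while the action names (protocol, metaData, add) come from the open Delta Lake transaction-log format.

```python
import json

# Illustrative contents of a first commit file in _delta_log
# (e.g. 00000000000000000000.json): one JSON action per line.
# Paths and sizes are made up for this sketch.
commit_lines = [
    json.dumps({"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}),
    json.dumps({"metaData": {"id": "example-table-id",
                             "format": {"provider": "parquet"}}}),
    json.dumps({"add": {"path": "part-00000-example.snappy.parquet",
                        "size": 12345, "dataChange": True}}),
]

# Replaying the log: collect the Parquet files that currently make up the table.
active_files = []
for line in commit_lines:
    action = json.loads(line)
    if "add" in action:
        active_files.append(action["add"]["path"])

print(active_files)
```

A reader that replays every commit in order (adding and removing files as the actions dictate) always sees a consistent snapshot of the table, which is what enables SQL queries over plain files.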
Use SQL to Query Tables
A SQL endpoint is automatically created for your lakehouse, enabling SQL queries on its tables.
- Switch from Lakehouse to SQL analytics endpoint at the top right.
- Wait for the SQL analytics endpoint to open.
- Use the New SQL query button and enter the following query:
  SELECT Item, SUM(Quantity * UnitPrice) AS Revenue
  FROM sales
  GROUP BY Item
  ORDER BY Revenue DESC;

- Run the query to view the total revenue for each product.
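If it helps to see what the query computes, here is the same aggregation expressed in plain Python over a few illustrative order lines (the values are not real sales.csv data):

```python
from collections import defaultdict

# Sample order lines (illustrative values only).
order_lines = [
    {"Item": "Road-150 Red", "Quantity": 1, "UnitPrice": 3578.27},
    {"Item": "Mountain-100 Silver", "Quantity": 2, "UnitPrice": 3399.99},
    {"Item": "Road-150 Red", "Quantity": 1, "UnitPrice": 3578.27},
]

# SELECT Item, SUM(Quantity * UnitPrice) AS Revenue ... GROUP BY Item
revenue = defaultdict(float)
for row in order_lines:
    revenue[row["Item"]] += row["Quantity"] * row["UnitPrice"]

# ORDER BY Revenue DESC
ranked = sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```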
Create a Visual Query & Report (Using Power BI Skills)
- On the toolbar, select New visual query.
- Drag the sales table to the new editor.
- Choose only the SalesOrderNumber and SalesOrderLineNumber columns.
- Group the data by SalesOrderNumber, creating a new column LineItems that counts distinct values of SalesOrderLineNumber.
- Switch to the Model tab to view the semantic model schema.
- In the menu ribbon, select the Reporting tab, then New report.
- In the Data pane, expand the sales table and select Item and Quantity.
- Hide the Data and Filters panes. Change the visualization to a Clustered bar chart.
- Save the report as Item Sales Report.
- Close the report tab and return to the SQL endpoint. In the hub menu, select your workspace to verify the created items:
  - Your lakehouse
  - The SQL analytics endpoint
  - The default semantic model
  - The Item Sales Report
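The distinct-count grouping in the visual query can be sketched in plain Python as well. The (SalesOrderNumber, SalesOrderLineNumber) pairs below are illustrative:

```python
# Sample (SalesOrderNumber, SalesOrderLineNumber) pairs; values are made up.
pairs = [
    ("SO43701", 1),
    ("SO43704", 1),
    ("SO43704", 2),
    ("SO43704", 2),  # duplicate line number: a distinct count ignores it
]

# Group by SalesOrderNumber, collecting the distinct line numbers per order.
line_numbers = {}
for order, line_no in pairs:
    line_numbers.setdefault(order, set()).add(line_no)

# LineItems = count of distinct SalesOrderLineNumber values per order.
line_items = {order: len(nums) for order, nums in line_numbers.items()}
print(line_items)
```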
This comprehensive guide walks you through creating a lakehouse, ingesting data, exploring shortcuts, querying with SQL, and even creating visual queries and reports. You’ve now experienced the power and flexibility of Microsoft Fabric Lakehouses for unified data analytics.
This blog post is based on information and concepts derived from the Microsoft Learn module titled “Get started with lakehouses in Microsoft Fabric.” The original content can be found here:
https://learn.microsoft.com/en-us/training/modules/get-started-lakehouses/
