Traditionally, large-scale data analytics solutions revolved around data warehouses, where data is stored in relational tables and queried using SQL. The rise of “big data,” characterized by high volumes, variety, and velocity, coupled with affordable storage and cloud-scale computing, gave birth to an alternative: the data lake. Data lakes store data as files without imposing a fixed schema for storage.
Today, data professionals are increasingly seeking the best of both worlds, combining the strengths of data lakes and data warehouses into a data lakehouse. A data lakehouse stores data as files in a data lake but applies a relational schema as a metadata layer, enabling queries using traditional SQL.
In Microsoft Fabric, a lakehouse offers highly scalable file storage in a OneLake store (built on Azure Data Lake Store Gen2), coupled with a metastore for relational objects like tables and views, based on the open-source Delta Lake table format. Delta Lake empowers you to define table schemas within your lakehouse, facilitating SQL queries.
Video: Microsoft Fabric Lakehouse from Setup to Insights
Create a Workspace
- Navigate to the Microsoft Fabric home page at https://app.fabric.microsoft.com/home?experience=fabric. Select Synapse Data Engineering.
- In the left menu bar, select Workspaces (🗇 icon).
- Create a new workspace with your desired name. In the Advanced section, choose a licensing mode that includes Fabric capacity (Trial, Premium, or Fabric).
- Your new workspace should open, initially empty.
Create a Lakehouse
- In the Synapse Data Engineering home page, create a new Lakehouse with a name of your choice.
After a brief moment, your new lakehouse will be created.
- View the Lakehouse: Observe the Lakehouse explorer pane on the left, which allows you to browse tables and files.
- The Tables folder contains tables you can query using SQL. These tables are based on the open-source Delta Lake format, commonly used in Apache Spark.
- The Files folder holds data files in the OneLake storage not associated with managed delta tables. You can also create shortcuts here to reference external data.
Currently, the lakehouse is empty.
Upload a File
Fabric offers various ways to load data, including pipelines and dataflows. For small amounts of data, uploading files is a simple method.
- Download the sales.csv file from https://raw.githubusercontent.com/MicrosoftLearning/dp-data/main/sales.csv.
- In your lakehouse, create a new subfolder named data within the Files folder.
- In the data folder's menu, select Upload -> Upload file, then upload the sales.csv file.
- Verify the upload by selecting the Files/data folder.
- Select the sales.csv file to preview its contents.
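To get a feel for what the file preview shows, here is a minimal Python sketch that parses a couple of rows shaped like sales.csv. The sample values are illustrative, not real data from the file; the Item, Quantity, UnitPrice, SalesOrderNumber, and SalesOrderLineNumber columns appear later in this walkthrough, but the exact row contents here are assumptions.

```python
import csv
import io

# Illustrative rows in the same shape as sales.csv (values are made up).
sample = """SalesOrderNumber,SalesOrderLineNumber,Item,Quantity,UnitPrice
SO43701,1,Mountain-100 Silver,1,3399.99
SO43704,1,Road-150 Red,1,3578.27
"""

# Preview the file the way the lakehouse preview does: header plus rows.
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
print(reader.fieldnames)
for row in rows:
    print(row["SalesOrderNumber"], row["Item"], row["Quantity"])
```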
Explore Shortcuts
Shortcuts allow you to integrate externally stored data without copying it, preventing data inconsistency.
- In the Files folder’s menu, select New shortcut.
- View the available data source types. Close the dialog without creating a shortcut.
Load File Data into a Table
While data professionals can work directly with files using Apache Spark, loading data into a table enables SQL queries.
- On the Home page, select the Files/data folder to see the sales.csv file.
- In the file's menu, select Load to Tables.
- Set the table name to sales and confirm the load operation.
- Wait for the table creation and loading. (Refresh the Tables folder if needed.)
- In the Lakehouse explorer, select the sales table to view the data.
- Select View files in the table's menu to see the underlying Parquet format files and the _delta_log subfolder, which tracks table transactions.
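The _delta_log subfolder is what turns a pile of Parquet files into a table: each commit file holds one JSON action per line. The sketch below replays a minimal, made-up commit to show the idea; the file paths and sizes are invented, while the action names (protocol, metaData, add) come from the open Delta Lake transaction-log format.

```python
import json

# Illustrative contents of a first commit file in _delta_log
# (e.g. 00000000000000000000.json): one JSON action per line.
# Paths and sizes are made up for this sketch.
commit_lines = [
    json.dumps({"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}),
    json.dumps({"metaData": {"id": "example-table-id",
                             "format": {"provider": "parquet"}}}),
    json.dumps({"add": {"path": "part-00000-example.snappy.parquet",
                        "size": 12345, "dataChange": True}}),
]

# Replaying the log: collect the Parquet files that currently make up the table.
active_files = []
for line in commit_lines:
    action = json.loads(line)
    if "add" in action:
        active_files.append(action["add"]["path"])

print(active_files)
```

A reader that replays every commit in order (adding and removing files as the actions dictate) always sees a consistent snapshot of the table, which is what enables SQL queries over plain files.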
Use SQL to Query Tables
A SQL endpoint is automatically created for your lakehouse, enabling SQL queries on its tables.
- Switch from Lakehouse to SQL analytics endpoint at the top right.
- Wait for the SQL analytics endpoint to open.
- Use the New SQL query button and enter the following query:
  SELECT Item, SUM(Quantity * UnitPrice) AS Revenue
  FROM sales
  GROUP BY Item
  ORDER BY Revenue DESC;

- Run the query to view the total revenue for each product.
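If it helps to see what the query computes, here is the same aggregation expressed in plain Python over a few illustrative order lines (the values are not real sales.csv data):

```python
from collections import defaultdict

# Sample order lines (illustrative values only).
order_lines = [
    {"Item": "Road-150 Red", "Quantity": 1, "UnitPrice": 3578.27},
    {"Item": "Mountain-100 Silver", "Quantity": 2, "UnitPrice": 3399.99},
    {"Item": "Road-150 Red", "Quantity": 1, "UnitPrice": 3578.27},
]

# SELECT Item, SUM(Quantity * UnitPrice) AS Revenue ... GROUP BY Item
revenue = defaultdict(float)
for row in order_lines:
    revenue[row["Item"]] += row["Quantity"] * row["UnitPrice"]

# ORDER BY Revenue DESC
ranked = sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```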
Create a Visual Query & Report (Using Power BI Skills)
- On the toolbar, select New visual query.
- Drag the sales table to the new editor.
- Choose only the SalesOrderNumber and SalesOrderLineNumber columns.
- Group the data by SalesOrderNumber, creating a new column LineItems that counts distinct values of SalesOrderLineNumber.
- Switch to the Model tab to view the semantic model schema.
- In the menu ribbon, select the Reporting tab, then New report.
- In the Data pane, expand the sales table and select Item and Quantity.
- Hide the Data and Filters panes. Change the visualization to a Clustered bar chart.
- Save the report as Item Sales Report.
- Close the report tab and return to the SQL endpoint. In the hub menu, select your workspace to verify the created items:
  - Your lakehouse
  - The SQL analytics endpoint
  - The default semantic model
  - The Item Sales Report
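The distinct-count grouping in the visual query can be sketched in plain Python as well. The (SalesOrderNumber, SalesOrderLineNumber) pairs below are illustrative:

```python
# Sample (SalesOrderNumber, SalesOrderLineNumber) pairs; values are made up.
pairs = [
    ("SO43701", 1),
    ("SO43704", 1),
    ("SO43704", 2),
    ("SO43704", 2),  # duplicate line number: a distinct count ignores it
]

# Group by SalesOrderNumber, collecting the distinct line numbers per order.
line_numbers = {}
for order, line_no in pairs:
    line_numbers.setdefault(order, set()).add(line_no)

# LineItems = count of distinct SalesOrderLineNumber values per order.
line_items = {order: len(nums) for order, nums in line_numbers.items()}
print(line_items)
```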
This comprehensive guide walks you through creating a lakehouse, ingesting data, exploring shortcuts, querying with SQL, and even creating visual queries and reports. You’ve now experienced the power and flexibility of Microsoft Fabric Lakehouses for unified data analytics.
This blog post is based on information and concepts derived from the Microsoft Learn module titled “Get started with lakehouses in Microsoft Fabric.” The original content can be found here:
https://learn.microsoft.com/en-us/training/modules/get-started-lakehouses/
