Now that you grasp the fundamental capabilities of Microsoft Fabric Lakehouses, let’s delve into the practical aspects of creating, managing, and utilizing them.
Create and Explore a Lakehouse
You create and configure a new Lakehouse within the Data Engineering workload. Creating a Lakehouse produces three items in your Fabric-enabled workspace:
- Lakehouse: This is the core storage and metadata area, where you interact with files, folders, and table data.
- Semantic model (default): An automatically created semantic model based on the Lakehouse tables, serving as a foundation for Power BI reports.
- SQL analytics endpoint: A read-only SQL endpoint for querying data using Transact-SQL.
You can work with the data in two distinct modes (see the sketch after this list):
- Lakehouse mode: Add and interact with tables, files, and folders directly within the Lakehouse.
- SQL analytics endpoint mode: Use SQL queries to interact with the Lakehouse tables and manage the relational semantic model.
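For instance, a Fabric notebook can read a Lakehouse table with Spark, while the same table is queryable through the SQL analytics endpoint with T-SQL. Here is a minimal sketch, assuming a hypothetical table named `sales`:

```python
from pyspark.sql import SparkSession

# In a Fabric notebook a `spark` session is already provided; the builder
# call below simply keeps this sketch self-contained.
spark = SparkSession.builder.getOrCreate()

# Lakehouse mode: read a managed Delta table directly with Spark
# ("sales" is a hypothetical table name used for illustration).
sales = spark.read.table("sales")
sales.show(5)

# SQL analytics endpoint mode exposes the same table read-only via T-SQL, e.g.:
#   SELECT TOP 5 * FROM sales;
```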
Ingest Data into a Lakehouse
Fabric offers multiple avenues for loading data into your Lakehouse:
- Upload: Upload local files or folders directly. Explore and process this data, then load the results into tables.
- Dataflows (Gen2): Import and transform data from various sources using Power Query Online, loading it directly into Lakehouse tables.
- Notebooks: Utilize Fabric notebooks for data ingestion, transformation, and loading into tables or files (illustrated after this list).
- Data Factory pipelines: Orchestrate data copying and processing activities, loading results into tables or files.
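To illustrate the notebook route, the sketch below reads an uploaded CSV from the Lakehouse Files area and loads it into a managed Delta table. The file path and column layout are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read an uploaded CSV from the Lakehouse Files area
# ("Files/orders.csv" is a hypothetical path used for illustration).
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("Files/orders.csv"))

# Load the result into a managed Delta table in the Tables area.
orders.write.mode("overwrite").format("delta").saveAsTable("orders")
```

Once saved, the `orders` table appears under Tables in Lakehouse mode and is immediately queryable through the SQL analytics endpoint.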
Access Data Using Shortcuts
Shortcuts provide another powerful way to access and use data in Fabric. They enable you to integrate data into your Lakehouse while keeping it stored in its original external location.
This is especially useful when sourcing data from different storage accounts or even different cloud providers. Within your Lakehouse, create shortcuts that point to various storage accounts and other Fabric items like data warehouses, KQL databases, and other Lakehouses.
Source data permissions and credentials are managed centrally by OneLake. When accessing data through a shortcut to another OneLake location, the calling user’s identity is used for authorization. Users must have the necessary permissions in the target location to read the data.
Shortcuts, available in both Lakehouses and KQL databases, appear as folders within the lake, making them accessible to Spark, SQL, Real-Time Intelligence, and Analysis Services for querying.
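Because shortcuts surface as ordinary folders, reading through one looks no different from reading data stored in the Lakehouse itself. A minimal sketch, assuming a hypothetical shortcut named `external_sales` under the Files area that points at Parquet data in an external store:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "Files/external_sales" is a hypothetical shortcut pointing at an external
# location (e.g., another storage account or another Lakehouse); the data
# stays at the source, but Spark reads it like a local folder.
external = spark.read.parquet("Files/external_sales")
external.printSchema()
```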
Explore and Transform Data
Once your data is in the Lakehouse, leverage these tools for exploration and transformation (a Spark example follows the list):
- Apache Spark: Process data in files and tables using Scala, PySpark, or Spark SQL through Notebooks or Spark Job Definitions.
- Notebooks: Interactive coding interfaces for reading, transforming, and writing data directly to the Lakehouse.
- Spark job definitions: On-demand or scheduled scripts utilizing the Spark engine for data processing.
- SQL analytics endpoint: Run Transact-SQL statements to query, filter, aggregate, and explore data in Lakehouse tables.
- Dataflows (Gen2): Perform additional transformations using Power Query and optionally load the results back into the Lakehouse.
- Data pipelines: Orchestrate complex data transformation logic using a sequence of activities like dataflows, Spark jobs, and control flow elements.
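For example, a notebook can combine Spark SQL with the DataFrame API to aggregate a table and persist the result. A sketch, reusing the hypothetical `orders` table (the column names `order_date` and `amount` are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Aggregate with Spark SQL ("orders", "order_date", and "amount" are
# hypothetical names used for illustration).
daily_totals = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""")

# Persist the transformed result as a new managed Delta table.
daily_totals.write.mode("overwrite").format("delta").saveAsTable("daily_totals")
```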
Analyze and Visualize Data
Data in your Lakehouse tables is automatically included in a semantic model that defines a relational structure. You can edit this model or create new ones, defining custom measures, hierarchies, and aggregations. Use the semantic model as the source for Power BI reports to visualize and analyze your data interactively.
By combining Power BI’s visualization capabilities with the Lakehouse’s centralized storage and tabular schema, you can achieve an end-to-end analytics solution within a single platform.
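You can also reach the semantic model programmatically from a notebook with the semantic link (SemPy) library. A minimal sketch, assuming SemPy is available in your Fabric notebook and using hypothetical model and table names:

```python
# Semantic link (SemPy) is available in Fabric notebooks; the names below
# ("MyLakehouse", "orders") are hypothetical and used for illustration.
import sempy.fabric as fabric

# List semantic models in the current workspace.
print(fabric.list_datasets())

# Read a table from a Lakehouse's default semantic model.
orders = fabric.read_table("MyLakehouse", "orders")
print(orders.head())
```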
Key Takeaway
Microsoft Fabric Lakehouses offer a versatile and powerful environment for working with your data. From ingestion and transformation to analysis and visualization, Fabric provides the tools you need to derive valuable insights and drive data-driven decision-making.
This blog post is based on information and concepts derived from the Microsoft Learn module titled “Get started with lakehouses in Microsoft Fabric.” The original content can be found here:
https://learn.microsoft.com/en-us/training/modules/get-started-lakehouses/
