Materialized Lake Views in Microsoft Fabric

The SQL-native way to build modern data pipelines

Let’s face it — PySpark has been the go-to tool for building scalable data pipelines in Microsoft Fabric. And for good reason: it’s powerful, flexible, and battle-tested. But it also comes with a cost: complexity, boilerplate, and a steep learning curve for teams that just want to move and transform data.

Enter Materialized Lake Views in Microsoft Fabric — a game-changing feature that lets you build declarative, SQL-based pipelines that are fast, maintainable, and production-ready. No Spark clusters. No orchestration glue. Just clean, modular logic that runs natively on OneLake.

If you’ve ever wished your data pipelines could be as simple as writing a SELECT statement — this is for you.

What are Materialized Lake Views?

Materialized Lake Views (MLVs) are SQL-defined views that are automatically materialized and managed by Microsoft Fabric. Think of them as the Fabric-native way to define your Medallion architecture — Bronze, Silver, and Gold — without writing a single line of PySpark.

They’re:

  • Declarative: You describe what you want, not how to compute it.
  • Optimized: Fabric handles refreshes, dependencies, and performance tuning.
  • Governed: Integrated with lineage, access control, and monitoring.
  • Composable: Perfect for building layered, modular pipelines.

And because they’re native to Fabric, they work seamlessly with OneLake, Data Factory, and even GraphQL APIs.

Why Materialized Lake Views beat PySpark (in most cases)

Let’s be clear: PySpark still has its place — especially for complex transformations, machine learning, or massive-scale processing. But for the majority of data workflows, MLVs offer a better developer experience and faster time to value.

Feature              | PySpark        | Materialized Lake Views
---------------------|----------------|------------------------
Language             | Python         | SQL
Complexity           | High           | Low
Orchestration        | Manual         | Built-in
Performance tuning   | Manual         | Automatic
Governance           | External tools | Native in Fabric
Developer onboarding | Steep          | Easy

If your goal is to build clean, maintainable, and governed pipelines, MLVs are the new default.

How to create a Materialized Lake View in Fabric

Creating an MLV is as simple as writing a SQL query. The general syntax looks like this:

CREATE MATERIALIZED LAKE VIEW [IF NOT EXISTS] [workspace.lakehouse.schema.]view_name
[(
    CONSTRAINT constraint_name1 CHECK (condition_expression1) [ON MISMATCH DROP | FAIL],
    CONSTRAINT constraint_name2 CHECK (condition_expression2) [ON MISMATCH DROP | FAIL]
)]
[PARTITIONED BY (col1, col2, ... )]
[COMMENT "description or comment"]
[TBLPROPERTIES ("key1"="val1", "key2"="val2", ... )]
AS select_statement

Here’s a basic example:

CREATE MATERIALIZED LAKE VIEW Silver_Sales AS
SELECT
    s.SaleID,
    s.CustomerID,
    c.Region,
    s.Amount,
    s.SaleDate
FROM
    Bronze_Sales s
JOIN
    Bronze_Customers c
ON
    s.CustomerID = c.CustomerID
WHERE
    s.SaleDate >= add_months(current_date(), -6);

This view:

  • Joins two Bronze tables
  • Filters for recent sales
  • Adds regional context

Fabric takes care of the rest: materialization, refresh logic, dependency tracking, and performance optimization.

You can define views for each Medallion layer:

  • Bronze: Raw ingestion from source systems
  • Silver: Cleaned and enriched data
  • Gold: Business-ready aggregates and KPIs

Each layer builds on the previous one — and because it’s all SQL, it’s easy to reason about, test, and maintain.

Managing and refreshing your views

Fabric automatically tracks dependencies (DAG) between views.

The DAG structure ensures that all dependencies between views are clearly defined and acyclic, allowing Fabric to determine the correct refresh order automatically. This guarantees data consistency across all layers without requiring manual orchestration.

If your Bronze layer updates, the Silver and Gold layers will refresh accordingly — no manual triggers required.

You can:

  • Monitor refresh status in the Fabric UI
  • View lineage to understand data flow
  • Schedule refreshes to keep views up to date
  • Control access via Fabric’s security model

This means your pipelines are not just easier to build — they’re easier to operate.

[Image: Sample lineage of a materialized lake view in Microsoft Fabric]

Data Quality: Built-in, Declarative, and Effortless

One of the most underrated features of Materialized Lake Views is how naturally they support data quality enforcement — right where your transformations live.

Instead of writing separate validation scripts or building complex monitoring pipelines, you can now define data quality constraints directly in SQL. Fabric takes care of the rest: invalid rows are either dropped or cause the refresh to fail, depending on how strict you want to be.

It’s clean, declarative, and fully integrated into the platform.

Example: Drop rows with missing values

CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.customers_cleaned
(
    CONSTRAINT non_null_name CHECK (customerName IS NOT NULL) ON MISMATCH DROP
)
AS
SELECT
    customerID,
    customerName,
    email,
    region
FROM bronze.customers;

In this example:

  • Rows with missing customerName are automatically excluded.
  • You don’t need to write extra logic — the constraint is part of the view definition.
  • Dropped rows are tracked in the lineage view, so you always know what was filtered out and why.

Choose your strategy: FAIL or DROP

  • Use ON MISMATCH FAIL when data quality is critical and you want to stop the pipeline if something’s wrong.
  • Use ON MISMATCH DROP when you prefer to continue processing and handle issues downstream.

This gives you full control over how strict your pipeline should be — without leaving SQL.

Bonus: Quality metrics as views

You can even define dedicated views to track quality metrics over time:

CREATE MATERIALIZED LAKE VIEW gold.customer_quality_metrics AS
SELECT
    COUNT(*) AS total_customers,
    COUNT(CASE WHEN email IS NULL THEN 1 END) AS missing_emails,
    COUNT(CASE WHEN region IS NULL THEN 1 END) AS missing_regions
FROM silver.customers_cleaned;

These metrics can feed dashboards, trigger alerts, or just give you peace of mind that your data is staying clean.

Best practices for working with MLVs

Here’s what’s worked well for me:

  • Use the Medallion framework: Keep Bronze, Silver, and Gold views clearly separated.
  • Keep views modular: Avoid giant SQL scripts — break logic into layers.
  • Name consistently: Use prefixes like Bronze_, Silver_, and Gold_ to stay organized.
  • Combine with Data Factory: Use pipelines for ingestion and orchestration, then let MLVs handle transformation.
  • Avoid premature PySpark: Start with SQL — only reach for Spark when absolutely necessary.

When to use PySpark instead

While Materialized Lake Views are a powerful default for most SQL-friendly pipelines, there are still scenarios where PySpark shines. If your transformations require complex business logic, custom Python libraries, or machine learning workflows, PySpark gives you the flexibility and control you need. It's also the better choice when working with non-tabular data, external APIs, or large-scale batch processing that goes beyond what SQL can express.

Wrapping Up

Materialized Lake Views make building data pipelines in Fabric faster, cleaner, and more reliable. With SQL-first logic, built-in data quality checks, and automatic refreshes, you can focus on delivering trusted data — not managing infrastructure.

If you’re still writing PySpark for every transformation, it might be time to rethink your default.