Case Study: Scaling Power BI for Large Datasets

Background

A retail analytics team managed a Power BI model that started at ~50M rows. Over 18 months, the dataset grew to 500M rows as new data sources and historical backfill were added. Import refresh times grew from 45 minutes to over 8 hours, and the model frequently exceeded the 10 GB Premium capacity limit.

Problem Statement

Symptoms

Dataset refresh failing after 8+ hours
Model size exceeding 10 GB (Premium P1 limit)
Report render times degraded from 3 seconds to 25+ seconds
Scheduled refresh conflicts due to long refresh windows

Root Cause Analysis

Using the VertiPaq Analyzer:

Three string columns had cardinality > 5M distinct values
A flat, denormalized table contained 80+ columns — most unused in reports
Import mode was used for a fact table that had over 30 daily refresh updates
Auto date/time was enabled, creating 12 hidden date tables

Architecture Before

Snowflake Warehouse
    ↓ Full Import (all 500M rows)
Power BI Dataset (flat table, 80 columns)
    ↓
Reports (12 pages, 120+ visuals)

Architecture After

Snowflake Warehouse
    ├── Aggregation layer (pre-computed daily summaries)
    │       ↓ Full Import (5M rows)
    └── Fact table (500M rows)
            ↓ Incremental Refresh (rolling 2-year window)
Power BI Dataset (star schema, 22 columns in fact)
    ↓
Reports (8 pages, optimized visuals)

Changes Made

1. Implemented Incremental Refresh

Incremental refresh loads only new or modified rows based on a date filter, replacing full imports.

Configuration in Power BI:

Store rows in the last: 2 Years
Refresh rows in the last: 7 Days

Power Query required RangeStart and RangeEnd parameters:

= Table.SelectRows(Source, each [order_date] >= RangeStart and [order_date] < RangeEnd)

Result: refresh time dropped from 8+ hours to 35 minutes.

2. Removed Unused Columns

A column audit was performed by cross-referencing all columns in the model against all DAX measures, report visuals, slicers, and filters.

Before: 80 columns in the fact table After: 22 columns in the fact table

VertiPaq Analyzer showed that 58 removed columns accounted for 3.1 GB of model storage.

3. Replaced Flat Table with Star Schema

The flat denormalized import table was replaced with a proper star schema sourced from a Snowflake mart layer:

dim_customer (200K rows, Import)
dim_product  (80K rows, Import)
dim_date     (3650 rows, Import)
fct_orders   (500M rows, Incremental Refresh)

Relationships use integer surrogate keys.

4. Created Aggregation Tables

A pre-aggregated daily summary table was added in Snowflake and imported into Power BI as an aggregation:

-- Snowflake aggregation view
CREATE VIEW agg_daily_orders AS
SELECT
    order_date,
    product_id,
    customer_segment,
    sum(revenue)     as revenue_total,
    count(order_id)  as order_count
FROM fct_orders
GROUP BY order_date, product_id, customer_segment;

In Power BI, the aggregation was configured to route high-level queries to agg_daily_orders and only drill-through queries to the full fct_orders table.

Result: 95% of dashboard queries hit the 5M-row aggregation instead of the 500M-row fact table.

5. Disabled Auto Date/Time

12 hidden date tables were discovered via VertiPaq Analyzer, consuming 800 MB.

Steps:

File → Options → Current File → Data Load → Disable auto date/time
Added a proper dim_date table
Reconnected all date relationships

6. Replaced High-Cardinality Calculated Columns

Three calculated columns with millions of distinct string values were removed:

Column	Before	After
`full_address`	Calculated column (5M distinct)	Removed (not used in any filter)
`order_reference`	Calculated column (500M distinct)	Moved to drill-through only (degenerate dim)
`customer_label`	Calculated column	Replaced with DAX measure using `SELECTEDVALUE`

Results

Metric	Before	After
Model size	11.2 GB	3.4 GB
Refresh duration	8+ hours	35 minutes
Dashboard render time	25+ seconds	2–4 seconds
Daily refresh failures	3–4 per week	0
Hidden date tables	12	0

Lessons Learned

Most Impactful Changes (in order)

Incremental refresh — eliminated the longest part of refresh time
Aggregation tables — eliminated most full fact table scans
Removing unused columns — single biggest model size reduction
Star schema — improved query efficiency and relationship clarity

What to Do Differently from the Start

Never import a flat table with more than 20 columns into Power BI
Enable incremental refresh before the dataset reaches 50M rows
Build aggregation tables in the warehouse, not in Power BI DAX
Disable auto date/time on every new model, immediately

Checklist for Large Dataset Models

Incremental refresh configured with RangeStart / RangeEnd
Unused columns removed before publishing
Star schema sourced from warehouse mart layer
Aggregation tables defined for high-level queries
Auto date/time disabled
Integer surrogate keys used for all relationships
Model size validated in VertiPaq Analyzer before each publish
Refresh scheduled outside business hours

Summary

Scaling Power BI to 500M rows is achievable with the right architecture. The key insight was that the problem was never the row count — it was the missing star schema, unnecessary columns, and absence of incremental refresh and aggregation.

Model design quality determines whether Power BI can scale. Hardware and capacity are secondary.

Background​

Problem Statement​

Symptoms​

Root Cause Analysis​

Architecture Before​

Architecture After​

Changes Made​

1. Implemented Incremental Refresh​

2. Removed Unused Columns​

3. Replaced Flat Table with Star Schema​

4. Created Aggregation Tables​

5. Disabled Auto Date/Time​

6. Replaced High-Cardinality Calculated Columns​

Results​

Lessons Learned​

Most Impactful Changes (in order)​

What to Do Differently from the Start​

Checklist for Large Dataset Models​

Summary​