Data Observability

Data observability is the ability to understand the health of data flowing through your pipeline at any point in time. Without it, data issues are discovered by business users rather than engineers.

The Five Pillars

Pillar	Question
Freshness	Is the data up to date?
Volume	Does the data have the expected number of rows?
Distribution	Are column values within expected ranges?
Schema	Have column names or types changed unexpectedly?
Lineage	If something is wrong, what does it affect?

Most data incidents are caused by failures in freshness or volume. Start there.

Freshness Monitoring

dbt Source Freshness

dbt has built-in source freshness checking:

sources:
  - name: crm
    tables:
      - name: orders
        loaded_at_field: etl_time
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 24, period: hour}

Run from CI or a scheduled job:

dbt source freshness

Failures are surfaced as warnings or errors in the run output.

Freshness in the Mart Layer

For mart tables, track freshness with a monitoring query:

SELECT
    table_name,
    max(etl_date)                                           AS last_etl_date,
    current_date - max(etl_date)                           AS days_since_refresh,
    CASE
        WHEN current_date - max(etl_date) > 1 THEN 'STALE'
        ELSE 'OK'
    END                                                     AS status
FROM (
    SELECT 'fct_orders'   AS table_name, max(etl_date) AS etl_date FROM mart.fct_orders
    UNION ALL
    SELECT 'fct_sessions' AS table_name, max(etl_date) AS etl_date FROM mart.fct_sessions
)
GROUP BY table_name, etl_date
ORDER BY days_since_refresh DESC;

Volume Monitoring

Row Count Thresholds

Track daily row counts and alert on anomalies:

-- Store daily row counts
INSERT INTO monitoring.table_row_counts
SELECT
    current_date                               AS check_date,
    'fct_orders'                               AS table_name,
    count(*)                                   AS row_count
FROM mart.fct_orders
WHERE etl_date = current_date;

-- Alert on count outside expected range
SELECT
    check_date,
    table_name,
    row_count,
    avg(row_count) OVER (
        ORDER BY check_date
        ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
    )                                          AS avg_7d,
    CASE
        WHEN row_count < avg_7d * 0.7 THEN 'LOW — possible drop'
        WHEN row_count > avg_7d * 1.5 THEN 'HIGH — possible duplicate'
        ELSE 'OK'
    END                                        AS volume_status
FROM monitoring.table_row_counts
WHERE check_date = current_date;

dbt Volume Test

Using dbt_expectations:

- name: fct_orders
  tests:
    - dbt_expectations.expect_table_row_count_to_be_between:
        min_value: 500
        max_value: 100000
        config:
          where: "etl_date = current_date"

Distribution Monitoring

Track null rates and value distribution shifts:

-- Null rate per column
SELECT
    count_if(customer_id IS NULL) * 100.0 / count(*) AS customer_id_null_pct,
    count_if(revenue IS NULL)     * 100.0 / count(*) AS revenue_null_pct,
    count_if(order_status IS NULL)* 100.0 / count(*) AS status_null_pct
FROM mart.fct_orders
WHERE etl_date = current_date;

Alert when null rate exceeds baseline by more than 5%.

Value Distribution Checks

-- Distribution shift: compare today's status breakdown to last week's
SELECT
    order_status,
    count(*) * 100.0 / sum(count(*)) OVER ()  AS pct_today
FROM mart.fct_orders
WHERE etl_date = current_date
GROUP BY order_status
ORDER BY pct_today DESC;

A sudden spike in cancelled or disappearance of delivered status indicates an upstream issue.

Schema Monitoring

Schema changes (column renames, type changes, dropped columns) silently break downstream models.

dbt `audit_helper` for Schema Diff

The dbt-labs/audit_helper package provides column-level comparison:

{% set old_relation = ref('fct_orders__v1') %}
{% set new_relation = ref('fct_orders') %}

{{ audit_helper.compare_column_values(
    a_relation=old_relation,
    b_relation=new_relation,
    primary_key='order_line_sk',
    column_name='revenue_gross'
) }}

Source Schema Alerts

In modern data stacks, use the data catalog or ingestion platform to detect source schema changes before they break the pipeline.

Lineage

Lineage answers: if this table is broken, what downstream assets are affected?

dbt Lineage

dbt generates lineage automatically from ref() and source() calls.

View in dbt docs:

dbt docs generate
dbt docs serve

Navigate the lineage DAG to trace impact from broken sources to affected marts.

Impact Analysis Before Changes

Before changing a model, check what depends on it:

# List all models downstream of fct_orders
dbt ls --select fct_orders+

Alerting

Observability without alerting is just logging.

dbt On-Run-End Hooks

Send alerts when runs fail:

# dbt_project.yml
on-run-end:
  - "{{ notify_on_failure(results) }}"

Slack / PagerDuty Integration

Route alerts based on severity:

Alert Type	Channel	Severity
Source freshness error	`#data-oncall`	High
Mart row count anomaly	`#data-quality`	Medium
dbt test failure (error)	`#data-oncall`	High
dbt test failure (warn)	`#data-quality`	Low

Minimum Observability Stack

For teams without dedicated observability tooling:

dbt source freshness — scheduled every 30 minutes
dbt test run — scheduled after every pipeline run
Row count monitoring table — populated by a post-run dbt macro
Slack webhook — receives dbt run failure notifications

This covers freshness, volume, and correctness with zero additional tools.

Commercial Tools

Tool	Strengths
Monte Carlo	Full five-pillar coverage, ML anomaly detection
Soda	SQL-based checks, integrates with dbt
Great Expectations	Highly configurable, open-source
Metaplane	dbt-native, lightweight

For most teams, dbt's built-in tests combined with a monitoring table cover 80% of needs.

Summary

Observability checklist:

Source freshness checks defined for all production sources
dbt tests run after every pipeline completion
Daily row counts stored and compared to rolling average
Null rate monitoring on critical columns
Lineage documented via dbt DAG
Alerts routed to appropriate channels by severity
On-call rotation defined for high-severity data incidents

The Five Pillars​

Freshness Monitoring​

dbt Source Freshness​

Freshness in the Mart Layer​

Volume Monitoring​

Row Count Thresholds​

dbt Volume Test​

Distribution Monitoring​

Value Distribution Checks​

Schema Monitoring​

dbt audit_helper for Schema Diff​

Source Schema Alerts​

Lineage​

dbt Lineage​

Impact Analysis Before Changes​

Alerting​

dbt On-Run-End Hooks​

Slack / PagerDuty Integration​

Minimum Observability Stack​

Commercial Tools​

Summary​