dbt Project Structure

A well-organized dbt project enables team collaboration, consistent conventions, and maintainable transformation pipelines. This document defines the standard folder structure and configuration patterns.

Recommended Folder Structure

dbt_project/
├── models/
│   ├── raw/              # Source definitions only (no .sql files)
│   ├── stg/              # Staging: technical cleanup, one model per source table
│   │   ├── source_a/
│   │   └── source_b/
│   ├── base/             # Business-ready entities
│   ├── int/              # Joins, enrichments, reusable intermediate models
│   └── mart/
│       ├── fct/          # Fact tables
│       └── dim/          # Dimension tables
├── macros/               # Reusable Jinja macros
├── tests/                # Custom generic tests
├── snapshots/            # SCD Type 2 snapshots
├── seeds/                # Static reference data (small CSV files)
├── analyses/             # Ad-hoc SQL analyses (not materialized)
├── docs/                 # Custom documentation blocks
├── dbt_project.yml       # Project configuration
└── packages.yml          # External package dependencies

dbt_project.yml Standards

name: my_project
version: '1.0.0'
config-version: 2

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets: ["target", "dbt_packages"]

models:
  my_project:
    stg:
      +materialized: incremental
      +incremental_strategy: merge
    base:
      +materialized: view
    int:
      +materialized: view
    mart:
      fct:
        +materialized: table
      dim:
        +materialized: table

vars:
  raw_incremental_days: 2
  default_timezone: 'UTC'

Source Definitions

All source tables are defined in models/raw/ using _sources.yml files.

version: 2

sources:
  - name: crm
    database: raw_db
    schema: crm
    tables:
      - name: customers
        description: Raw customer records from CRM system
        loaded_at_field: etl_time
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
      - name: orders
        description: Raw order records

Rules:

One _sources.yml per source system
Always define loaded_at_field for freshness checks
Define freshness thresholds for all production sources

Model File Conventions

Staging Models

-- models/stg/crm/stg_crm__customers.sql
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='row_hash',
        tags=['layer:stg']
    )
}}

with source as (
    select * from {{ source('crm', 'customers') }}
    {% if is_incremental() %}
    where etl_date >= (
        select dateadd(day, -{{ var('raw_incremental_days') }}, max(etl_date))
        from {{ this }}
    )
    {% endif %}
),

deduped as (
    select *,
        row_number() over (
            partition by customer_id
            order by etl_time desc
        ) as rn
    from source
),

final as (
    select
        customer_id,
        cast(customer_name as varchar) as customer_name,
        cast(email as varchar)         as email,
        cast(country as varchar)       as country,
        etl_date,
        etl_time,
        row_hash
    from deduped
    where rn = 1
)

select * from final

Mart Models

-- models/mart/fct/fct_orders.sql
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='order_line_sk',
        tags=['layer:mart', 'incremental']
    )
}}

with orders as (
    select * from {{ ref('stg_crm__orders') }}
    {% if is_incremental() %}
    where etl_date >= (
        select dateadd(day, -{{ var('raw_incremental_days') }}, max(etl_date))
        from {{ this }}
    )
    {% endif %}
),

customers as (
    select * from {{ ref('dim_customer') }}
),

final as (
    select
        {{ dbt_utils.generate_surrogate_key(['o.order_id', 'o.line_item_id']) }} as order_line_sk,
        o.order_id,
        o.line_item_id,
        c.customer_sk,
        o.revenue_gross,
        o.order_date,
        o.etl_date
    from orders o
    left join customers c on o.customer_id = c.customer_id
)

select * from final

Schema and Documentation YAML

Every model folder should have a _schema.yml file:

version: 2

models:
  - name: stg_crm__customers
    description: Deduplicated, type-cast staging view of raw CRM customers
    columns:
      - name: customer_id
        description: Natural key from CRM source system
        tests:
          - not_null
          - unique
      - name: email
        description: Customer email address
        tests:
          - not_null
      - name: row_hash
        description: xxhash64 of business key columns
        tests:
          - not_null
          - unique

packages.yml

Standard packages to include in most projects:

packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.9.0", "<1.0.0"]
  - package: dbt-labs/audit_helper
    version: [">=0.9.0", "<1.0.0"]

profiles.yml (Development)

my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: transformer
      database: dev_db
      warehouse: wh_dev
      schema: "dbt_{{ env_var('DBT_USER', 'dev') }}"
      threads: 4
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: transformer_prod
      database: prod_db
      warehouse: wh_etl
      schema: analytics
      threads: 8

Rules:

Never hardcode credentials in profiles
Use personal schemas in development (dbt_yourname)
Use environment variables for all secrets

Golden Rules

One model per source table in stg/
No business logic in stg/ — only technical cleanup
ref() and source() always — never hardcode table names
Every model has a corresponding _schema.yml entry with at least one test
Tags are applied at model level, not only at folder level
Secrets never in version control

Summary

A well-structured dbt project provides:

Predictable layer boundaries: each layer has a single responsibility
Consistent naming: file names communicate layer, source, and entity
Documented models: _schema.yml files accompany every model folder
Testable transformations: tests defined alongside model definitions

Recommended Folder Structure​

dbt_project.yml Standards​

Source Definitions​

Model File Conventions​

Staging Models​

Mart Models​

Schema and Documentation YAML​

packages.yml​

profiles.yml (Development)​

Golden Rules​

Summary​