Data Engineering Docs
A personal knowledge base on data engineering — covering warehouse architecture, data modeling, analytics design, performance tuning, and tooling.
These docs reflect real-world patterns and decisions, not vendor documentation.
Sections
Architecture
Standards and patterns for building data warehouse and lakehouse systems.
- Warehouse Standards & Layer Definitions — dbt layer norms, naming conventions, materialization rules
- Cloud vs On-Premise — decision framework for platform selection
- Kubernetes for Data Platforms — running data workloads on Kubernetes
Data Modeling
Techniques for structuring analytics-ready data.
- Dimensional Modeling — star schema, grain, surrogate keys
- Fact Table Design — fact types, mandatory columns, incremental patterns
- Slowly Changing Dimensions — SCD types, dbt snapshots
- Power BI Semantic Model — VertiPaq-optimized star schema, DAX patterns
Analytics
Designing metrics and BI systems that business teams can trust.
- Metrics Design Principles — grain, naming, anti-patterns, validation checklist
- Power BI Architecture — dataset modes, workspace topology, RLS
- Semantic Layer — responsibilities, dbt metrics, governance
Performance
Query and platform optimization techniques.
- ClickHouse Optimizations — primary key, partitioning, materialized views
- Power BI Performance Tuning — DAX Studio, model size, DAX patterns
- Snowflake Cost & Performance — warehouse isolation, clustering keys, resource monitors
Platforms
Deep-dives on specific data platforms.
- ClickHouse — architecture overview, MergeTree engines, integrations, deployment
Tooling
Setup and configuration guides for common data tools.
- dbt Project Structure — folder layout, source definitions, model templates
- dbt Testing Strategy — generic tests, singular tests, CI/CD integration
- Getting Started with ClickHouse — installation, configuration, first table
- Power BI Incremental Refresh — RangeStart/RangeEnd, query folding, rolling windows
- Data Observability — freshness, volume, distribution, lineage, alerting
Case Studies
Real-world implementations and lessons learned.
- Migrating to ClickHouse from PostgreSQL — CDC pipeline, schema redesign, 20–30× query speedup
- Scaling Power BI for Large Datasets — incremental refresh, aggregation tables, 500M-row model