
10 docs tagged with "Data"

General data engineering topics


ClickHouse

ClickHouse is an open-source columnar OLAP database built for real-time analytical queries on large datasets. It is the primary platform for high-throughput analytics workloads where query latency and storage efficiency are critical.

ClickHouse Performance Optimizations

ClickHouse is a columnar OLAP database designed for high-throughput analytical queries. Performance problems are almost always caused by incorrect table design, not by hardware limitations.

Data Observability

Data observability is the ability to understand the health of data flowing through your pipeline at any point in time. Without it, data issues are discovered by business users rather than engineers.

dbt Project Structure

A well-organized dbt project enables team collaboration, consistent conventions, and maintainable transformation pipelines. This document defines the standard folder structure and configuration patterns.

dbt Testing Strategy

Data tests are the primary mechanism for ensuring correctness and catching regressions in a dbt project. This document defines a tiered testing strategy across all model layers.

Getting Started with ClickHouse

This document covers ClickHouse installation, initial configuration, and the first steps for setting up an analytical environment.

Lakehouse, Warehouse, DWH Structure

This document defines enterprise-grade, scalable, and cost-efficient standards for an architecture built on PySpark, Apache Iceberg tables on S3, a Snowflake External Volume, and dbt.

Metrics Design Principles

Well-designed metrics are the foundation of reliable analytics. This document defines principles and patterns for creating consistent, trustworthy, and maintainable business metrics.

Semantic Layer Design

The semantic layer sits between the data warehouse and BI tools. It translates raw warehouse tables into business-friendly metrics, dimensions, and relationships.

Snowflake Cost & Performance

Snowflake separates compute and storage, which gives fine-grained cost control but also introduces new ways to overspend. This document covers performance optimization and cost governance strategies.