ClickHouse
ClickHouse is an open-source columnar OLAP database built for real-time analytical queries on large datasets. It is the primary platform for high-throughput analytics workloads where query latency and storage efficiency are critical.
General data engineering topics
ClickHouse is a columnar OLAP database designed for high-throughput analytical queries. Performance problems are almost always caused by incorrect table design, not by hardware limitations.
Data observability is the ability to understand the health of data flowing through your pipeline at any point in time. Without it, data issues are discovered by business users rather than engineers.
A well-organized dbt project enables team collaboration, consistent conventions, and maintainable transformation pipelines. This document defines the standard folder structure and configuration patterns.
Data tests are the primary mechanism for ensuring correctness and catching regressions in a dbt project. This document defines a tiered testing strategy across all model layers.
This document covers ClickHouse installation, initial configuration, and the first steps for setting up an analytical environment.
This document defines enterprise-grade, scalable, and cost-efficient standards for a PySpark + S3 Iceberg + Snowflake External Volume + dbt architecture.
Well-designed metrics are the foundation of reliable analytics. This document defines principles and patterns for creating consistent, trustworthy, and maintainable business metrics.
The semantic layer sits between the data warehouse and BI tools. It translates raw warehouse tables into business-friendly metrics, dimensions, and relationships.
Snowflake separates compute and storage, which gives fine-grained cost control but also introduces new ways to overspend. This document covers performance optimization and cost governance strategies.