ClickHouse Performance Optimizations

ClickHouse is a columnar OLAP database designed for high-throughput analytical queries. Performance problems are almost always caused by incorrect table design, not by hardware limitations.

Core Performance Principles

Primary key and sorting key design is the single most impactful decision
Partition pruning eliminates unnecessary data scans
Projection and materialized views pre-compute expensive aggregations
Data skipping indexes accelerate point lookups on non-key columns

Table Engine Selection

Scenario	Recommended Engine
Primary OLAP workloads	`MergeTree` family
Deduplication required	`ReplacingMergeTree`
Pre-aggregation	`AggregatingMergeTree`
Time-series, partitioned	`MergeTree` with date partitioning
Distributed cluster	`Distributed` over `ReplicatedMergeTree`

Primary Key and Sorting Key Design

How ClickHouse Uses the Primary Key

ClickHouse stores data in granules (default 8192 rows). The primary key index stores the first row of each granule.

A query using the primary key prefix can skip entire granules. A query that does not use the primary key prefix performs a full table scan.

Choosing the Right Primary Key

Rules:

Put the most frequently filtered column first
Put high-cardinality columns second
Keep the primary key short (3–5 columns maximum)

-- Good: filters by date first, then user_id
ORDER BY (event_date, user_id, event_type)

-- Bad: random UUID first destroys locality
ORDER BY (uuid, event_date)

Primary Key vs Sorting Key

In ClickHouse, PRIMARY KEY and ORDER BY (sorting key) are separate concepts:

CREATE TABLE events (
    event_date   Date,
    user_id      UInt64,
    event_type   LowCardinality(String),
    revenue      Decimal(10,2)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id)
PRIMARY KEY (event_date, user_id);

The ORDER BY defines physical sort order. PRIMARY KEY can be a prefix of ORDER BY.

Partitioning

Partitioning separates data into physical directories.

Recommended Partition Keys

Data Volume	Recommended Partition
< 1 TB/year	`toYYYYMM(event_date)`
> 1 TB/year	`toYYYYMMDD(event_date)`
Time-series IoT	`toStartOfHour(event_time)`

Rules:

Always filter by the partition key in queries
Partition count should stay below 1000 per table
Avoid over-partitioning (e.g., partitioning by user_id)

-- Partition pruning: ClickHouse reads only matching partitions
SELECT sum(revenue)
FROM events
WHERE event_date >= '2024-01-01'  -- partition filter applied
  AND user_id = 12345

Data Type Optimization

Choosing correct types significantly reduces storage and query time:

Column Type	Recommendation
Low-cardinality strings	`LowCardinality(String)`
Status flags	`Enum8` or `UInt8`
Timestamps	`DateTime` (not String)
IDs	`UInt64` (not String UUIDs when possible)
Decimals	`Decimal(18,4)` (not Float64 for money)

-- Bad: wastes memory and disables dictionary compression
status String

-- Good: automatic dictionary encoding
status LowCardinality(String)

Materialized Views

Pre-aggregate data at insert time for common query patterns:

CREATE MATERIALIZED VIEW mv_daily_revenue
ENGINE = SummingMergeTree()
PARTITION BY event_date
ORDER BY (event_date, product_id)
AS
SELECT
    event_date,
    product_id,
    sum(revenue) as revenue_total,
    count() as order_count
FROM fct_orders
GROUP BY event_date, product_id;

Query the materialized view instead of the base table for aggregated reports.

Data Skipping Indexes

Add secondary indexes for non-key column filters:

-- Bloom filter for point lookups on email
ALTER TABLE users ADD INDEX idx_email email TYPE bloom_filter GRANULARITY 4;

-- minmax index for range queries
ALTER TABLE events ADD INDEX idx_revenue revenue TYPE minmax GRANULARITY 8;

Best for:

Low-selectivity secondary filters
IN and equality checks on non-key columns

Query Optimization Rules

Use PREWHERE Instead of WHERE

PREWHERE applies a filter before reading other columns, reducing I/O:

-- Standard WHERE reads all columns first
SELECT user_id, revenue FROM events WHERE status = 'completed'

-- PREWHERE is applied during column reading
SELECT user_id, revenue FROM events PREWHERE status = 'completed'

ClickHouse automatically promotes some WHERE conditions to PREWHERE but being explicit helps.

Avoid SELECT *

Always name columns explicitly:

-- Bad: reads all columns
SELECT * FROM fct_orders WHERE event_date = today()

-- Good: reads only needed columns
SELECT order_id, revenue, user_id FROM fct_orders WHERE event_date = today()

ClickHouse is columnar — reading fewer columns directly reduces I/O.

Prefer `toYYYYMM()` over String Dates

-- Bad: no partition pruning
WHERE toString(event_date) LIKE '2024-01%'

-- Good: enables partition pruning
WHERE toYYYYMM(event_date) = 202401

Compression Settings

ClickHouse compresses columns by default. Improve compression with:

-- ZSTD for better compression on text/log data
event_payload String CODEC(ZSTD(3)),

-- Delta + ZSTD for monotonically increasing values (timestamps, IDs)
event_time DateTime CODEC(Delta, ZSTD),

-- DoubleDelta for slowly changing counters
counter UInt64 CODEC(DoubleDelta, ZSTD)

Monitoring Query Performance

Use system.query_log to analyze slow queries:

SELECT
    query_duration_ms,
    read_rows,
    read_bytes,
    result_rows,
    query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_date = today()
ORDER BY query_duration_ms DESC
LIMIT 20;

Key metrics:

read_rows: rows scanned (lower is better)
read_bytes: bytes read (lower is better)
High read_rows / result_rows ratio indicates missing partition or primary key filter

Summary

ClickHouse performance optimization checklist:

Primary key matches most common query filters
Partition key enables partition pruning
LowCardinality used for string columns with < 10k distinct values
Correct numeric types (not String for IDs or dates)
Materialized views for common aggregations
PREWHERE for selective secondary filters
SELECT * eliminated from production queries
Slow queries monitored via system.query_log

ClickHouse Platform Overview — MergeTree engines, compression, deployment options
Getting Started with ClickHouse — installation, configuration, first table
Case Study: Migrating to ClickHouse — real-world schema design and migration results

Core Performance Principles​

Table Engine Selection​

Primary Key and Sorting Key Design​

How ClickHouse Uses the Primary Key​

Choosing the Right Primary Key​

Primary Key vs Sorting Key​

Partitioning​

Recommended Partition Keys​

Data Type Optimization​

Materialized Views​

Data Skipping Indexes​

Query Optimization Rules​

Use PREWHERE Instead of WHERE​

Avoid SELECT *​

Prefer toYYYYMM() over String Dates​

Compression Settings​

Monitoring Query Performance​

Summary​

Related Docs​