Clickhouse Training 101 by Altinity⏱ Estimated read time – 5 min
Just recently I have completed a Clickhouse Training by Altinity (101 Series Training). For those who are just getting to know Clickhouse, Altinity offers free basic training: Data Warehouse Basics. I recommend starting with it if you are planning to dive into learning.
I would like to share my experience on the training as well as the training notes.
The training costs $500 and lasts 4 days for 2 hours. It is carried out in the evenings Moscow time (starting from 19:00 GMT +3).
Session # 1.
The first day mostly revises everything covered in Data Warehouse Basics, but it has several new ideas on how to get useful information on queries from system tables.
For example, this query will show which commands are running and their status.
SELECT command, is_done FROM system.mutations WHERE table = 'ontime'
Besides, for me it was useful to learn about column compression with the use of codecs:
ALTER TABLE ontime MODIFY COLUMN TailNum LowCardinality(String) CODEC(ZSTD(1))
For those who are just starting with Clickhouse, the first day will be super useful as it will help in understanding table engines and syntax for their creation, partitions, inserting data (for example directly from S3).
INSERT INTO sdata SELECT * FROM s3( 'https://s3.us-east-1.amazonaws.com/d1-altinity/data/sdata*.csv.gz', 'aws_access_key_id', 'aws_secret_access_key', 'Parquet', 'DevId Int32, Type String, MDate Date, MDatetime DateTime, Value Float64')
Session # 2.
I found the second day the most intense and useful as within this session Robert from Altinity talks about aggregate functions and materialized views ( detailed scheme for the creation of materialized views ) in Clickhouse in more detail.
It was super useful for me to learn about index types in Clickhouse.
Session # 3.
During the third day, colleagues share their knowledge on how to work with Kafka and JSON objects stored in the tables.
It was interesting to find out that working with arrays in Clickhouse is very similar to arrays in Python:
WITH [1, 2, 4] AS array SELECT array AS First, array AS Second, array AS Third, array[-1] AS Last, length(array) AS Length
When working with arrays, there is a great feature called ARRAY JOIN which “unrolls” arrays to rows.
Clickhouse allows you to efficiently interact with JSON objects stored in a table:
-- Get a JSON string value SELECT JSONExtractString(row, 'request') AS request FROM log_row LIMIT 3 -- Get a JSON numeric value SELECT JSONExtractInt(row, 'status') AS status FROM log_row LIMIT 3
This piece of code is an example of how to extract the elements of the JSON array “request” and “status” separately.
ALTER TABLE log_row ADD COLUMN status Int16 DEFAULT JSONExtractInt(row, 'status') ALTER TABLE log_row UPDATE status = status WHERE 1 = 1
Session # 4.
The most difficult topic from my point of view was saved for the last day – building sharding and replication patterns and building queries on distributed Clickhouse servers.
Special respect to Altinity for an excellent collection of labs during the training.