Clickhouse Training 101 by Altinity
⏱ Время чтения текста – 5 минутJust recently I have completed a Clickhouse Training by Altinity (101 Series Training). For those who are just getting to know Clickhouse, Altinity offers free basic training: Data Warehouse Basics. I recommend starting with it if you are planning to dive into learning.

I would like to share my experience on the training as well as the training notes.
The training costs $500 and lasts 4 days for 2 hours. It is carried out in the evenings Moscow time (starting from 19:00 GMT +3).
Session # 1.
The first day mostly revises everything covered in Data Warehouse Basics, but it has several new ideas on how to get useful information on queries from system tables.
For example, this query will show which commands are running and their status.
SELECT command, is_done
FROM system.mutations
WHERE table = 'ontime'
Besides, for me it was useful to learn about column compression with the use of codecs:
ALTER TABLE ontime
MODIFY COLUMN TailNum LowCardinality(String) CODEC(ZSTD(1))

For those who are just starting with Clickhouse, the first day will be super useful as it will help in understanding table engines and syntax for their creation, partitions, inserting data (for example directly from S3).
INSERT INTO sdata
SELECT * FROM s3(
'https://s3.us-east-1.amazonaws.com/d1-altinity/data/sdata*.csv.gz',
'aws_access_key_id',
'aws_secret_access_key',
'Parquet',
'DevId Int32, Type String, MDate Date, MDatetime
DateTime, Value Float64')
Session # 2.
I found the second day the most intense and useful as within this session Robert from Altinity talks about aggregate functions and materialized views ( detailed scheme for the creation of materialized views ) in Clickhouse in more detail.

It was super useful for me to learn about index types in Clickhouse.

Session # 3.
During the third day, colleagues share their knowledge on how to work with Kafka and JSON objects stored in the tables.
It was interesting to find out that working with arrays in Clickhouse is very similar to arrays in Python:
WITH [1, 2, 4] AS array
SELECT
array[1] AS First,
array[2] AS Second,
array[3] AS Third,
array[-1] AS Last,
length(array) AS Length
When working with arrays, there is a great feature called ARRAY JOIN which “unrolls” arrays to rows.

Clickhouse allows you to efficiently interact with JSON objects stored in a table:
-- Get a JSON string value
SELECT JSONExtractString(row, 'request') AS request
FROM log_row LIMIT 3
-- Get a JSON numeric value
SELECT JSONExtractInt(row, 'status') AS status
FROM log_row LIMIT 3
This piece of code is an example of how to extract the elements of the JSON array “request” and “status” separately.
ALTER TABLE log_row
ADD COLUMN
status Int16 DEFAULT
JSONExtractInt(row, 'status')
ALTER TABLE log_row
UPDATE status = status WHERE 1 = 1
Session # 4.
The most difficult topic from my point of view was saved for the last day – building sharding and replication patterns and building queries on distributed Clickhouse servers.


Special respect to Altinity for an excellent collection of labs during the training.
Links:
Great training resource on ClickHouse! I just read this article from Pinax ( https://blog.pinax.network/substreams/quickstart-guide-clickhouse-101/ ) about getting started with ClickHouse for blockchain data and would love to know you’re thoughts on it or if you would change anything given your background given your background in ClickHouse training.