Clickhouse Training 101 by Altinity

Estimated read time – 5 min

Just recently I have completed a Clickhouse Training by Altinity (101 Series Training). For those who are just getting to know Clickhouse, Altinity offers free basic training: Data Warehouse Basics. I recommend starting with it if you are planning to dive into learning.

Certification by Altinity

I would like to share my experience on the training as well as the training notes.
The training costs $500 and lasts 4 days for 2 hours. It is carried out in the evenings Moscow time (starting from 19:00 GMT +3).

Session # 1.

The first day mostly revises everything covered in Data Warehouse Basics, but it has several new ideas on how to get useful information on queries from system tables.

For example, this query will show which commands are running and their status.

SELECT command, is_done
FROM system.mutations
WHERE table = 'ontime'

Besides, for me it was useful to learn about column compression with the use of codecs:

ALTER TABLE ontime
 MODIFY COLUMN TailNum LowCardinality(String) CODEC(ZSTD(1))

For those who are just starting with Clickhouse, the first day will be super useful as it will help in understanding table engines and syntax for their creation, partitions, inserting data (for example directly from S3).

INSERT INTO sdata
SELECT * FROM s3(
 'https://s3.us-east-1.amazonaws.com/d1-altinity/data/sdata*.csv.gz',
 'aws_access_key_id',
 'aws_secret_access_key',
 'Parquet',
 'DevId Int32, Type String, MDate Date, MDatetime
DateTime, Value Float64')

Session # 2.

I found the second day the most intense and useful as within this session Robert from Altinity talks about aggregate functions and materialized views ( detailed scheme for the creation of materialized views ) in Clickhouse in more detail.

It was super useful for me to learn about index types in Clickhouse.

Session # 3.

During the third day, colleagues share their knowledge on how to work with Kafka and JSON objects stored in the tables.
It was interesting to find out that working with arrays in Clickhouse is very similar to arrays in Python:

WITH [1, 2, 4] AS array
SELECT
 array[1] AS First,
 array[2] AS Second,
 array[3] AS Third,
 array[-1] AS Last,
 length(array) AS Length

When working with arrays, there is a great feature called ARRAY JOIN which “unrolls” arrays to rows.

Clickhouse allows you to efficiently interact with JSON objects stored in a table:

-- Get a JSON string value
SELECT JSONExtractString(row, 'request') AS request
FROM log_row LIMIT 3
-- Get a JSON numeric value
SELECT JSONExtractInt(row, 'status') AS status
FROM log_row LIMIT 3

This piece of code is an example of how to extract the elements of the JSON array “request” and “status” separately.

ALTER TABLE log_row
 ADD COLUMN
status Int16 DEFAULT
 JSONExtractInt(row, 'status')
ALTER TABLE log_row
UPDATE status = status WHERE 1 = 1

Session # 4.

The most difficult topic from my point of view was saved for the last day – building sharding and replication patterns and building queries on distributed Clickhouse servers.

Special respect to Altinity for an excellent collection of labs during the training.

Links:

Share
Send
Pin
 64   1 mon   clickhouse   sql
Popular