Free math courses for analysts and data engineers

Reading time: 4 minutes

Nowadays, the Internet offers a huge number of paid courses that promise to make you a data analyst. Some of them are really great, and you come away with a valuable set of skills. However, most of them don’t focus on the fundamental math and programming skills that are crucial to making it in the field.

Some people believe that an analyst doesn’t need SQL or Python. Others argue that an analyst can solve problems using only hard skills, without deep knowledge of math. In my opinion, that’s a big delusion. Apart from hard skills, a good data analyst should have a strong background in math and computer science. If that sounds like a tall order, read on: I have a solution for you.

In my opinion, it’s difficult to reason about the probability of churn without understanding probability theory. It’s difficult to discuss the mean and the normal distribution without understanding statistics. It’s impossible to grasp SVD without knowledge of linear algebra or to find a gradient without understanding calculus. Some people may argue that an analyst doesn’t need this. Tools like Python, R, or Matlab let you build models without worrying about the math. In the beginning, this might even work. You can take a ready-made algorithm, add a couple of commands, and voilà, you have built a regression model. But what do you do next? How do you change specific parameters of the model without understanding the math behind it?
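To make that last question concrete, here is a minimal sketch in plain Python of what sits behind a one-line regression call. It is my own illustration, not taken from any of the courses listed further down: the function names `fit_line_gd` and `fit_line_closed_form`, the sample data, and the default learning rate are all hypothetical. It fits the same straight line twice, once with the closed-form least-squares formulas from statistics and once by gradient descent, which needs the derivatives from calculus.

```python
def fit_line_closed_form(xs, ys):
    """Fit y ≈ a*x + b with the textbook least-squares formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def fit_line_gd(xs, ys, lr=0.05, steps=5000):
    """Fit y ≈ a*x + b by gradient descent on the mean squared error.

    lr (learning rate) and steps are illustrative defaults chosen for the
    small data set below; they are assumptions, not recommended values.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE = (1/n) * sum((a*x + b - y)^2).
        grad_a = (2.0 / n) * sum((a * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((a * x + b - y) for x, y in zip(xs, ys))
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Hypothetical sample data, roughly following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

a_cf, b_cf = fit_line_closed_form(xs, ys)
a_gd, b_gd = fit_line_gd(xs, ys)
print(f"closed form:      a={a_cf:.3f}, b={b_cf:.3f}")
print(f"gradient descent: a={a_gd:.3f}, b={b_gd:.3f}")
```

Both approaches should land on the same line here. The difference shows up when you start changing things: with a learning rate that is too large for the data, the gradient-descent version diverges instead of converging, and it is hard to see why without the math.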

Nowadays, the Internet gives us an incredible opportunity to get an Ivy League-level education free of charge. A beginner data analyst should take advantage of it before buying online data analysis courses. I recently completed university-level math and programming courses and want to share them with you. Although I took advanced math at university 15 years ago, it was still worth revising (we forget a great deal in 15 years). An additional benefit of such courses is that they develop highly sought-after analytical thinking skills.

Here is the list of free online courses from eminent US universities that I want to share with you. These courses will definitely help you start your learning journey in data analytics.

Calculus (M.I.T.)

This is an amazing set of courses, excellent in both content and delivery, offered by MIT in three parts:

  1. Differentiation
  2. Integration
  3. Coordinate systems and infinite series

Linear Algebra (Georgia Tech)

A course in four parts from one of the world’s leading universities in computer science: Georgia Tech.

  1. Linear equations
  2. Matrix algebra
  3. Determinants and eigenvalues
  4. Orthogonality, symmetric matrices and SVD

Probability theory and mathematical statistics (Georgia Tech)

A course in four parts from one of the world’s leading universities in computer science: Georgia Tech.

  1. A gentle introduction to probability
  2. Random variables
  3. A gentle introduction to statistics
  4. Confidence intervals and hypothesis tests

Calculations in R (Harvard)

A course in 9 parts from a Harvard professor:

  1. R basics
  2. Visualization
  3. Probability theory
  4. Inference and modeling
  5. Productivity tools
  6. Wrangling
  7. Linear regression
  8. Machine learning
  9. Capstone
 No comments    98   2021   data analysis courses   education   math

Dbt Coalesce conference: best talks to watch

Reading time: 3 minutes

The Coalesce 2020 conference, which I’ve mentioned before, took place from December 7 to 11, 2020. This year, the organizers decided to spread the conference over 5 days, with a large number of talks.

On the one hand, that is an advantage: with so much material, you can choose what interests you most. On the other hand, that much material is tiring, because it is often impossible to tell from the title alone whether a talk will be interesting and useful. In my opinion, more than 3 days is too long for a conference, as the audience loses interest. Moreover, personal and professional obligations don’t disappear just because of an event that, although online, still takes up your time.

However, I managed to watch most of the talks, sometimes skimming through. First, my overall impression: it is well worth studying presentations from conferences like Coalesce, as they mostly cover modern BI tools and cloud solutions. Almost every talk mentions Redshift / BigQuery / Snowflake or BI tools like Mode / Tableau / Looker / Metabase. Obviously, dbt is in the middle of everything.

The shortlist of talks I recommend watching:

  1. dbt 101
    An introductory talk on what dbt is and how to use it.
  2. Kimball in the context of the modern data warehouse: what’s worth keeping, and what’s not
    An interesting but extremely controversial talk that raised a lot of questions in the dbt community. In short, the author suggests using wide analytical tables and giving up normal forms everywhere.
  3. Building a robust data pipeline with dbt, Airflow, and Great Expectations
    A talk about a rather interesting tool called Great Expectations, which is used for data validation.
  4. Orchestrating dbt with Dagster
    I found this video a bit boring, but if you want to learn about Dagster, you’ll like it.
  5. Supercharging your data team
    The speakers present a wrapper they built for dbt, called dbt executor 9000.
  6. Presenting: SQLFluff
    A video about a really cool tool called SQLFluff, which automatically reformats SQL code according to a set of style rules.
  7. Quickstart your analytics with Fivetran dbt packages
    From this video, you’ll learn about Fivetran and find out how to use it with dbt.
  8. Perfect complements: Using dbt with Looker for effective data governance
    A talk about how dbt interacts with Looker, and the differences and similarities between the two tools.
 No comments    351   2020   analytics   coalesce   conference   dbt   education