13 posts tagged

redash

Later Ctrl + ↑

Using parameters in Redash

⏱ Время чтения текста – 2 минуты

The most convenient and useful thing in Redash is parameters. Parameters can be both in a report and in dashboards.
Parameter – is an element of the interface, controlled by a user producing a report.

Parameter in a report can be of the following types:

Each type explained separately:

Text – text input field, can be used in constructions of LIKE type, such parameter can’t be applied in public dashboards
Number – number, entered by a user
Dropdown list – list of values, from which a user can select just one or several values (not long ago, an option of multi-entry of parameters in a dropdown list appeared)
Query dropdown list – similarly to the previous one, however the values will be taken from the existing query results
Date / Date and Time / Date and Time (with seconds) – fields of date entry
Date Range / Date and Time Range / Date and Time Range (with seconds) – fields of date ranges entry.
It’s convenient to use for the following construction

between '{{parameter.start}}' and '{{parameter.end}}'

In case of dashboard, a situation might arise, when a parameter is named the same way in several queries, then it will become common for all the dashboard, it’s handy.

One of not obvious, but rather useful solutions: how to make a parameter disabled?
Let’s assume, that parameter of “dropdown list” type is named parameter and we want to set it for a column geo of a table, then the code of the query will look approximately like that:

WHERE
    ('{{parameter}}' = 'Disabled' or geo = '{{parameter}}')

At that, surely, ‘Disabled’ should be added to the values of the dropdown list.

No comments 4116 2020 BI-tools parameter redash

Setting up alerts in Redash

⏱ Время чтения текста – 5 минут

A very handy function on alert building is implemented in Redash. Alerts stand for notifications, arising at adjustment of some specific indicator. At that, the degree of change is set manually by a user on his own in the interface. Alerts can be set to a mail or to a channel / direct messages Slack.

Let’s recall how to collect the data, using Google Analytics, and set up alerts for this data, applying the internal Redash database (query-results). As a foundation for an alert we will focus on the reduction of number of users on the site for the previous day by more than 30%.

Building a query to Google Analytics

{
    "ids": "ga:128886640",
    "start_date": "30daysAgo",
    "end_date": "yesterday",
    "metrics": "ga:users", 
    "dimensions": "ga:date"
}

As a result of performance of this query, we will receive a number of users for the last 30 days.

Turning on the storage of query results

One of the main features of redash is an opportunity of calling upon the results of query performance, that are stored in the internal DBMS SQLite.
In order to turn on the storage of results, we need to go to Data Sources and turn on query-results (beta).

And now, with a simple command presented below:

select * from query_37

we will receive a result, similar to the one of the previous query to GA, however this one has an opportunity of using SQL language for processing of the data set obtained.

Building a query for evaluation of changes within the number of users

In order to set up the alert, first of all we need to write a query, that will eventually provide us with a target indicator for check, in our case it is growth or reduction in the number of users on the website.
Lets write a query, calling upon the internal DBMS of Redash:

SELECT sum(CASE WHEN date(ga_date)=DATE('now', '-1 day') THEN ga_users ELSE NULL END) AS yesterday,
sum(CASE WHEN date(ga_date)=DATE('now', '-2 day') THEN ga_users ELSE NULL END) AS before_yesterday,
(sum(CASE WHEN date(ga_date)=DATE('now', '-1 day') THEN ga_users ELSE NULL END)*1.0/
sum(CASE WHEN date(ga_date)=DATE('now', '-2 day') THEN ga_users ELSE NULL END)*1.0-1)*100 AS difference
FROM query_37

In the above-shown query we are calculating the number of users for two previous days, and also evaluating the change of number of users in percentage.
In the current example, we have received the following table of data, that we will use further at the process of alert setting:

Now, in order for us to be able to receive alerts, we need to set query performance by schedule (regular update of results / scheduled query in redash terminology).

We set an update for 10 o’clock of every morning:

Setting alert

Going to the menu Create – Alert. Inserting the name of query, in my case it is “Alert on users”.
Hereafter, we can change the reflected name of notification or leave the one, proposed by the system.

Choosing a target metric within the example reviewed – difference. Below, in the comparison operator (Op) selecting less then and setting a value of -30.

In the block on the right you need to select where exactly the alert will be sent. You can read about the setting of alert’s path more thoroughly on the official website of Redash.

Now, the alert has appeared on the page with the list of alerts and is by default in the status OK.

As soon as the indicator exceeds the level that we’ve set, the status will change to TRIGGERED, and the alert will be sent to mail / to Slack.

Updates in Redash v8

⏱ Время чтения текста – 5 минут

Two weeks ago the final release of Redash edition 8 took place. The complete list of improvements can be found on the page of beta version of v8 release.

In my overview of the novelties, I will focus my attention only on those, that have the most significant impact on user’s experience of Redash utilization, i.e. that are more illustrative for you and me:

Support for multi-select in dropdown (and query dropdown) parameters.
Support for dynamic values in date and date-range parameters.
Search dropdown parameter values.
Pivot Table: support hiding totals.
New Visualization: Details View.

Support for multi-select in dropdown (and query dropdown) parameters.

In the 8th version of Redash we finally receive the long-awaited support of selection of several values in a parameter such as “dropdown list” or “dropdown list on the basis of the query”:

When we select several different clues, the above-mentioned values transform into a list, divided by a comma. At that, there is an opportunity to wrap the values of such a list into single quotes or double quotes.

For us it means an opportunity to use requirements of IN type with a parameter in the queries:

At application of such parameter values, we will receive the following URL in the line of the browser:

Support for dynamic values in date and date-range parameters.

Extremely handy and useful feature for selection of dates, that is often used in Tableau, however hadn’t been implemented into Redash until recently: utilization of dynamic values for dates:

Pressing the zipper button, you can select a random period, and Redash will insert the required values into the query on its own. Thus, for instance, you can quickly look through the statistics for last week or last month.

Search dropdown parameter values.

Let’s assume that we are using a parameter of a type “dropdown list on the basis of the query”, and in the query selected there are more than 300 resulting lines, based on which we need to find a value of our interest. In the 8th version this problem is solved by auto-complete and search by parameter values. In the example below I am inserting a line “Orga” and obtain all the values, in which this line is met, very convenient.

Pivot Table: support hiding totals.

Quite long-awaited feature, that allows to aggregate outcomes by rows / columns.

New Visualization: Details View

New presentation of data results: view of details by each line. I assume, it can be convenient at use in a dashboard, when you apply filtration by a specific user / partner.
It visualizes the names of columns and results of a specific line in a table form.

No comments 700 2019 BI-tools redash visualisation

How to calculate Retention?

⏱ Время чтения текста – 6 минут

In this post we will discover, how to properly construct a report on Retention with application of Redash and SQL language.
For starters, let’s explain in a nutshell what the metric Retention rate is, why it is important,

Retention rate

Retention rate metric is widespread and is particularly popular within the mobile industry, since it allows to understand how well a product engages the users into daily use. Let’s recall (or discover), how Retention is calculated:

Retention of day X – is N% of users that will return to the product on day X. In other words, if on some specific day (day 0) 100 new users came, and 15 returned on the first day, then Retention of the 1st day will be equal to 15/100=15%.
Most commonly, Retention of days 1, 3, 7 and 30 are singled out as the most descriptive metrics of a product, however it’s useful to address Retention curve as a whole and make conclusions, proceeding from it.

Retention curve

In the end, we are interested in construction of such curve, that shows the retention of users from day 0 to day 30.

Retention rate curve from day 0 do day 30

Rolling Retention (RR)

Besides classic Retention rate, Rolling Retention (hereinafter, RR) is allocated. At calculation of RR, apart from day X, all the subsequent days are also considered. Thus, RR of the 1st day – the amount of users who returned on the 1st and subsequent days.

Let’s compare Retention and Rolling Retention of the 10th day:
Retention₁₀ — the amount of users, who returned on the 10th day / the amount of users, who installed the app 10 days ago * 100%.
Rolling Retention₁₀ — the amount of users, who returned on the 10th day or later / the amount of users, who installed the app 10 days ago * 100%.

Granularity (retention of time periods)

In some industries and respective tasks, it is useful to understand the Retention of a specific day (most often, in the mobile industry), in other cases it is useful to understand the retention of users on various time intervals: for example, weekly or monthly periods (oftentimes, it’s handy in e-commerce, retail).

An example of cohorts by months and monthly Retention respective thereto

How to build a Retention report on SQL language?

We have sorted out above how to calculate Retention in formulas. Now let’s apply it with SQL language.
Let’s assume, that we have two tables: user — storing data about users’ identifiers and meta-information, client_session — information on visits of the mobile app by users.
Only these two tables will be present in the query, so you can easily adapt the query to yourself.
note: within this code, I am using Impala as DBMS.

Collecting the size of cohorts

SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          ndv(user.id) AS users
   FROM USER
   WHERE from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1

Let’s sort out this pretty simple query: for every day we calculate the number of unique users for the period [60 days ago; 31 days ago].
In order not to mess with documentation: command ndv() in Impala is analogue of a command count(distinct).

Calculating the number of returned users on each cohort

SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) AS date_diff,
          ndv(user.id) AS ret_base
   FROM USER
   LEFT JOIN client_session cs ON cs.user_id=user.id
   WHERE 1=1
     AND datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) between 0 and 30
     AND from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1, 2

In this query, the key part is contained in the command datediff: now we are calculating for each cohort and for each datediff the number of unique users with the very same command ndv() (practically, the number of users, who returned within the days from 0 to 30).

Great, now we have the size of cohorts and the number of returned users.

Combining all together

SELECT reg.reg_date AS date_registration,
       reg.users AS cohort_size,
       cohort.date_diff AS day_difference,
       cohort.ret_base AS retention_base,
       cohort.ret_base/reg.users AS retention_rate
FROM
  (SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          ndv(user.id) AS users
   FROM USER
   WHERE from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1) reg
LEFT JOIN
  (SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) AS date_diff,
          ndv(user.id) AS ret_base
   FROM USER
   LEFT JOIN client_session cs ON cs.user_id=user.id
   WHERE 1=1
     AND datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) between 0 and 30
     AND from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1, 2) cohort ON reg.reg_date=cohort.reg_date
    ORDER BY 1,3

We have received the query, that calculates Retention for each cohort, and, eventually, the result can be displayed as follows:

Retention rate, calculated for each cohort of users

Construction of the sole Retention curve

Let’s modify our query a bit and obtain the data for construction of one Retention curve:

SELECT 
       cohort.date_diff AS day_difference,
       avg(reg.users) AS cohort_size,
       avg(cohort.ret_base) AS retention_base,
       avg(cohort.ret_base)/avg(reg.users)*100 AS retention_rate
FROM
  (SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          ndv(user.id) AS users
   FROM USER
   WHERE from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1) reg
LEFT JOIN
  (SELECT from_unixtime(user.installed_at, "yyyy-MM-dd") AS reg_date,
          datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) AS date_diff,
          ndv(user.id) AS ret_base
   FROM USER
   LEFT JOIN client_session cs ON cs.user_id=user.id
   WHERE 1=1
     AND datediff(cast(cs.created_at AS TIMESTAMP), cast(installed_at AS TIMESTAMP)) between 0 and 30
     AND from_unixtime(user.installed_at)>=date_add(now(), -60)
     AND from_unixtime(user.installed_at)<=date_add(now(), -31)
   GROUP BY 1,2) cohort ON reg.reg_date=cohort.reg_date
    GROUP BY 1        
    ORDER BY 1

Now, we have average by all the cohorts Retention rate, calculated for each day.

More on the subject

No comments 3662 2019 analysis BI-tools redash sql visualisation

Building a funnel-report in redash

⏱ Время чтения текста – 3 минуты

So, we’ve been planning to review Funnel-visualization of a report in Redash.
First and foremost, let’s build a request to the data source that we’ve created – Google Analytics.

The following text needs to be placed in the request console:

{
    "ids": "ga:128886640",
    "start_date": "30daysAgo",
    "end_date": "yesterday",
    "metrics": "ga:users,ga:goal1Completions,ga:goal2Completions,ga:goal3Completions"
}

In this request we are asking API Google Analytics to provide data for the last 30 days on the account GA: 128886640. We want to see the number of users and the number of completion of goals 1, 2 and 3.

As a result, our table will look as follows:

ga:users	ga:goal1Completions	ga:goal2Completions	ga:goal3Completions
3,926	105	41	32

Great, that’s right what we need in order to build a funnel.
Now I will tell you about one very useful Redash feature: query-results. In order to connect tables with results of queries’ execution, we go to Data Sources and search for query-results (beta). Connecting new data source.
Now we have an opportunity to refer to results of Redash queries. Thus, for instance, we can use the results of a requests to Google Analytics API.

How to do it?
We need to choose a data source query-results on the left:

Drop down menu with selection of data sources (in the console – on the left)

Now we’ll learn to make funnel-visualization. For this purpose, we write the following SQL-query:

select 'Add a good to the shopping cart' as step_name, ga_goal1Completions as goalCompletion from query_8
union all
select 'View the shopping cart' as step_name, ga_goal2Completions from query_8
union all
select 'Order processing' as step_name, ga_goal3Completions from query_8

In this case query_8 – is the very table with results of request to the data source Google Analytics.

Let’s set visualization:

Carefully selecting parameters, in order to achieve the desired result

As a result, we receive the funnel of conversions from one goal to another:

You can display this funnel in the dashboard and add filters / parameters thereto.

No comments 1557 2018 BI-tools google analytics redash visualisation

Earlier Ctrl + ↓

Your password

LEFT JOIN: blog on analytics, visualisation & data science