
Comparing Tableau and PowerBI training programs

Reading time – 7 minutes

This year I became a Tableau Desktop Certified Associate. While thinking about how to prepare for the exam, I came across Tableau’s e-learning courses, which turned out to be free for 90 days.

I decided not to waste such an opportunity and completed all three Fundamentals modules at a fast pace. Once certified, I wondered what training other BI vendors offer, so I started with the materials on PowerBI. In this short article I’d like to compare the Tableau and PowerBI training programs.

Disclaimer: in the end I came away strongly biased in favor of Tableau, so PowerBI supporters may not like this article and find it unfair (in all fairness, there are words of praise for PowerBI too).

Having studied both sets of training materials, I can finally state the reasons why I am definitely in favor of Tableau as a tool for data analysis and visualization.

First of all, there is a huge gap in how the materials are structured and how understanding is assessed. Although Tableau’s training materials are more technical and pay less attention to design, after studying their videos you can build excellent visualizations. After completing all three steps of the Tableau training, you come away with a strong desire to create stunning new reports using LOD Expressions and Filter Actions and to build convenient interfaces. After watching all the Power BI materials, the only question that remains is: why did I waste my time?

Emotions aside, several key points stood out after studying the materials.

This is a good dashboard according to Microsoft

The quality of content and training examples

Looking at how Tableau presents its training videos and at the quiz questions posed at the end of each covered topic, you start to understand the ideas behind the software. With Power BI, you will be totally disappointed. Take, for instance, the material on identifying outliers: Microsoft suggests building a scatter plot and spotting the outliers visually.
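
For contrast, here is a minimal sketch of how outliers are usually flagged programmatically rather than by eye, using the common IQR rule (the data and names are a toy example, not taken from the Microsoft course):

import pandas as pd

def iqr_outliers(values: pd.Series) -> pd.Series:
    # Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]

sales = pd.Series([120, 135, 128, 140, 131, 2500])
print(iqr_outliers(sales))  # flags the 2500 spike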

Design of reports and dashboards

There is some fair criticism of Tableau’s training materials on the topic of chart design and control elements, but they are still neatly and beautifully made. Now have a look at the dreadful thing Microsoft presents as the result of an analyst’s work. And this is a well-built dashboard according to Microsoft.

Assessment of the knowledge gained during the training

In Tableau’s training, immediately after a short lecture you apply what you have just studied in practice: you have to click specific buttons in the interface to solve a problem. Power BI offers “labs” that are supposed to launch on a remote machine. I didn’t manage to start a single one; I wrote to support three times, they couldn’t solve my problem, and so I never got to experiment with the Power BI tasks.

The results of the analyst’s work according to Microsoft.

Other points are mostly related to the software rather than the training program.

Cross-platform support

I have been working with Tableau for a long time, and 4 years ago I switched to a Mac. After the transition from Windows, my experience of using Tableau did not change: Tableau kept developing and I developed with it, but the team did not change the key elements of the interface. I have experimented with building reports in PowerBI, but I was put off by various Microsoft archaisms, like publishing through a share portal that requires an MS account and configuration through an administrator. All of this was a terrible headache.

What struck me most, however, is that PowerBI cannot be used on a Mac at all. There is simply no way, and this is a principled stance of Microsoft that is not expected to change. From my point of view, software in the B2B analytics segment that assumes connections to all kinds of DBMSs yet denies the existence of an alternative operating system cuts itself off from the many potential consultants who could use and promote PowerBI as an analytical tool.

Most certainly there are rational reasons why Microsoft software doesn’t run well on a Mac, but the simple truth is that the software remains inaccessible to me. Nevertheless, I wasn’t looking for an easy way out, so I installed PowerBI through Parallels in order to honestly re-evaluate the tool alongside its training materials.

Visualization options

Both Tableau and PowerBI offer stunning visualization options; in this regard, PowerBI’s training video is even a bit more informative than usual. On this front the tools are presented equally well.

Functionality

Here I want to give credit to PowerBI’s functionality. The variety of tools is extremely wide even without connecting third-party libraries: for example, automatic clustering, a decomposition tree, a data profiler, and setting filters directly on a chart.

Internal language syntax

To work with PowerBI you need to learn DAX. It is not a general-purpose programming language but a functional expression language: you can’t write arbitrary code, but you won’t need to – all the functions are already implemented, and you only have to learn how to use them. Microsoft covers DAX quite well in its documentation. The definition of a new measure in DAX looks like this:

Revenue YoY % =
DIVIDE(
    [Revenue]
        - CALCULATE(
            [Revenue],
            SAMEPERIODLASTYEAR('Date'[Date])
        ),
    CALCULATE(
        [Revenue],
        SAMEPERIODLASTYEAR('Date'[Date])
    )
)
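
A small note on this example: the measure uses DAX’s DIVIDE function rather than the / operator because DIVIDE handles division by zero gracefully, returning BLANK instead of an error – convenient when last year’s revenue is missing.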

Preparing data for the analysis

PowerBI has an Unpivot feature that turns columns holding time periods into a long format that is convenient for pivot tables.

However, Tableau Prep, Tableau’s ETL tool for data cleaning and wrangling, implements the same feature.
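
For readers who think in code, here is a minimal pandas sketch of what an unpivot does (the frame and column names are illustrative):

import pandas as pd

# Wide format: one column per time period
wide = pd.DataFrame({
    'product': ['A', 'B'],
    '2019': [100, 80],
    '2020': [120, 90],
})

# Unpivot: turn the period columns into (product, year, sales) rows
long = wide.melt(id_vars='product', var_name='year', value_name='sales')
print(long)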

Conclusions:

1) The training programs are built in completely different ways: Tableau’s methodology of immersion is more elaborate and effective, and you get hands-on practice solving problems with (albeit automated) feedback.
2) The report and dashboard designs in Microsoft’s training materials hardly look professional, while Tableau’s are much better.
3) Microsoft’s knowledge assessment is abysmal (perfunctory tests, like in a bad school), while Tableau’s is far better implemented: you dive into a problem, think about the answer, and solve it.
4) Cross-platform support is not PowerBI’s strongest point, whereas for Tableau it’s an excellent competitive advantage.
5) The functionality and capabilities of both tools are certainly top-tier, and on some points PowerBI wins.

Have a look at our dashboard reviews in Tableau and other BI tools.


Target audience parsing in VK

Reading time – 5 minutes

When posting ads, some platforms let you upload a list of the people who will see the ad in the audience settings. There are special tools for parsing ids from public pages, but it’s much more interesting (and cheaper) to do it yourself with Python and the VK API. Today we’ll show how we parsed the target audience for the LEFTJOIN promotional campaign and uploaded it to the advertising account.

Parsing of users

To send requests we will need a user token and the list of VK groups whose participants we want to get. We collected about 30 groups related to analytics, BI tools and Data Science.

import requests 
import time 

group_list = ['datacampus', '185023286', 'data_mining_in_action', '223456', '187222444', 'nta_ds_ai', 'business__intelligence', 'club1981711', 'datascience', 'ozonmasters', 'businessanalysts', 'datamining.team', 'club.shad', '174278716', 'sqlex', 'sql_helper', 'odssib', 'sapbi', 'sql_learn', 'hsespbcareer', 'smartdata', 'pomoshch_s_spss', 'dwhexpert', 'k0d_ds', 'sql_ex_ru', 'datascience_ai', 'data_club', 'mashinnoe_obuchenie_ai_big_data', 'womeninbigdata', 'introstats', 'smartdata', 'data_mining_in_action', 'dlschool_mipt'] 

token = 'your_token'

A groups.getMembers request returns at most 1000 members at a time, so to get the next 1000 we have to shift the offset parameter by another 1000. We also need to know when to stop, so let’s write a function that accepts a group id, fetches the total number of the group’s members, and returns the number of 1000-member pages we will need to request.

def get_offset(group_id):
    # Request the group once just to read the total member count
    count = requests.get('https://api.vk.com/method/groups.getMembers',
                         params={
                             'access_token': token,
                             'v': 5.103,
                             'group_id': group_id,
                             'sort': 'id_desc',
                             'offset': 0,
                             'fields': 'last_seen'
                         }).json()['response']['count']
    # Number of 1000-member pages to request (+1 covers the last partial page)
    return count // 1000 + 1

In the next step, we will write a function that accepts a group id, collects all of its subscribers into a list, and returns it. We request 1000 people at a time until we run out of pages, appending the ids to the list. While parsing each person we additionally check their last visit date: if they haven’t logged in since mid-November, we skip them. The date is given in unixtime format.

def get_users(group_id):
    good_id_list = []
    offset = 0
    max_offset = get_offset(group_id)
    while offset < max_offset:
        response = requests.get('https://api.vk.com/method/groups.getMembers',
                                params={
                                    'access_token': token,
                                    'v': 5.103,
                                    'group_id': group_id,
                                    'sort': 'id_desc',
                                    # the API counts offset in members, so shift by a full page
                                    'offset': offset * 1000,
                                    'fields': 'last_seen'
                                }).json()['response']
        offset += 1
        for item in response['items']:
            try:
                # 1605571200 is unixtime for November 17, 2020
                if item['last_seen']['time'] >= 1605571200:
                    good_id_list.append(item['id'])
            except KeyError:
                # deactivated users and users who hide last_seen lack this field
                continue
    return good_id_list

Now let’s parse all the groups from the list, collecting their members into the all_users list. At the end we convert the list to a set and back to a list to remove duplicates, since the same people may be members of several groups. After parsing each group we pause the program for a second to avoid hitting the request rate limit.

all_users = [] 

for group in group_list: 
    print(group) 
    try: 
        users = get_users(group) 
        all_users.extend(users) 
        time.sleep(1) 
    except KeyError as E: 
        print(group, E) 
        continue 

all_users = list(set(all_users))

The last step is writing each user id to a file, one per line.

with open('users.txt', 'w') as f: 
    for item in all_users: 
        f.write("%s\n" % item)

Loading the audience into the advertising account from a file

Let’s open our VK advertising account and go to the “Retargeting” tab. Here we will find the “Create audience” button:

After clicking it, a new window will pop up where we will be able to choose a file as a source and indicate the name of the audience.

The audience becomes available a few seconds after uploading. For the first 10 minutes the panel may claim the audience is too small; this is not true, and it will refresh shortly as long as your audience really contains more than 100 people.

Results

Let’s compare the average cost of attracting a group member using an ad with automatic audience targeting versus an ad with the audience we scraped. In the first case the average cost was 52.4 rubles, in the second 33.2 rubles, i.e. (52.4 − 33.2) / 52.4 ≈ 37%. Selecting a quality audience by parsing VK data helped us reduce the average cost by 37%.
We have prepared this post for our advertising campaign:
Hey! You’re seeing this ad because we parsed your id and set up file-based targeting in the VK advertising account. Want to know how to do it?
LEFTJOIN – a blog about analytics, visualization, Data Science, and BI. The blog features lots of material on various BI and SQL tools, data visualization and dashboards, working with different APIs (from Google Docs and social networks to beer lovers’ data), and interesting Python libraries.


dbt Coalesce conference: best talks to watch

Reading time – 3 minutes

The Coalesce 2020 conference, which I’ve mentioned before, took place from December 7 to 11, 2020. This year the organizers decided to stretch the conference over 5 days with a bunch of talks.

On the one hand, this is an advantage: with such an abundance of talks, you get to choose what’s most interesting to watch. On the other hand, that much information is tiring, since it’s often impossible to tell from a title alone whether a presentation will be interesting and useful. In my opinion, a conference running longer than 3 days is too much: the audience loses interest, and personal and professional obligations don’t disappear just because an event, even an online one, is taking up your time.

Still, I managed to watch most of the talks, sometimes skimming through. My overall impression: it is great to study presentations from conferences like Coalesce, as they mostly cover modern BI tools and cloud solutions. Almost every talk mentions Redshift / BigQuery / Snowflake or BI tools like Mode / Tableau / Looker / Metabase, and dbt is, of course, at the center of everything.

Here is my shortlist of talks worth studying:

  1. dbt 101 — an introductory talk on what dbt is and how to use it.
  2. Kimball in the context of the modern data warehouse: what’s worth keeping, and what’s not — an interesting but extremely controversial video that raised a lot of questions in the dbt community. In short, the author suggests using wide analytical tables and giving up normal forms everywhere.
  3. Building a robust data pipeline with dbt, Airflow, and Great Expectations — a talk about a rather interesting tool called Great Expectations, which is used for data validation.
  4. Orchestrating dbt with Dagster — the video seemed a bit boring to me, but if you want to learn about Dagster, you’ll like it.
  5. Supercharging your data team — the speakers built a wrapper for dbt called dbt executor 9000 and presented it.
  6. Presenting: SQLFluff — a video about a really cool tool called SQLFluff, a linter that automatically fixes SQL code according to style rules.
  7. Quickstart your analytics with Fivetran dbt packages — from this video you’ll learn about Fivetran and how to use it with dbt.
  8. Perfect complements: Using dbt with Looker for effective data governance — about the interaction of dbt with Looker, and the differences and similarities of the tools.

Tableau Dashboard Overview

Reading time – 7 minutes

In the previous article we stated the problem, designed a layout, and shared our goal of building a Tableau dashboard for the Superstore dataset. The dashboard should provide insights into the most profitable regions, products, and customer segments, and track key performance indicators (KPIs) over the reporting period.

The SuperStore Sales data reflects the sales and profit of a retail chain in Canada and includes information about customer orders, refunds, sales, and geodata. We’re mostly interested in the sales data, as our main goal is to create an executive dashboard for understanding the company’s operating margin and finding the most and least lucrative product categories and customer segments.

Here’s what the dashboard looks like:

All dashboard elements are placed in containers, so we can easily resize them or change their hierarchy; this lets us optimize the dashboard and make it more mobile- and tablet-friendly. We can also filter the data by time period, choosing a specific month and year in the top right corner, and all the charts will be redrawn automatically.

The next block shows key figures on company performance: profit, sales, order count, average discount, customer count, and sales per customer. Each indicator displays its YoY change, a measure of the company’s financial progress over time: positive change gets a green arrow, negative change a red one.

Below are two core charts, displaying regions (colored by profit) and profit dynamics. We can click a specific region to view its stats in depth.

The green dot on the right chart represents data for the selected month this year, while the blue dot shows the same month last year. Hovering over these points reveals a trend line, which makes it easier to assess how the company is doing today.

Let’s move to the second part, where the company’s products and customers are placed on 3 charts. The first one from the left is a bar-in-bar chart, where you can easily explore product efficiency. For instance, Tables is one of the most inefficient categories, with the Bretford CR4500 resulting in significant losses.

Bar-in-bar chart implementation

Next comes the chart with the company’s customers, sorted by default in descending order of profitability. The chart is linked with Top Performing Provinces, so if we want to find the best or worst customers for a selected province, the data is redrawn automatically.

Dashboard Evaluation

We evaluated this dashboard according to the criteria below. On a scale of 1 to 10, 10 being the highest, it gets the following scores from our team:

  1. Meets the tasks – 10.0

  2. Learning curve – 5.5

  3. Tool functionality – 9.0

  4. Ease of use – 8.5

  5. Compliance of the result – 10.0

  6. Visual evaluation – 9.7

This Tableau dashboard scored 8.8 out of 10 from the team! From our perspective, the dashboard fully meets the requirements and facilitates understanding of business performance over a reporting period. We can assess profit dynamics in general or for a selected region, and effectively leverage product and customer data to measure monetary results. The final version is available through this link.

Please let us know your thoughts in the comments below: how would you rate this dashboard?


How to build Animated Charts like Hans Rosling in Plotly

Reading time – 12 minutes

Hans Rosling’s work on the economic growth of world countries, presented at TED Talks in 2007, is one of the most iconic data visualizations ever created. Just check out this video in case you don’t know what we’re talking about:

Sometimes we want to compare standards of living across countries. One way to do this is the Big Mac index, which The Economist has tracked since 1986. The idea behind the index is to measure purchasing power parity (PPP) across countries based on the cost of local production. A standard burger requires the same ingredients everywhere: cheese, meat, bread, and vegetables. Since all of these can be produced locally, we can compare the cost of one Big Mac in different countries and thereby measure purchasing power. Plus, McDonald’s is the world’s most popular franchise network, with restaurants almost everywhere around the globe.

In today’s material, we will build a Motion Chart for the Big Mac index using Plotly. Following Hans Rosling’s idea, the chart will display country population along the X-axis and GDP per capita in US dollars along the Y-axis. The size of each dot will be proportional to the Big Mac index for that country, and the color will represent the continent where the country is located.

Preparing Data

Even though The Economist has been updating the index for over 30 years and sharing its observations publicly, the dataset contains many missing values. It also lacks continent names, but we can handle that by supplementing the data with a few more datasets that can be found in our repo.

Let’s start by importing the libraries:

import pandas as pd
from pandas.errors import ParserError
import plotly.graph_objects as go
import numpy as np
import requests
import io

We can access the dataset directly from GitHub, using the following function to send a GET request to a CSV file and create a Pandas DataFrame. In some cases this raises a ParserError because of caption rows above the table, so we add a try block:

def read_raw_file(link):
    raw_csv = requests.get(link).content
    try:
        df = pd.read_csv(io.StringIO(raw_csv.decode('utf-8')))
    except ParserError:
        df = pd.read_csv(io.StringIO(raw_csv.decode('utf-8')), skiprows=3)
    return df

bigmac_df = read_raw_file('https://github.com/valiotti/leftjoin/raw/master/motion-chart-big-mac/big-mac.csv')
population_df = read_raw_file('https://github.com/valiotti/leftjoin/raw/master/motion-chart-big-mac/population.csv')
gdp_df = read_raw_file('https://github.com/valiotti/leftjoin/raw/master/motion-chart-big-mac/gdp.csv')
continents_df = read_raw_file('https://github.com/valiotti/leftjoin/raw/master/motion-chart-big-mac/continents.csv')

From The Economist’s dataset we will need these columns: country name, local price, dollar exchange rate, country code (iso_a3), and record date. We take the period from 2005 through 2019, as the records are most complete for this span, and divide the local price by the exchange rate to calculate the price of one Big Mac in US dollars.

bigmac_df = bigmac_df[['name', 'local_price', 'dollar_ex', 'iso_a3', 'date']]
bigmac_df = bigmac_df[bigmac_df['date'] >= '2005-01-01']
bigmac_df = bigmac_df[bigmac_df['date'] < '2020-01-01']
bigmac_df['date'] = pd.DatetimeIndex(bigmac_df['date']).year
bigmac_df = bigmac_df.drop_duplicates(['date', 'name'])
bigmac_df = bigmac_df.reset_index(drop=True)
bigmac_df['dollar_price'] = bigmac_df['local_price'] / bigmac_df['dollar_ex']

Take a look at the result:

Next, let’s add a continent column. To simplify the task, we keep only two columns of the continents dataset: country code and continent name. Then we iterate over the bigmac_df['iso_a3'] column, looking up the continent name for each code. Some lookups raise an error because countries spanning two continents (for example, Europe and Asia) are listed twice; we treat such cases as Europe by default.

continents_df = continents_df[['Continent_Name', 'Three_Letter_Country_Code']]
continents_list = []
for country in bigmac_df['iso_a3']:
    try:
        continents_list.append(continents_df.loc[continents_df['Three_Letter_Country_Code'] == country]['Continent_Name'].item())
    except ValueError:
        continents_list.append('Europe')
bigmac_df['continent'] = continents_list
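
As a side note, the same lookup can be done without an explicit loop. Here is a vectorized pandas sketch; unlike the loop above, it keeps the first continent listed for countries that appear under two continents instead of defaulting to Europe:

# Keep one row per country code, then map codes to continents directly
code_to_continent = (continents_df
                     .drop_duplicates('Three_Letter_Country_Code')
                     .set_index('Three_Letter_Country_Code')['Continent_Name'])
bigmac_df['continent'] = bigmac_df['iso_a3'].map(code_to_continent)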

Now we can drop the unnecessary columns, sort by country name and date, convert the values in the date column to integers, and view the current result:

bigmac_df = bigmac_df.drop(['local_price', 'iso_a3', 'dollar_ex'], axis=1)
bigmac_df = bigmac_df.sort_values(by=['name', 'date'])
bigmac_df['date'] = bigmac_df['date'].astype(int)

Then we fill the missing Big Mac index values with zeros and remove Taiwan (the Republic of China), since this partially recognized state is not included in the World Bank datasets. The United Arab Emirates occurs several times, which can lead to issues, so we remove it as well.

countries_list = list(bigmac_df['name'].unique())
years_set = {i for i in range(2005, 2020)}
for country in countries_list:
    if len(bigmac_df[bigmac_df['name'] == country]) < 15:
        this_continent = bigmac_df[bigmac_df['name'] == country].continent.iloc[0]
        years_of_country = set(bigmac_df[bigmac_df['name'] == country]['date'])
        diff = years_set - years_of_country
        dict_to_df = pd.DataFrame({
                      'name':[country] * len(diff),
                      'date':list(diff),
                      'dollar_price':[0] * len(diff),
                      'continent': [this_continent] * len(diff)
                     })
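        # NB: DataFrame.append was removed in pandas 2.0; on newer versions,
        # use bigmac_df = pd.concat([bigmac_df, dict_to_df]) instead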
        bigmac_df = bigmac_df.append(dict_to_df)
bigmac_df = bigmac_df[bigmac_df['name'] != 'Taiwan']
bigmac_df = bigmac_df[bigmac_df['name'] != 'United Arab Emirates']

Next, let’s augment the data with GDP per capita and population from the other datasets. Both of them name some countries differently, so we need to specify such cases explicitly and replace the names.

years = [str(i) for i in range(2005, 2020)]

countries_replace_dict = {
    'Russian Federation': 'Russia',
    'Egypt, Arab Rep.': 'Egypt',
    'Hong Kong SAR, China': 'Hong Kong',
    'United Kingdom': 'Britain',
    'Korea, Rep.': 'South Korea',
    'United Arab Emirates': 'UAE',
    'Venezuela, RB': 'Venezuela'
}
for key, value in countries_replace_dict.items():
    population_df['Country Name'] = population_df['Country Name'].replace(key, value)
    gdp_df['Country Name'] = gdp_df['Country Name'].replace(key, value)

Finally, extract population data and GDP for the given years, adding the data to the bigmac_df DataFrame:

countries_list = list(bigmac_df['name'].unique())

population_list = []
gdp_list = []
for country in countries_list:
    population_for_country_df = population_df[population_df['Country Name'] == country][years]
    population_list.extend(list(population_for_country_df.values[0]))
    gdp_for_country_df = gdp_df[gdp_df['Country Name'] == country][years]
    gdp_list.extend(list(gdp_for_country_df.values[0]))
    
bigmac_df['population'] = population_list
bigmac_df['gdp'] = gdp_list
bigmac_df['gdp_per_capita'] = bigmac_df['gdp'] / bigmac_df['population']

And here is our final dataset:

Creating a chart in Plotly

The populations of China and India are roughly 10 times larger than those of most other countries, so we transform the X-axis to a log scale to make the chart easier to interpret; log transformation is a common way to address skewness in data.

fig_dict = {
    "data": [],
    "layout": {},
    "frames": []
}

fig_dict["layout"]["xaxis"] = {"title": "Population", "type": "log"}
fig_dict["layout"]["yaxis"] = {"title": "GDP per capita (in $)", "range":[-10000, 120000]}
fig_dict["layout"]["hovermode"] = "closest"
fig_dict["layout"]["updatemenus"] = [
    {
        "buttons": [
            {
                "args": [None, {"frame": {"duration": 500, "redraw": False},
                                "fromcurrent": True, "transition": {"duration": 300,
                                                                    "easing": "quadratic-in-out"}}],
                "label": "Play",
                "method": "animate"
            },
            {
                "args": [[None], {"frame": {"duration": 0, "redraw": False},
                                  "mode": "immediate",
                                  "transition": {"duration": 0}}],
                "label": "Pause",
                "method": "animate"
            }
        ],
        "direction": "left",
        "pad": {"r": 10, "t": 87},
        "showactive": False,
        "type": "buttons",
        "x": 0.1,
        "xanchor": "right",
        "y": 0,
        "yanchor": "top"
    }
]

We will also add a slider to filter data within a certain range:

sliders_dict = {
    "active": 0,
    "yanchor": "top",
    "xanchor": "left",
    "currentvalue": {
        "font": {"size": 20},
        "prefix": "Year: ",
        "visible": True,
        "xanchor": "right"
    },
    "transition": {"duration": 300, "easing": "cubic-in-out"},
    "pad": {"b": 10, "t": 50},
    "len": 0.9,
    "x": 0.1,
    "y": 0,
    "steps": []
}

By default, the chart will display the data for 2005 until we click the “Play” button.

continents_list_from_df = list(bigmac_df['continent'].unique())
year = 2005
for continent in continents_list_from_df:
    dataset_by_year = bigmac_df[bigmac_df["date"] == year]
    dataset_by_year_and_cont = dataset_by_year[dataset_by_year["continent"] == continent]
    
    data_dict = {
        "x": dataset_by_year_and_cont["population"],
        "y": dataset_by_year_and_cont["gdp_per_capita"],
        "mode": "markers",
        "text": dataset_by_year_and_cont["name"],
        "marker": {
            "sizemode": "area",
            "sizeref": 200000,
            "size":  np.array(dataset_by_year_and_cont["dollar_price"]) * 20000000
        },
        "name": continent,
        "customdata": np.array(dataset_by_year_and_cont["dollar_price"]).round(1),
        "hovertemplate": '<b>%{text}</b>' + '<br>' +
                         'GDP per capita: %{y}' + '<br>' +
                         'Population: %{x}' + '<br>' +
                         'Big Mac price: %{customdata}$' +
                         '<extra></extra>'
    }
    fig_dict["data"].append(data_dict)

Next, we need to fill the frames field, which is used to animate the data. Each frame holds the data for one year from 2005 to 2019.

for year in years:
    frame = {"data": [], "name": str(year)}
    for continent in continents_list_from_df:
        dataset_by_year = bigmac_df[bigmac_df["date"] == int(year)]
        dataset_by_year_and_cont = dataset_by_year[dataset_by_year["continent"] == continent]

        data_dict = {
            "x": list(dataset_by_year_and_cont["population"]),
            "y": list(dataset_by_year_and_cont["gdp_per_capita"]),
            "mode": "markers",
            "text": list(dataset_by_year_and_cont["name"]),
            "marker": {
                "sizemode": "area",
                "sizeref": 200000,
                "size": np.array(dataset_by_year_and_cont["dollar_price"]) * 20000000
            },
            "name": continent,
            "customdata": np.array(dataset_by_year_and_cont["dollar_price"]).round(1),
            "hovertemplate": '<b>%{text}</b>' + '<br>' +
                             'GDP per capita: %{y}' + '<br>' +
                             'Population: %{x}' + '<br>' +
                             'Big Mac price: %{customdata}$' +
                             '<extra></extra>'
        }
        frame["data"].append(data_dict)

    fig_dict["frames"].append(frame)
    slider_step = {"args": [
        [year],
        {"frame": {"duration": 300, "redraw": False},
         "mode": "immediate",
         "transition": {"duration": 300}}
    ],
        "label": year,
        "method": "animate"}
    sliders_dict["steps"].append(slider_step)

Just a few finishing touches left: instantiate the chart and set the colors, fonts, and title.

fig_dict["layout"]["sliders"] = [sliders_dict]

fig = go.Figure(fig_dict)

fig.update_layout(
    title = 
        {'text':'<b>Motion chart</b><br><span style="color:#666666">The Big Mac index from 2005 to 2019</span>'},
    font={
        'family':'Open Sans, light',
        'color':'black',
        'size':14
    },
    plot_bgcolor='rgba(0,0,0,0)'
)
fig.update_xaxes(tickfont=dict(family='Open Sans, light', color='black', size=12), nticks=4, gridcolor='lightgray', gridwidth=0.5)
fig.update_yaxes(tickfont=dict(family='Open Sans, light', color='black', size=12), nticks=4, gridcolor='lightgray', gridwidth=0.5)

fig.show()
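
If you want to share the chart outside a notebook, the figure can also be saved as a standalone HTML file (the file name here is just an example):

fig.write_html('big-mac-motion-chart.html')  # a self-contained page that opens in any browser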

Bingo! The Motion Chart is done:

View the code on GitHub
