{
    "version": "https:\/\/jsonfeed.org\/version\/1",
    "title": "LEFT JOIN: blog on analytics, visualisation & data science, posts tagged: plotly",
    "home_page_url": "https:\/\/en.leftjoin.ru\/tags\/plotly\/",
    "feed_url": "https:\/\/en.leftjoin.ru\/tags\/plotly\/json\/",
    "icon": "https:\/\/en.leftjoin.ru\/user\/userpic@2x.jpg",
    "author": {
        "name": "Nikolay Valiotti",
        "url": "https:\/\/en.leftjoin.ru\/",
        "avatar": "https:\/\/en.leftjoin.ru\/user\/userpic@2x.jpg"
    },
    "items": [
        {
            "id": "66",
            "url": "https:\/\/en.leftjoin.ru\/all\/python-and-lyrics-of-zemfiras-new-album-capturing-the-spirit-of\/",
            "title": "Python and lyrics of Zemfira’s new album: capturing the spirit of her songs",
            "content_html": "<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/header.jpg\" width=\"600\" height=\"600\" alt=\"\" \/>\n<\/div>\n<p>Zemfira’s latest studio album, Borderline, was released in February, 8 years after the previous one. For this album, various people cooperated with her, including her relatives – the riff for the song “Таблетки” was written by her nephew from London. The album turned out to be diverse: for instance, the song “Остин” is dedicated to the main character of the Homescapes game by the Russian studio Playrix (by the way, check out the latest <a href=\"https:\/\/youtu.be\/SOx8afEUTnE\">Business Secrets with the Bukhman brothers<\/a>, they also mention it there). Zemfira liked the game a lot, thus, she contacted Playrix to create this song. Also, the song “Крым” was written as a soundtrack to a new film by Zemfira’s colleague Renata Litvinova.<\/p>\n<p class=\"note\">Listen new album in <a href=\"https:\/\/music.apple.com\/ru\/album\/бордерлайн\/1554865105\">Apple Music<\/a> \/ <a href=\"https:\/\/music.yandex.ru\/album\/14052981\">Яндекс.Музыке<\/a> \/ <a href=\"https:\/\/open.spotify.com\/album\/6khBsXmKA1FKjYVCIBy9kt\">Spotify<\/a><\/p>\n<p>Nevertheless, the spirit of the whole album is rather gloomy – the songs often repeat the words ‘боль’, ‘ад’, ‘бесишь’ and other synonyms. We decided to conduct an exploratory analysis of her album, and then, using the Word2Vec model and a cosine measure, look at the semantic closeness of the songs and calculate the general mood of the album.<\/p>\n<p>For those who are bored with reading about data preparation and analysis steps, you can <a href=\"https:\/\/leftjoin.ru\/all\/borderline-text-analysis\/#result\">go directly to the results<\/a>.<\/p>\n<h2>Data preparation<\/h2>\n<p>For starters, we write a data processing script. The purpose of the script is to collect a united csv-table from a set of text files, each of which contains a song. At the same time, we have to get rid of all punctuation marks and unnecessary words as we need to focus only on significant content.<\/p>\n<pre class=\"e2-text-code\"><code>import pandas as pd\r\nimport re\r\nimport string\r\nimport pymorphy2\r\nfrom nltk.corpus import stopwords<\/code><\/pre><p>Then we create a morphological analyzer and expand the list of everything that needs to be discarded:<\/p>\n<pre class=\"e2-text-code\"><code>morph = pymorphy2.MorphAnalyzer()\r\nstopwords_list = stopwords.words('russian')\r\nstopwords_list.extend(['куплет', 'это', 'я', 'мы', 'ты', 'припев', 'аутро', 'предприпев', 'lyrics', '1', '2', '3', 'то'])\r\nstring.punctuation += '—'<\/code><\/pre><p>The names of the songs are given in English, so we have to create a dictionary for translation into Russian and a dictionary, from which we will later make a table:<\/p>\n<pre class=\"e2-text-code\"><code>result_dict = dict()\r\n\r\nsongs_dict = {\r\n    'snow':'снег идёт',\r\n    'crimea':'крым',\r\n    'mother':'мама',\r\n    'ostin':'остин',\r\n    'abuse':'абьюз',\r\n    'wait_for_me':'жди меня',\r\n    'tom':'том',\r\n    'come_on':'камон',\r\n    'coat':'пальто',\r\n    'this_summer':'этим летом',\r\n    'ok':'ок',\r\n    'pills':'таблетки'\r\n}<\/code><\/pre><p>Let’s define several necessary functions. The first one reads the entire song from the file and removes line breaks, the second clears the text from unnecessary characters and words, and the third one converts the words to normal form, using the pymorphy2 morphological analyzer. The pymorphy2 module does not always handle ambiguity well – additional processing is required for the words ‘ад’ and ‘рай’.<\/p>\n<pre class=\"e2-text-code\"><code>def read_song(filename):\r\n    f = open(f'{filename}.txt', 'r').read()\r\n    f = f.replace('\\n', ' ')\r\n    return f\r\n\r\ndef clean_string(text):\r\n    text = re.split(' |:|\\.|\\(|\\)|,|&quot;|;|\/|\\n|\\t|-|\\?|\\[|\\]|!', text)\r\n    text = ' '.join([word for word in text if word not in string.punctuation])\r\n    text = text.lower()\r\n    text = ' '.join([word for word in text.split() if word not in stopwords_list])\r\n    return text\r\n\r\ndef string_to_normal_form(string):\r\n    string_lst = string.split()\r\n    for i in range(len(string_lst)):\r\n        string_lst[i] = morph.parse(string_lst[i])[0].normal_form\r\n        if (string_lst[i] == 'аду'):\r\n            string_lst[i] = 'ад'\r\n        if (string_lst[i] == 'рая'):\r\n            string_lst[i] = 'рай'\r\n    string = ' '.join(string_lst)\r\n    return string<\/code><\/pre><p>After all this preparation, we can get back to the data and process each song and read the file with the corresponding name:<\/p>\n<pre class=\"e2-text-code\"><code>name_list = []\r\ntext_list = []\r\nfor song, name in songs_dict.items():\r\n    text = string_to_normal_form(clean_string(read_song(song)))\r\n    name_list.append(name)\r\n    text_list.append(text)<\/code><\/pre><p>Then we combine everything into a DataFrame and save it as a csv-file.<\/p>\n<pre class=\"e2-text-code\"><code>df = pd.DataFrame()\r\ndf['name'] = name_list\r\ndf['text'] = text_list\r\ndf['time'] = [290, 220, 187, 270, 330, 196, 207, 188, 269, 189, 245, 244]\r\ndf.to_csv('borderline.csv', index=False)<\/code><\/pre><p>Result:<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/2-table.png\" width=\"477\" height=\"365\" alt=\"\" \/>\n<\/div>\n<h2>Word cloud for the whole album<\/h2>\n<p>To begin with the analysis, we have to construct a word cloud, because it can display the most common words found in these songs. We import the required libraries, read the csv-file and set the configurations:<\/p>\n<pre class=\"e2-text-code\"><code>import nltk\r\nfrom wordcloud import WordCloud\r\nimport pandas as pd\r\nimport matplotlib.pyplot as plt\r\nfrom nltk import word_tokenize, ngrams\r\n\r\n%matplotlib inline\r\nnltk.download('punkt')\r\ndf = pd.read_csv('borderline.csv')<\/code><\/pre><p>Now we create a new figure, set the design parameters and, using the word cloud library, display words with the size directly proportional to the frequency of the word. We additionally indicate the name of the song above the corresponding graph.<\/p>\n<pre class=\"e2-text-code\"><code>fig = plt.figure()\r\nfig.patch.set_facecolor('white')\r\nplt.subplots_adjust(wspace=0.3, hspace=0.2)\r\ni = 1\r\nfor name, text in zip(df.name, df.text):\r\n    tokens = word_tokenize(text)\r\n    text_raw = &quot; &quot;.join(tokens)\r\n    wordcloud = WordCloud(colormap='PuBu', background_color='white', contour_width=10).generate(text_raw)\r\n    plt.subplot(4, 3, i, label=name,frame_on=True)\r\n    plt.tick_params(labelsize=10)\r\n    plt.imshow(wordcloud)\r\n    plt.axis(&quot;off&quot;)\r\n    plt.title(name,fontdict={'fontsize':7,'color':'grey'},y=0.93)\r\n    plt.tick_params(labelsize=10)\r\n    i += 1<\/code><\/pre><div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/3-wordcloud.jpg\" width=\"2560\" height=\"1707\" alt=\"\" \/>\n<\/div>\n<h2>EDA of the lyrics<\/h2>\n<p>Let us move to the next part and analyze the lyrics. To do this, we have to import special libraries to deal with data and visualization:<\/p>\n<pre class=\"e2-text-code\"><code>import plotly.graph_objects as go\r\nimport plotly.figure_factory as ff\r\nfrom scipy import spatial\r\nimport collections\r\nimport pymorphy2\r\nimport gensim\r\n\r\nmorph = pymorphy2.MorphAnalyzer()<\/code><\/pre><p>Firstly, we should count the overall number of words in each song, the number of unique words, and their percentage:<\/p>\n<pre class=\"e2-text-code\"><code>songs = []\r\ntotal = []\r\nuniq = []\r\npercent = []\r\n\r\nfor song, text in zip(df.name, df.text):\r\n    songs.append(song)\r\n    total.append(len(text.split()))\r\n    uniq.append(len(set(text.split())))\r\n    percent.append(round(len(set(text.split())) \/ len(text.split()), 2) * 100)<\/code><\/pre><p>All this information should be written in a DataFrame and additionally we want to count the number of words per minute for each song:<\/p>\n<pre class=\"e2-text-code\"><code>df_words = pd.DataFrame()\r\ndf_words['song'] = songs\r\ndf_words['total words'] = total\r\ndf_words['uniq words'] = uniq\r\ndf_words['percent'] = percent\r\ndf_words['time'] = df['time']\r\ndf_words['words per minute'] = round(total \/ (df['time'] \/\/ 60))\r\ndf_words = df_words[::-1]<\/code><\/pre><div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/4-table.png\" width=\"480\" height=\"369\" alt=\"\" \/>\n<\/div>\n<p>It would be great to visualize the data, so let us build two bar charts: one for the number of words in the song, and the other one for the number of words per minute.<\/p>\n<pre class=\"e2-text-code\"><code>colors_1 = ['rgba(101,181,205,255)'] * 12\r\ncolors_2 = ['rgba(62,142,231,255)'] * 12\r\n\r\nfig = go.Figure(data=[\r\n    go.Bar(name='📝 Total number of words,\r\n           text=df_words['total words'],\r\n           textposition='auto',\r\n           x=df_words.song,\r\n           y=df_words['total words'],\r\n           marker_color=colors_1,\r\n           marker=dict(line=dict(width=0)),),\r\n    go.Bar(name='🌀 Unique words',\r\n           text=df_words['uniq words'].astype(str) + '&lt;br&gt;'+ df_words.percent.astype(int).astype(str) + '%' ,\r\n           textposition='inside',\r\n           x=df_words.song,\r\n           y=df_words['uniq words'],\r\n           textfont_color='white',\r\n           marker_color=colors_2,\r\n           marker=dict(line=dict(width=0)),),\r\n])\r\n\r\nfig.update_layout(barmode='group')\r\n\r\nfig.update_layout(\r\n    title = \r\n        {'text':'&lt;b&gt;The ratio of the number of unique words to the total&lt;\/b&gt;&lt;br&gt;&lt;span style=&quot;color:#666666&quot;&gt;&lt;\/span&gt;'},\r\n    showlegend = True,\r\n    height=650,\r\n    font={\r\n        'family':'Open Sans, light',\r\n        'color':'black',\r\n        'size':14\r\n    },\r\n    plot_bgcolor='rgba(0,0,0,0)',\r\n)\r\nfig.update_layout(legend=dict(\r\n    yanchor=&quot;top&quot;,\r\n    xanchor=&quot;right&quot;,\r\n))\r\n\r\nfig.show()<\/code><\/pre><iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https:\/\/plotly.com\/~Elisejj\/96.embed?showlink=false\" height=\"650\" width=\"100%\"><\/iframe>\n<pre class=\"e2-text-code\"><code>colors_1 = ['rgba(101,181,205,255)'] * 12\r\ncolors_2 = ['rgba(238,85,59,255)'] * 12\r\n\r\nfig = go.Figure(data=[\r\n    go.Bar(name='⏱️ Track length, min.',\r\n           text=round(df_words['time'] \/ 60, 1),\r\n           textposition='auto',\r\n           x=df_words.song,\r\n           y=-df_words['time'] \/\/ 60,\r\n           marker_color=colors_1,\r\n           marker=dict(line=dict(width=0)),\r\n          ),\r\n    go.Bar(name='🔄 Words per minute',\r\n           text=df_words['words per minute'],\r\n           textposition='auto',\r\n           x=df_words.song,\r\n           y=df_words['words per minute'],\r\n           marker_color=colors_2,\r\n           textfont_color='white',\r\n           marker=dict(line=dict(width=0)),\r\n          ),\r\n])\r\n\r\nfig.update_layout(barmode='overlay')\r\n\r\nfig.update_layout(\r\n    title = \r\n        {'text':'&lt;b&gt;Track length and words per minute&lt;\/b&gt;&lt;br&gt;&lt;span style=&quot;color:#666666&quot;&gt;&lt;\/span&gt;'},\r\n    showlegend = True,\r\n    height=650,\r\n    font={\r\n        'family':'Open Sans, light',\r\n        'color':'black',\r\n        'size':14\r\n    },\r\n    plot_bgcolor='rgba(0,0,0,0)'\r\n)\r\n\r\n\r\nfig.show()<\/code><\/pre><iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https:\/\/plotly.com\/~Elisejj\/98.embed?showlink=false\" height=\"650\" width=\"100%\"><\/iframe>\n<h2>Working with Word2Vec model<\/h2>\n<p>Using the gensim module, load the model pointing to a binary file:<\/p>\n<pre class=\"e2-text-code\"><code>model = gensim.models.KeyedVectors.load_word2vec_format('model.bin', binary=True)<\/code><\/pre><p class=\"note\">Для материала мы использовали готовую обученную на Национальном Корпусе Русского Языка модель от сообщества <a href=\"https:\/\/rusvectores.org\/ru\/models\/\">RusVectōrēs<\/a><\/p>\n<p>The Word2Vec model is based on neural networks and allows you to represent words in the form of vectors, taking into account the semantic component. It means that if we take two words – for instance, “mom” and “dad”, then represent them as two vectors and calculate the cosine, the values ​​will be close to 1. Similarly, two words that have nothing in common in their meaning have a cosine measure close to 0.<\/p>\n<p>Now we will define the get_vector function: it will take a list of words, recognize a part of speech for each word, and then receive and summarize vectors, so that we can find vectors even for whole sentences and texts.<\/p>\n<pre class=\"e2-text-code\"><code>def get_vector(word_list):\r\n    vector = 0\r\n    for word in word_list:\r\n        pos = morph.parse(word)[0].tag.POS\r\n        if pos == 'INFN':\r\n            pos = 'VERB'\r\n        if pos in ['ADJF', 'PRCL', 'ADVB', 'NPRO']:\r\n            pos = 'NOUN'\r\n        if word and pos:\r\n            try:\r\n                word_pos = word + '_' + pos\r\n                this_vector = model.word_vec(word_pos)\r\n                vector += this_vector\r\n            except KeyError:\r\n                continue\r\n    return vector<\/code><\/pre><p>For each song, find a vector and select the corresponding column in the DataFrame:<\/p>\n<pre class=\"e2-text-code\"><code>vec_list = []\r\nfor word in df['text']:\r\n    vec_list.append(get_vector(word.split()))\r\ndf['vector'] = vec_list<\/code><\/pre><p>So, now we should compare these vectors with one another, calculating their cosine proximity. Those songs with a cosine metric higher than 0.5 will be saved separately – this way we will get the closest pairs of songs. We will write the information about the comparison of vectors into the two-dimensional list result.<\/p>\n<pre class=\"e2-text-code\"><code>similar = dict()\r\nresult = []\r\nfor song_1, vector_1 in zip(df.name, df.vector):\r\n    sub_list = []\r\n    for song_2, vector_2 in zip(df.name.iloc[::-1], df.vector.iloc[::-1]):\r\n        res = 1 - spatial.distance.cosine(vector_1, vector_2)\r\n        if res &gt; 0.5 and song_1 != song_2 and (song_1 + ' \/ ' + song_2 not in similar.keys() and song_2 + ' \/ ' + song_1 not in similar.keys()):\r\n            similar[song_1 + ' \/ ' + song_2] = round(res, 2)\r\n        sub_list.append(round(res, 2))\r\n    result.append(sub_list)<\/code><\/pre><p>Moreover, we can construct the same bar chart:<\/p>\n<pre class=\"e2-text-code\"><code>df_top_sim = pd.DataFrame()\r\ndf_top_sim['name'] = list(similar.keys())\r\ndf_top_sim['value'] = list(similar.values())\r\ndf_top_sim.sort_values(by='value', ascending=False)<\/code><\/pre><p>И построим такой же bar chart:<\/p>\n<pre class=\"e2-text-code\"><code>colors = ['rgba(101,181,205,255)'] * 5\r\n\r\nfig = go.Figure([go.Bar(x=df_top_sim['name'],\r\n                        y=df_top_sim['value'],\r\n                        marker_color=colors,\r\n                        width=[0.4,0.4,0.4,0.4,0.4],\r\n                        text=df_top_sim['value'],\r\n                        textfont_color='white',\r\n                        textposition='auto')])\r\n\r\nfig.update_layout(\r\n    title = \r\n        {'text':'&lt;b&gt;Топ-5 closest songs&lt;\/b&gt;&lt;br&gt;&lt;span style=&quot;color:#666666&quot;&gt;&lt;\/span&gt;'},\r\n    showlegend = False,\r\n    height=650,\r\n    font={\r\n        'family':'Open Sans, light',\r\n        'color':'black',\r\n        'size':14\r\n    },\r\n    plot_bgcolor='rgba(0,0,0,0)',\r\n    xaxis={'categoryorder':'total descending'}\r\n)\r\n\r\nfig.show()<\/code><\/pre><iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https:\/\/plotly.com\/~Elisejj\/100.embed?showlink=false\" height=\"650\" width=\"100%\"><\/iframe>\n<p>Given the vector of each song, let us calculate the vector of the entire album – add the vectors of the songs. Then, for such a vector, using the model, we get the words that are the closest in spirit and meaning.<\/p>\n<pre class=\"e2-text-code\"><code>def get_word_from_tlist(lst):\r\n    for word in lst:\r\n        word = word[0].split('_')[0]\r\n        print(word, end=' ')\r\n\r\nvec_sum = 0\r\nfor vec in df.vector:\r\n    vec_sum += vec\r\nsim_word = model.similar_by_vector(vec_sum)\r\nget_word_from_tlist(sim_word)<\/code><\/pre><p><span style=\"color: '#65b5cd'; font-size: 1.2em\"><b>небо тоска тьма пламень плакать горе печаль сердце солнце мрак<\/b><\/span><\/p>\n<p>This is probably the key result and the description of Zemfira’s album in just 10 words.<\/p>\n<p>Finally, we build a general heat map, each cell of which is the result of comparing the texts of two tracks with a cosine measure.<\/p>\n<pre class=\"e2-text-code\"><code>colorscale=[[0.0, &quot;rgba(255,255,255,255)&quot;],\r\n            [0.1, &quot;rgba(229,232,237,255)&quot;],\r\n            [0.2, &quot;rgba(216,222,232,255)&quot;],\r\n            [0.3, &quot;rgba(205,214,228,255)&quot;],\r\n            [0.4, &quot;rgba(182,195,218,255)&quot;],\r\n            [0.5, &quot;rgba(159,178,209,255)&quot;],\r\n            [0.6, &quot;rgba(137,161,200,255)&quot;],\r\n            [0.7, &quot;rgba(107,137,188,255)&quot;],\r\n            [0.8, &quot;rgba(96,129,184,255)&quot;],\r\n            [1.0, &quot;rgba(76,114,176,255)&quot;]]\r\n\r\nfont_colors = ['black']\r\nx = list(df.name.iloc[::-1])\r\ny = list(df.name)\r\nfig = ff.create_annotated_heatmap(result, x=x, y=y, colorscale=colorscale, font_colors=font_colors)\r\nfig.show()<\/code><\/pre><iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https:\/\/plotly.com\/~Elisejj\/82.embed?showlink=false\" height=\"650\" width=\"100%\"><\/iframe>\n<h2><a name=\"result\">Results and data interpretation<\/a><\/h2>\n<p>To give valuable conclusions, we would like to take another look at everything we got. First of all, let us focus on the word cloud. It is easy to see that the words ‘боль’, ‘невозможно’, ‘сорваться’, ‘растерзаны’, ‘сложно’, ‘терпеть’, ‘любить’ have a very decent size, because such words are often found throughout the entire lyrics:<\/p>\n<p>Давайте ещё раз посмотрим на всё, что у нас получилось — начнём с облака слов. Нетрудно заметить, что у слов «боль», «невозможно», «сорваться», «растерзаны», «сложно», «терпеть», «любить» размер весьма приличный — всё потому, что такие слова встречаются часто на протяжении всего текста песен:<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/9-wordcloud.jpg\" width=\"2560\" height=\"1707\" alt=\"\" \/>\n<\/div>\n<p>The song “Крым” turned out to be one of the most diverse songs – it contains 74% of unique words. Also, the song “Снег идет” contains very few words, so the majority, which is 82%, are unique. The largest song on the album in terms of amount of words is the track “Таблетки” – there are about 150 words in total.<\/p>\n<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https:\/\/plotly.com\/~Elisejj\/96.embed?showlink=false\" height=\"650\" width=\"100%\"><\/iframe>\n<p>As it was shown on the last chart, the most dynamic track is “Таблетки”, as much as 37 words per minute – nearly one word for every two seconds – and the longest track is “Абьюз”, and according to the previous chart, it also has the lowest percentage of unique words – 46%.<\/p>\n<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https:\/\/plotly.com\/~Elisejj\/98.embed?showlink=false\" height=\"650\" width=\"100%\"><\/iframe>\n<p>Top 5 most semantically similar text pairs:<\/p>\n<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https:\/\/plotly.com\/~Elisejj\/100.embed?showlink=false\" height=\"650\" width=\"100%\"><\/iframe>\n<p>We also got the vector of the entire album and found the closest words. Just take a look at them – ‘тьма’, ‘тоска’, ‘плакать’, ‘горе’, ‘печаль’, ‘сердце’ – this is the list of words that characterizes Zemfira’s lyrics!<\/p>\n<p><span style=\"color: '#65b5cd'; font-size: 1.2em\"><b>небо тоска тьма пламень плакать горе печаль сердце солнце мрак<\/b><\/span><\/p>\n<p>The final result is a heat map. From the visualization, it is noticeable that almost all songs are quite similar to each other – the cosine measure for many pairs exceeds the value of 0.4.<\/p>\n<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https:\/\/plotly.com\/~Elisejj\/82.embed?showlink=false\" height=\"650\" width=\"100%\"><\/iframe>\n<h2>Conclusions<\/h2>\n<p>In the material, we carried out an EDA of the entire text of the new album and, using the pre-trained Word2Vec model, we proved the hypothesis – most of the “Borderline” songs are permeated with rather dark lyrics. However, this is normal, because we love Zemfira precisely for her sincerity and straightforwardness.<\/p>\n",
            "date_published": "2021-09-07T13:46:21+03:00",
            "date_modified": "2021-09-07T13:46:08+03:00",
            "image": "https:\/\/en.leftjoin.ru\/pictures\/header.jpg",
            "_date_published_rfc2822": "Tue, 07 Sep 2021 13:46:21 +0300",
            "_rss_guid_is_permalink": "false",
            "_rss_guid": "66",
            "_e2_data": {
                "is_favourite": false,
                "links_required": [
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css"
                ],
                "og_images": [
                    "https:\/\/en.leftjoin.ru\/pictures\/header.jpg",
                    "https:\/\/en.leftjoin.ru\/pictures\/2-table.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/3-wordcloud.jpg",
                    "https:\/\/en.leftjoin.ru\/pictures\/4-table.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/9-wordcloud.jpg"
                ]
            }
        },
        {
            "id": "53",
            "url": "https:\/\/en.leftjoin.ru\/all\/how-to-build-animated-charts-like-hans-rosling-in-plotly\/",
            "title": "How to build Animated Charts like Hans Rosling in Plotly",
            "content_html": "<p>Hans Rosling’s work on world countries economic growth presented in 2007 at TEDTalks can be attributed to one of the most iconic data visualizations, ever created. Just check out this video, in case you don’t know what we’re talking about:<\/p>\n<div class=\"e2-text-video\">\n<iframe src=\"https:\/\/www.youtube.com\/embed\/hVimVzgtD6w\" frameborder=\"0\" allowfullscreen><\/iframe><\/div>\n<p>Sometimes we want to compare standards of living in other countries. One way to do this is to refer to the Big Mac index, which the Economist magazine has kept track of since 1986. The key idea this index represents is to measure purchasing power parity (PPP) in different countries, considering costs of domestic production. To make a standard burger, one would need the following ingredients: cheese, meat, bread and vegetables. Considering that all these ingredients can be produced locally, we can compare the production cost of one Big Mac in different countries, and measure purchasing power. Plus, McDonald’s is the world’s most popular franchise network,  its restaurants are almost everywhere around the globe.<\/p>\n<p>In today’s material, we will build a Motion Chart for the Big Mac index using Plotly. Following Hann Rosling’s idea, the chart will display country population along the X-axis and GDP per capita in US dollars along the Y. The size of the dots is going to be proportional to the Big Mac Index for a given country. And the color of the dots will represent the continent where the country is located.<\/p>\n<h2>Preparing Data<\/h2>\n<p>Even though The Economist has been updating it for over 30 years and sharing its observations publicly, the dataset contains many missing values. It also lacks continents names, but we can handle it by supplementing the data with some more datasets that can be found in our repo.<\/p>\n<p>Let’s start by importing the libraries:<\/p>\n<pre class=\"e2-text-code\"><code>import pandas as pd\r\nfrom pandas.errors import ParserError\r\nimport plotly.graph_objects as go\r\nimport numpy as np\r\nimport requests\r\nimport io<\/code><\/pre><p>We can access the dataset directly from GitHub. Just use the following function to send a GET request to a CSV file and create a Pandas DataFrame. However, in some cases, this may raise a  ParseError because of the caption title, so we will add a try block:<\/p>\n<pre class=\"e2-text-code\"><code>def read_raw_file(link):\r\n    raw_csv = requests.get(link).content\r\n    try:\r\n        df = pd.read_csv(io.StringIO(raw_csv.decode('utf-8')))\r\n    except ParserError:\r\n        df = pd.read_csv(io.StringIO(raw_csv.decode('utf-8')), skiprows=3)\r\n    return df\r\n\r\nbigmac_df = read_raw_file('https:\/\/github.com\/valiotti\/leftjoin\/raw\/master\/motion-chart-big-mac\/big-mac.csv')\r\npopulation_df = read_raw_file('https:\/\/github.com\/valiotti\/leftjoin\/raw\/master\/motion-chart-big-mac\/population.csv')\r\ndgp_df = read_raw_file('https:\/\/github.com\/valiotti\/leftjoin\/raw\/master\/motion-chart-big-mac\/gdp.csv')\r\ncontinents_df = read_raw_file('https:\/\/github.com\/valiotti\/leftjoin\/raw\/master\/motion-chart-big-mac\/continents.csv')<\/code><\/pre><p>From The Economist dataset we will need these columns: country name, local price, dollar exchange rate, country code (iso_a3) and record date. Take the timeline from 2005 to 2020, as the records are most complete for this span. And divide the local price by the exchange rate to calculate the price of one Big Mac in US dollars.<\/p>\n<pre class=\"e2-text-code\"><code>bigmac_df = bigmac_df[['name', 'local_price', 'dollar_ex', 'iso_a3', 'date']]\r\nbigmac_df = bigmac_df[bigmac_df['date'] &gt;= '2005-01-01']\r\nbigmac_df = bigmac_df[bigmac_df['date'] &lt; '2020-01-01']\r\nbigmac_df['date'] = pd.DatetimeIndex(bigmac_df['date']).year\r\nbigmac_df = bigmac_df.drop_duplicates(['date', 'name'])\r\nbigmac_df = bigmac_df.reset_index(drop=True)\r\nbigmac_df['dollar_price'] = bigmac_df['local_price'] \/ bigmac_df['dollar_ex']<\/code><\/pre><p>Take a look at the result:<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/1_1.png\" width=\"465\" height=\"182\" alt=\"\" \/>\n<\/div>\n<p>Next, let’s try adding a new column called continents. To ease the task, leave only two columns containing country code and continent name. Then we need to iterate through the bigmac_df[‘iso_a3’] column, adding a continent name for the corresponding values. However some cases may raise an error, because it’s not really clear, whether a country belongs to Europe or Asia, we will consider such cases as Europe by default.<\/p>\n<pre class=\"e2-text-code\"><code>continents_df = continents_df[['Continent_Name', 'Three_Letter_Country_Code']]\r\ncontinents_list = []\r\nfor country in bigmac_df['iso_a3']:\r\n    try:\r\n        continents_list.append(continents_df.loc[continents_df['Three_Letter_Country_Code'] == country]['Continent_Name'].item())\r\n    except ValueError:\r\n        continents_list.append('Europe')\r\nbigmac_df['continent'] = continents_list<\/code><\/pre><p>Now we can drop unnecessary columns,  apply sorting by country names and date, convert values in the date column into integers, and view the current result:<\/p>\n<pre class=\"e2-text-code\"><code>bigmac_df = bigmac_df.drop(['local_price', 'iso_a3', 'dollar_ex'], axis=1)\r\nbigmac_df = bigmac_df.sort_values(by=['name', 'date'])\r\nbigmac_df['date'] = bigmac_df['date'].astype(int)<\/code><\/pre><div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/2-20.png\" width=\"321\" height=\"176\" alt=\"\" \/>\n<\/div>\n<p>Then we need to fill up missing values for The Big Mac index with zeros and remove the Republic of China, since this partially recognized state is not included in the <a href=\"https:\/\/data.worldbank.org\">World Bank<\/a> datasets. The UAE occurs several times, this can lead to issues.<\/p>\n<pre class=\"e2-text-code\"><code>countries_list = list(bigmac_df['name'].unique())\r\nyears_set = {i for i in range(2005, 2020)}\r\nfor country in countries_list:\r\n    if len(bigmac_df[bigmac_df['name'] == country]) &lt; 15:\r\n        this_continent = bigmac_df[bigmac_df['name'] == country].continent.iloc[0]\r\n        years_of_country = set(bigmac_df[bigmac_df['name'] == country]['date'])\r\n        diff = years_set - years_of_country\r\n        dict_to_df = pd.DataFrame({\r\n                      'name':[country] * len(diff),\r\n                      'date':list(diff),\r\n                      'dollar_price':[0] * len(diff),\r\n                      'continent': [this_continent] * len(diff)\r\n                     })\r\n        bigmac_df = bigmac_df.append(dict_to_df)\r\nbigmac_df = bigmac_df[bigmac_df['name'] != 'Taiwan']\r\nbigmac_df = bigmac_df[bigmac_df['name'] != 'United Arab Emirates']<\/code><\/pre><p>Next,  let’s augment the data with GDP per capita and population from other datasets. Both datasets have differences in country names, so we need to specify such cases explicitly and replace them.<\/p>\n<pre class=\"e2-text-code\"><code>years = [str(i) for i in range(2005, 2020)]\r\n\r\ncountries_replace_dict = {\r\n    'Russian Federation': 'Russia',\r\n    'Egypt, Arab Rep.': 'Egypt',\r\n    'Hong Kong SAR, China': 'Hong Kong',\r\n    'United Kingdom': 'Britain',\r\n    'Korea, Rep.': 'South Korea',\r\n    'United Arab Emirates': 'UAE',\r\n    'Venezuela, RB': 'Venezuela'\r\n}\r\nfor key, value in countries_replace_dict.items():\r\n    population_df['Country Name'] = population_df['Country Name'].replace(key, value)\r\n    gdp_df['Country Name'] = gdp_df['Country Name'].replace(key, value)<\/code><\/pre><p>Finally, extract population data and GDP for the given years, adding the data to the bigmac_df DataFrame:<\/p>\n<pre class=\"e2-text-code\"><code>countries_list = list(bigmac_df['name'].unique())\r\n\r\npopulation_list = []\r\ngdp_list = []\r\nfor country in countries_list:\r\n    population_for_country_df = population_df[population_df['Country Name'] == country][years]\r\n    population_list.extend(list(population_for_country_df.values[0]))\r\n    gdp_for_country_df = gdp_df[gdp_df['Country Name'] == country][years]\r\n    gdp_list.extend(list(gdp_for_country_df.values[0]))\r\n    \r\nbigmac_df['population'] = population_list\r\nbigmac_df['gdp'] = gdp_list\r\nbigmac_df['gdp_per_capita'] = bigmac_df['gdp'] \/ bigmac_df['population']<\/code><\/pre><p>And here is our final dataset:<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/3-16.png\" width=\"590\" height=\"179\" alt=\"\" \/>\n<\/div>\n<h2>Creating a chart in Plotly<\/h2>\n<p>The population in China or India, on average, is 10 times more than in other countries. That’s why we need to transform X-axis to Log Scale, to make the chart easier for interpreting. The log-transformation is a common way to address skewness in data.<\/p>\n<pre class=\"e2-text-code\"><code>fig_dict = {\r\n    &quot;data&quot;: [],\r\n    &quot;layout&quot;: {},\r\n    &quot;frames&quot;: []\r\n}\r\n\r\nfig_dict[&quot;layout&quot;][&quot;xaxis&quot;] = {&quot;title&quot;: &quot;Population&quot;, &quot;type&quot;: &quot;log&quot;}\r\nfig_dict[&quot;layout&quot;][&quot;yaxis&quot;] = {&quot;title&quot;: &quot;GDP per capita (in $)&quot;, &quot;range&quot;:[-10000, 120000]}\r\nfig_dict[&quot;layout&quot;][&quot;hovermode&quot;] = &quot;closest&quot;\r\nfig_dict[&quot;layout&quot;][&quot;updatemenus&quot;] = [\r\n    {\r\n        &quot;buttons&quot;: [\r\n            {\r\n                &quot;args&quot;: [None, {&quot;frame&quot;: {&quot;duration&quot;: 500, &quot;redraw&quot;: False},\r\n                                &quot;fromcurrent&quot;: True, &quot;transition&quot;: {&quot;duration&quot;: 300,\r\n                                                                    &quot;easing&quot;: &quot;quadratic-in-out&quot;}}],\r\n                &quot;label&quot;: &quot;Play&quot;,\r\n                &quot;method&quot;: &quot;animate&quot;\r\n            },\r\n            {\r\n                &quot;args&quot;: [[None], {&quot;frame&quot;: {&quot;duration&quot;: 0, &quot;redraw&quot;: False},\r\n                                  &quot;mode&quot;: &quot;immediate&quot;,\r\n                                  &quot;transition&quot;: {&quot;duration&quot;: 0}}],\r\n                &quot;label&quot;: &quot;Pause&quot;,\r\n                &quot;method&quot;: &quot;animate&quot;\r\n            }\r\n        ],\r\n        &quot;direction&quot;: &quot;left&quot;,\r\n        &quot;pad&quot;: {&quot;r&quot;: 10, &quot;t&quot;: 87},\r\n        &quot;showactive&quot;: False,\r\n        &quot;type&quot;: &quot;buttons&quot;,\r\n        &quot;x&quot;: 0.1,\r\n        &quot;xanchor&quot;: &quot;right&quot;,\r\n        &quot;y&quot;: 0,\r\n        &quot;yanchor&quot;: &quot;top&quot;\r\n    }\r\n]<\/code><\/pre><p>We will also add a slider to filter data within a certain range:<\/p>\n<pre class=\"e2-text-code\"><code>sliders_dict = {\r\n    &quot;active&quot;: 0,\r\n    &quot;yanchor&quot;: &quot;top&quot;,\r\n    &quot;xanchor&quot;: &quot;left&quot;,\r\n    &quot;currentvalue&quot;: {\r\n        &quot;font&quot;: {&quot;size&quot;: 20},\r\n        &quot;prefix&quot;: &quot;Year: &quot;,\r\n        &quot;visible&quot;: True,\r\n        &quot;xanchor&quot;: &quot;right&quot;\r\n    },\r\n    &quot;transition&quot;: {&quot;duration&quot;: 300, &quot;easing&quot;: &quot;cubic-in-out&quot;},\r\n    &quot;pad&quot;: {&quot;b&quot;: 10, &quot;t&quot;: 50},\r\n    &quot;len&quot;: 0.9,\r\n    &quot;x&quot;: 0.1,\r\n    &quot;y&quot;: 0,\r\n    &quot;steps&quot;: []\r\n}<\/code><\/pre><p>By default, the chart will display data for 2005 before we click on the “Play” button.<\/p>\n<pre class=\"e2-text-code\"><code>continents_list_from_df = list(bigmac_df['continent'].unique())\r\nyear = 2005\r\nfor continent in continents_list_from_df:\r\n    dataset_by_year = bigmac_df[bigmac_df[&quot;date&quot;] == year]\r\n    dataset_by_year_and_cont = dataset_by_year[dataset_by_year[&quot;continent&quot;] == continent]\r\n    \r\n    data_dict = {\r\n        &quot;x&quot;: dataset_by_year_and_cont[&quot;population&quot;],\r\n        &quot;y&quot;: dataset_by_year_and_cont[&quot;gdp_per_capita&quot;],\r\n        &quot;mode&quot;: &quot;markers&quot;,\r\n        &quot;text&quot;: dataset_by_year_and_cont[&quot;name&quot;],\r\n        &quot;marker&quot;: {\r\n            &quot;sizemode&quot;: &quot;area&quot;,\r\n            &quot;sizeref&quot;: 200000,\r\n            &quot;size&quot;:  np.array(dataset_by_year_and_cont[&quot;dollar_price&quot;]) * 20000000\r\n        },\r\n        &quot;name&quot;: continent,\r\n        &quot;customdata&quot;: np.array(dataset_by_year_and_cont[&quot;dollar_price&quot;]).round(1),\r\n        &quot;hovertemplate&quot;: '&lt;b&gt;%{text}&lt;\/b&gt;' + '&lt;br&gt;' +\r\n                         'GDP per capita: %{y}' + '&lt;br&gt;' +\r\n                         'Population: %{x}' + '&lt;br&gt;' +\r\n                         'Big Mac price: %{customdata}$' +\r\n                         '&lt;extra&gt;&lt;\/extra&gt;'\r\n    }\r\n    fig_dict[&quot;data&quot;].append(data_dict)<\/code><\/pre><p>Next, we need to fill up the frames field, which will be used for animating the data. Each frame represents a certain data point from 2005 to 2019.<\/p>\n<pre class=\"e2-text-code\"><code>for year in years:\r\n    frame = {&quot;data&quot;: [], &quot;name&quot;: str(year)}\r\n    for continent in continents_list_from_df:\r\n        dataset_by_year = bigmac_df[bigmac_df[&quot;date&quot;] == int(year)]\r\n        dataset_by_year_and_cont = dataset_by_year[dataset_by_year[&quot;continent&quot;] == continent]\r\n\r\n        data_dict = {\r\n            &quot;x&quot;: list(dataset_by_year_and_cont[&quot;population&quot;]),\r\n            &quot;y&quot;: list(dataset_by_year_and_cont[&quot;gdp_per_capita&quot;]),\r\n            &quot;mode&quot;: &quot;markers&quot;,\r\n            &quot;text&quot;: list(dataset_by_year_and_cont[&quot;name&quot;]),\r\n            &quot;marker&quot;: {\r\n                &quot;sizemode&quot;: &quot;area&quot;,\r\n                &quot;sizeref&quot;: 200000,\r\n                &quot;size&quot;: np.array(dataset_by_year_and_cont[&quot;dollar_price&quot;]) * 20000000\r\n            },\r\n            &quot;name&quot;: continent,\r\n            &quot;customdata&quot;: np.array(dataset_by_year_and_cont[&quot;dollar_price&quot;]).round(1),\r\n            &quot;hovertemplate&quot;: '&lt;b&gt;%{text}&lt;\/b&gt;' + '&lt;br&gt;' +\r\n                             'GDP per capita: %{y}' + '&lt;br&gt;' +\r\n                             'Population: %{x}' + '&lt;br&gt;' +\r\n                             'Big Mac price: %{customdata}$' +\r\n                             '&lt;extra&gt;&lt;\/extra&gt;'\r\n        }\r\n        frame[&quot;data&quot;].append(data_dict)\r\n\r\n    fig_dict[&quot;frames&quot;].append(frame)\r\n    slider_step = {&quot;args&quot;: [\r\n        [year],\r\n        {&quot;frame&quot;: {&quot;duration&quot;: 300, &quot;redraw&quot;: False},\r\n         &quot;mode&quot;: &quot;immediate&quot;,\r\n         &quot;transition&quot;: {&quot;duration&quot;: 300}}\r\n    ],\r\n        &quot;label&quot;: year,\r\n        &quot;method&quot;: &quot;animate&quot;}\r\n    sliders_dict[&quot;steps&quot;].append(slider_step)<\/code><\/pre><p>Just a few finishing touches left, instantiate the chart, set colors, fonts and title.<\/p>\n<pre class=\"e2-text-code\"><code>fig_dict[&quot;layout&quot;][&quot;sliders&quot;] = [sliders_dict]\r\n\r\nfig = go.Figure(fig_dict)\r\n\r\nfig.update_layout(\r\n    title = \r\n        {'text':'&lt;b&gt;Motion chart&lt;\/b&gt;&lt;br&gt;&lt;span style=&quot;color:#666666&quot;&gt;The Big Mac index from 2005 to 2019&lt;\/span&gt;'},\r\n    font={\r\n        'family':'Open Sans, light',\r\n        'color':'black',\r\n        'size':14\r\n    },\r\n    plot_bgcolor='rgba(0,0,0,0)'\r\n)\r\nfig.update_yaxes(nticks=4)\r\nfig.update_xaxes(tickfont=dict(family='Open Sans, light', color='black', size=12), nticks=4, gridcolor='lightgray', gridwidth=0.5)\r\nfig.update_yaxes(tickfont=dict(family='Open Sans, light', color='black', size=12), nticks=4, gridcolor='lightgray', gridwidth=0.5)\r\n\r\nfig.show()<\/code><\/pre><p>Bingo! The Motion Chart is done:<\/p>\n<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https:\/\/plotly.com\/~i-bond\/32.embed?showlink=false\" height=\"650\" width=\"100%\"><\/iframe>\n<p><i>View the code on <a href=\"https:\/\/github.com\/valiotti\/leftjoin\/tree\/master\/motion-chart-big-mac\">GitHub<\/a><\/i><\/p>\n",
            "date_published": "2020-10-30T15:35:40+03:00",
            "date_modified": "2020-10-30T15:40:47+03:00",
            "image": "https:\/\/en.leftjoin.ru\/pictures\/remote\/youtube-hVimVzgtD6w-cover.jpg",
            "_date_published_rfc2822": "Fri, 30 Oct 2020 15:35:40 +0300",
            "_rss_guid_is_permalink": "false",
            "_rss_guid": "53",
            "_e2_data": {
                "is_favourite": false,
                "links_required": [
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css"
                ],
                "og_images": [
                    "https:\/\/en.leftjoin.ru\/pictures\/remote\/youtube-hVimVzgtD6w-cover.jpg",
                    "https:\/\/en.leftjoin.ru\/pictures\/1_1.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/2-20.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/3-16.png"
                ]
            }
        },
        {
            "id": "47",
            "url": "https:\/\/en.leftjoin.ru\/all\/how-to-build-a-dashboard-with-bootstrap-4-from-scratch-part-2\/",
            "title": "How to build a dashboard with Bootstrap 4 from scratch (Part 2)",
            "content_html": "<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/1-18.png\" width=\"2000\" height=\"1154\" alt=\"\" \/>\n<\/div>\n<p>Previously we shared <a href=\"https:\/\/en.leftjoin.ru\/all\/how-to-build-dashboard-with-bootstrap-4-from-scratch-part-1\/\">how to use Bootstrap components in building dashboard layout<\/a> and designed a simple yet flexible dashboard with a scatter plot and Russian map. In today’s material, we will continue adding more information, explore how to make Bootstrap tables responsive, and cover some complex callbacks for data acquisition.<\/p>\n<h2>Constructing Data Tables<\/h2>\n<p>All the code for populating our tables with data will be stored in <span class=\"inline-code\">get_tables.py<\/span> , while the layout components areoutlined in <span class=\"inline-code\"> application.py<\/span>.  This article will cover the process of creating the table with top Russian Breweries,  however, you can find the code for creating the other three on Github.<\/p>\n<p>Data in the Top Breweries table can be filtered by city name in the dropdown menu, but the data collected in Untappd is not equally structured. Some city names are written in Latin, others in Cyrillic. So the challenge is to make the names equal for SQL queries, and here is where Google Translate comes to the rescue. Though we sill have to manually create a dictionary of city names,  since for example “Москва” can be written as “Moskva”  and not “Moscow”.  This dictionary will be used later for mapping our DataFrame before transforming it into a Bootstrap table.<\/p>\n<pre class=\"e2-text-code\"><code>import pandas as pd\r\nimport dash_bootstrap_components as dbc\r\nfrom clickhouse_driver import Client\r\nimport numpy as np\r\nfrom googletrans import Translator\r\n\r\ntranslator = Translator()\r\n\r\nclient = Client(host='12.34.56.78', user='default', password='', port='9000', database='')\r\n\r\ncity_names = {\r\n   'Moskva': 'Москва',\r\n   'Moscow': 'Москва',\r\n   'СПБ': 'Санкт-Петербург',\r\n   'Saint Petersburg': 'Санкт-Петербург',\r\n   'St Petersburg': 'Санкт-Петербург',\r\n   'Nizhnij Novgorod': 'Нижний Новгород',\r\n   'Tula': 'Тула',\r\n   'Nizhniy Novgorod': 'Нижний Новгород',\r\n}<\/code><\/pre><h2>Top Breweries Table<\/h2>\n<p>This table displays top 10 Russian breweries and their position change according to the rating. Simply put, we need to compare data for two periods, that’s [30 days ago; today] and [60 days ago; 30 days ago]. With this in mind, we will need the following headers: ranking, brewery name, position change, and number of check-ins.<br \/>\nCreate the  <span class=\"inline-code\">get_top_russian_breweries<\/span> function that would make queries to the Clickhouse DB, sort the data and return a refined Pandas DataFrame. Let’s send the following queries to obtain data for the past 30 and 60 days, ordering the results by the number of check-ins.<\/p>\n<p><details><br \/>\n<summary><span style=\"color:#7ea9b8\">Querying data from the Database<\/span><\/summary><\/p>\n<pre class=\"e2-text-code\"><code>def get_top_russian_breweries(checkins_n=250):\r\n   top_n_brewery_today = client.execute(f'''\r\n      SELECT  rt.brewery_id,\r\n              rt.brewery_name,\r\n              beer_pure_average_mult_count\/count_for_that_brewery as avg_rating,\r\n              count_for_that_brewery as checkins FROM (\r\n      SELECT           \r\n              brewery_id,\r\n              dictGet('breweries', 'brewery_name', toUInt64(brewery_id)) as brewery_name,\r\n              sum(rating_score) AS beer_pure_average_mult_count,\r\n              count(rating_score) AS count_for_that_brewery\r\n          FROM beer_reviews t1\r\n          ANY LEFT JOIN venues AS t2 ON t1.venue_id = t2.venue_id\r\n          WHERE isNotNull(venue_id) AND (created_at &gt;= (today() - 30)) AND (venue_country = 'Россия') \r\n          GROUP BY           \r\n              brewery_id,\r\n              brewery_name) rt\r\n      WHERE (checkins&gt;={checkins_n})\r\n      ORDER BY avg_rating DESC\r\n      LIMIT 10\r\n      '''\r\n   )\r\n\r\ntop_n_brewery_n_days = client.execute(f'''\r\n  SELECT  rt.brewery_id,\r\n          rt.brewery_name,\r\n          beer_pure_average_mult_count\/count_for_that_brewery as avg_rating,\r\n          count_for_that_brewery as checkins FROM (\r\n  SELECT           \r\n          brewery_id,\r\n          dictGet('breweries', 'brewery_name', toUInt64(brewery_id)) as brewery_name,\r\n          sum(rating_score) AS beer_pure_average_mult_count,\r\n          count(rating_score) AS count_for_that_brewery\r\n      FROM beer_reviews t1\r\n      ANY LEFT JOIN venues AS t2 ON t1.venue_id = t2.venue_id\r\n      WHERE isNotNull(venue_id) AND (created_at &gt;= (today() - 60) AND created_at &lt;= (today() - 30)) AND (venue_country = 'Россия')\r\n      GROUP BY           \r\n          brewery_id,\r\n          brewery_name) rt\r\n  WHERE (checkins&gt;={checkins_n})\r\n  ORDER BY avg_rating DESC\r\n  LIMIT 10\r\n  '''\r\n)<\/code><\/pre><p><\/details><\/p>\n<p>Creating two DataFrames with the received data:<\/p>\n<pre class=\"e2-text-code\"><code>top_n = len(top_n_brewery_today)\r\ncolumn_names = ['brewery_id', 'brewery_name', 'avg_rating', 'checkins']\r\n\r\ntop_n_brewery_today_df = pd.DataFrame(top_n_brewery_today, columns=column_names).replace(np.nan, 0)\r\ntop_n_brewery_today_df['brewery_pure_average'] = round(top_n_brewery_today_df.avg_rating, 2)\r\ntop_n_brewery_today_df['brewery_rank'] = list(range(1, top_n + 1))\r\n\r\ntop_n_brewery_n_days = pd.DataFrame(top_n_brewery_n_days, columns=column_names).replace(np.nan, 0)\r\ntop_n_brewery_n_days['brewery_pure_average'] = round(top_n_brewery_n_days.avg_rating, 2)\r\ntop_n_brewery_n_days['brewery_rank'] = list(range(1, len(top_n_brewery_n_days) + 1))<\/code><\/pre><p>And then calculate the position change over the period of time for each brewery received. With the try-except block, we will handle exceptions, in case, if a brewery was not yet in our database 60 days ago.<\/p>\n<pre class=\"e2-text-code\"><code>rank_was_list = []\r\nfor brewery_id in top_n_brewery_today_df.brewery_id:\r\n   try:\r\n       rank_was_list.append(\r\n           top_n_brewery_n_days[top_n_brewery_n_days.brewery_id == brewery_id].brewery_rank.item())\r\n   except ValueError:\r\n       rank_was_list.append('–')\r\ntop_n_brewery_today_df['rank_was'] = rank_was_list<\/code><\/pre><p>Now we iterate over the columns with current and former positions. If there is no hyphen contained in, we will append an up or down arrow depending on the change.<\/p>\n<pre class=\"e2-text-code\"><code>diff_rank_list = []\r\nfor rank_was, rank_now in zip(top_n_brewery_today_df['rank_was'], top_n_brewery_today_df['brewery_rank']):\r\n   if rank_was != '–':\r\n       difference = rank_was - rank_now\r\n       if difference &gt; 0:\r\n           diff_rank_list.append(f'↑ +{difference}')\r\n       elif difference &lt; 0:\r\n           diff_rank_list.append(f'↓ {difference}')\r\n       else:\r\n           diff_rank_list.append('–')\r\n   else:\r\n       diff_rank_list.append(rank_was)<\/code><\/pre><p>Finally,  replace DataFrame headers, inserting the column with current ranking positions, where the top 3 will be displayed with the trophy emoji.<\/p>\n<pre class=\"e2-text-code\"><code>df = top_n_brewery_today_df[['brewery_name', 'avg_rating', 'checkins']].round(2)\r\ndf.insert(2, 'Position change', diff_rank_list)\r\ndf.columns = ['NAME', 'RATING', 'POSITION CHANGE', 'CHECK-INS']\r\ndf.insert(0, 'RANKING', list('🏆 ' + str(i) if i in [1, 2, 3] else str(i) for i in range(1, len(df) + 1)))\r\n\r\nreturn df<\/code><\/pre><h2>Filtering data by city name<\/h2>\n<p>One of the main tasks we set before creating this dashboard was to find out what are the most liked breweries in a certain city. The user chooses a city in the dropdown menu and gets the results. Sound pretty simple, but is it that easy?<br \/>\nOur next step is to write a script that would update data for each city and store it in separate CSV files. As we mentioned earlier, the city names are not equally structured, so we need to use Google Translator within the if-else block, and since it may not convert some names to Cyrillic we need to explicitly specify such cases:<\/p>\n<pre class=\"e2-text-code\"><code>en_city = venue_city\r\nif en_city == 'Nizhnij Novgorod':\r\n      ru_city = 'Нижний Новгород'\r\nelif en_city == 'Perm':\r\n      ru_city = 'Пермь'\r\nelif en_city == 'Sergiev Posad':\r\n      ru_city = 'Сергиев Посад'\r\nelif en_city == 'Vladimir':\r\n      ru_city = 'Владимир'\r\nelif en_city == 'Yaroslavl':\r\n      ru_city = 'Ярославль'\r\nelse:\r\n      ru_city = translator.translate(en_city, dest='ru').text<\/code><\/pre><p>Then we need to add both city names in English and Russian to the SQL query, to receive all check-ins sent from this city.<\/p>\n<pre class=\"e2-text-code\"><code>WHERE (rt.venue_city='{ru_city}' OR rt.venue_city='{en_city}')<\/code><\/pre><p>Finally, we export received data into a CSV file in the following directory –  <span class=\"inline-code\">data\/cities<\/span>.<\/p>\n<pre class=\"e2-text-code\"><code>df = top_n_brewery_today_df[['brewery_name', 'venue_city', 'avg_rating', 'checkins']].round(2)\r\ndf.insert(3, 'Position Change', diff_rank_list)\r\ndf.columns = ['NAME', 'CITY', 'RATING', 'POSITION CHANGE', 'CHECK-INS']\r\n# MAPPING\r\ndf['CITY'] = df['CITY'].map(lambda x: city_names[x] if (x in city_names) else x)\r\n# TRANSLATING\r\ndf['CITY'] = df['CITY'].map(lambda x: translator.translate(x, dest='en').text)\r\ndf.to_csv(f'data\/cities\/{en_city}.csv', index=False)\r\nprint(f'{en_city}.csv updated!')<\/code><\/pre><h2>Scheduling Updates<\/h2>\n<p>We will use the <span class=\"inline-code\">apscheduler<\/span>  library to automatically run the script and refresh data for each city in <span class=\"inline-code\">all_cities<\/span> every day at 10:30 am (UTC).<\/p>\n<pre class=\"e2-text-code\"><code>from apscheduler.schedulers.background import BackgroundScheduler\r\nfrom get_tables import update_best_breweries\r\n\r\nall_cities = sorted(['Vladimir', 'Voronezh', 'Ekaterinburg', 'Kazan', 'Red Pakhra', 'Krasnodar',\r\n             'Kursk', 'Moscow', 'Nizhnij Novgorod', 'Perm', 'Rostov-on-Don', 'Saint Petersburg',\r\n             'Sergiev Posad', 'Tula', 'Yaroslavl'])\r\n\r\nscheduler = BackgroundScheduler()\r\n@scheduler.scheduled_job('cron', hour=10, misfire_grace_time=30)\r\ndef update_data():\r\n   for city in all_cities:\r\n       update_best_breweries(city)\r\nscheduler.start()<\/code><\/pre><h2>Table from DataFrame<\/h2>\n<p><span class=\"inline-code\">get_top_russian_breweries_table(venue_city, checkins_n=250)<\/span>  will accept venue_city and checkins_n generating a Bootstrap Table with the top breweries. The second parameter value,  <span class=\"inline-code\">checkins_n<\/span>  can be changed with the slider. If the city name is not specified, the function will return top Russian breweries table.<\/p>\n<pre class=\"e2-text-code\"><code>if venue_city == None: \r\n      selected_df = get_top_russian_breweries(checkins_n)\r\nelse: \r\n      en_city = venue_city<\/code><\/pre><p>In other case the DataFrame will be constructed from a CSV file stored in <span class=\"inline-code\">data\/cities\/<\/span>. Since the city column still may contain different names we should apply mapping and use a lambda expression with the <span class=\"inline-code\">map()<\/span> method. The lambda function will compare values in the column against keys in <span class=\"inline-code\">city_names<\/span> and if there is a match, the column value will be overwritten.<br \/>\nFor instance,  if <span class=\"inline-code\">df[‘CITY’]<\/span> contains  “СПБ”, a frequent acronym for Saint Petersburg, the value will be replaced, while for “Воронеж” it will remain unchanged.<br \/>\nAnd last but not least, we need to remove all duplicate rows from the table, add a column with a ranking position and return the first 10 rows. These would be the most liked breweries in a selected city.<\/p>\n<pre class=\"e2-text-code\"><code>df = pd.read_csv(f'data\/cities\/{en_city}.csv')     \r\ndf = df.loc[df['CHECK-INS'] &gt;= checkins_n]\r\ndf.drop_duplicates(subset=['NAME', 'CITY'], keep='first', inplace=True)  \r\ndf.insert(0, 'RANKING', list('🏆 ' + str(i) if i in [1, 2, 3] else str(i) for i in range(1, len(df) + 1)))\r\nselected_df = df.head(10)<\/code><\/pre><p>After all DataFrame manipulations, the function returns a simply styled Bootstrap table of top breweries.<\/p>\n<p><details><br \/>\n<summary><span style=\"color:#7ea9b8\">Bootstrap table layout in DBC<\/span><\/summary><\/p>\n<pre class=\"e2-text-code\"><code>table = dbc.Table.from_dataframe(selected_df, striped=False,\r\n                                bordered=False, hover=True,\r\n                                size='sm',\r\n                                style={'background-color': '#ffffff',\r\n                                       'font-family': 'Proxima Nova Regular',\r\n                                       'text-align':'center',\r\n                                       'fontSize': '12px'},\r\n                                className='table borderless'\r\n                                )\r\n\r\nreturn table<\/code><\/pre><h2>Layout structure<\/h2>\n<p>Add a Slider and a Dropdown menu with city names in <span class=\"inline-code\">application.py<\/span><\/p>\n<p class=\"note\">To learn more about the Dashboard layout structure, please refer to <a href=\"https:\/\/en.leftjoin.ru\/all\/how-to-build-dashboard-with-bootstrap-4-from-scratch-part-1\/\">our previous guide<\/a><\/p>\n<pre class=\"e2-text-code\"><code>checkins_slider_tab_1 = dbc.CardBody(\r\n                           dbc.FormGroup(\r\n                               [\r\n                                   html.H6('Number of check-ins', style={'text-align': 'center'})),\r\n                                   dcc.Slider(\r\n                                       id='checkin_n_tab_1',\r\n                                       min=0,\r\n                                       max=250,\r\n                                       step=25,\r\n                                       value=250,  \r\n                                       loading_state={'is_loading': True},\r\n                                       marks={i: i for i in list(range(0, 251, 25))}\r\n                                   ),\r\n                               ],\r\n                           ),\r\n                           style={'max-height': '80px', \r\n                                  'padding-top': '25px'\r\n                                  }\r\n                       )\r\n\r\ntop_breweries = dbc.Card(\r\n       [\r\n           dbc.CardBody(\r\n               [\r\n                   dbc.FormGroup(\r\n                       [\r\n                           html.H6('Filter by city', style={'text-align': 'center'}),\r\n                           dcc.Dropdown(\r\n                               id='city_menu',\r\n                               options=[{'label': i, 'value': i} for i in all_cities],\r\n                               multi=False,\r\n                               placeholder='Select city',\r\n                               style={'font-family': 'Proxima Nova Regular'}\r\n                           ),\r\n                       ],\r\n                   ),\r\n                   html.P(id=&quot;tab-1-content&quot;, className=&quot;card-text&quot;),\r\n               ],\r\n           ),\r\n   ],\r\n)<\/code><\/pre><p><\/details><\/p>\n<p>We’ll also need to add a callback function to update the table by dropdown menu and slider values:<\/p>\n<pre class=\"e2-text-code\"><code>@app.callback(\r\n   Output(&quot;tab-1-content&quot;, &quot;children&quot;), [Input(&quot;city_menu&quot;, &quot;value&quot;),\r\n                                         Input(&quot;checkin_n_tab_1&quot;, &quot;value&quot;)]\r\n)\r\ndef table_content(city, checkin_n):\r\n   return get_top_russian_breweries_table(city, checkin_n)<\/code><\/pre><p>Tada, the main table is ready! The dashboard can be used to receive up-to-date info about best Russian breweries, beers, and its rating across different regions, and help to make a better choice for an enjoyable tasting experience.<\/p>\n<div class=\"e2-text-picture\">\n<a href=\"http:\/\/dashboard-final-en.us-east-2.elasticbeanstalk.com\/\" class=\"e2-text-picture-link\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/2-17.png\" width=\"1234\" height=\"630\" alt=\"\" \/>\n<\/a><\/div>\n<p><i>View the code on <a href=\"https:\/\/github.com\/valiotti\/leftjoin\/tree\/master\/rutappd\">GitHub<\/a><\/i><\/p>\n",
            "date_published": "2020-10-07T16:35:17+03:00",
            "date_modified": "2020-10-29T09:46:53+03:00",
            "image": "https:\/\/en.leftjoin.ru\/pictures\/1-18.png",
            "_date_published_rfc2822": "Wed, 07 Oct 2020 16:35:17 +0300",
            "_rss_guid_is_permalink": "false",
            "_rss_guid": "47",
            "_e2_data": {
                "is_favourite": false,
                "links_required": [
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css"
                ],
                "og_images": [
                    "https:\/\/en.leftjoin.ru\/pictures\/1-18.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/2-17.png"
                ]
            }
        },
        {
            "id": "41",
            "url": "https:\/\/en.leftjoin.ru\/all\/visualizing-covid-19-in-russia-with-plotly\/",
            "title": "VIsualizing COVID-19 in Russia with Plotly",
            "content_html": "<p>Maps are widely used in data visualization, it’s a great tool to display statistics for certain areas, regions, and cities. Before displaying the map we need to encode each region or any other administrative unit. Choropleth map gets divided into polygons and multipolygons with latitude and longitude coordinates. Plotly has a built-in solution for plotting choropleth map for America and Europe regions, however, Russia is not included yet. So we decided to use an existing GeoJSON file to map administrative regions of Russia and display the latest COVID-19 stats with Plotly.<\/p>\n<pre class=\"e2-text-code\"><code>from urllib.request import urlopen\r\nimport json\r\nimport requests\r\nimport pandas as pd\r\nfrom selenium import webdriver\r\nfrom bs4 import BeautifulSoup as bs\r\nimport plotly.graph_objects as go<\/code><\/pre><h2>Modifying GeoJSON<\/h2>\n<p>First, we need to download a public GeoJSON file with the boundaries for the Federal subjects of Russia. The file already contains some information, such as region names, but it’s still doesn’t fit the required format and missing region identifiers.<\/p>\n<pre class=\"e2-text-code\"><code>with urlopen('https:\/\/raw.githubusercontent.com\/codeforamerica\/click_that_hood\/master\/public\/data\/russia.geojson') as response:\r\n    counties = json.load(response)<\/code><\/pre><p>Besides that, there are slight differences in the namings. For example,  Bashkortostan on стопкоронавирус.рф, the site we are going to scrape data from, it’s listed as “The Republic of Bashkortostan”, while in our GeoJSON file it’s simply named “Bashkortostan”. These differences should be eliminated to avoid possible confusion. Also, the names should start with a capital.<\/p>\n<pre class=\"e2-text-code\"><code>regions_republic_1 = ['Бурятия', 'Тыва', 'Адыгея', 'Татарстан', 'Марий Эл',\r\n                      'Чувашия', 'Северная Осетия – Алания', 'Алтай',\r\n                      'Дагестан', 'Ингушетия', 'Башкортостан']\r\nregions_republic_2 = ['Удмуртская республика', 'Кабардино-Балкарская республика',\r\n                      'Карачаево-Черкесская республика', 'Чеченская республика']\r\nfor k in range(len(counties['features'])):\r\n    counties['features'][k]['id'] = k\r\n    if counties['features'][k]['properties']['name'] in regions_republic_1:\r\n        counties['features'][k]['properties']['name'] = 'Республика ' + counties['features'][k]['properties']['name']\r\n    elif counties['features'][k]['properties']['name'] == 'Ханты-Мансийский автономный округ - Югра':\r\n        counties['features'][k]['properties']['name'] = 'Ханты-Мансийский АО'\r\n    elif counties['features'][k]['properties']['name'] in regions_republic_2:\r\n        counties['features'][k]['properties']['name'] = counties['features'][k]['properties']['name'].title()<\/code><\/pre><p>It’s time to create a DataFrame from the resulting GeoJSON file with the regions of Russia, we’ll take the identifiers and names.<\/p>\n<pre class=\"e2-text-code\"><code>region_id_list = []\r\nregions_list = []\r\nfor k in range(len(counties['features'])):\r\n    region_id_list.append(counties['features'][k]['id'])\r\n    regions_list.append(counties['features'][k]['properties']['name'])\r\ndf_regions = pd.DataFrame()\r\ndf_regions['region_id'] = region_id_list\r\ndf_regions['region_name'] = regions_list<\/code><\/pre><p>As a result, our DataFrame looks like the following:<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/1-13.png\" width=\"313\" height=\"179\" alt=\"\" \/>\n<\/div>\n<h2>Data Scraping<\/h2>\n<p>We need to scrape the data stored in this table:<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/2-13.png\" width=\"1160\" height=\"280\" alt=\"\" \/>\n<\/div>\n<p>Let’s use the Selenium library for this task. We need to navigate to the webpage and convert it into a BeautifulSoup object<\/p>\n<pre class=\"e2-text-code\"><code>driver = webdriver.Chrome()\r\ndriver.get('https:\/\/стопкоронавирус.рф\/information\/')\r\nsource_data = driver.page_source\r\nsoup = bs(source_data, 'lxml')<\/code><\/pre><p>The region names are wrapped with &lt;th&gt; tags, while the latest data is stored in table cells, each one is defined with a &lt;td&gt; tag.<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/3-12.png\" width=\"458\" height=\"120\" alt=\"\" \/>\n<\/div>\n<pre class=\"e2-text-code\"><code>divs_data = soup.find_all('td')<\/code><\/pre><p>The divs_data  list should return something like this:<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/4-10.png\" width=\"816\" height=\"312\" alt=\"\" \/>\n<\/div>\n<p>The data is grouped in one line, this includes both new cases and active ones. It is noticeable that each region corresponds to five values, for Moscow these are the first five, for Moscow Region the next five and so on. We can use this pattern to create five lists and populate with values according to the index. The first value will be appended to the list with active cases, the second value to the list of new ones, etc. After every five values, the index will be reset to zero.<\/p>\n<pre class=\"e2-text-code\"><code>count = 1\r\nfor td in divs_data:\r\n    if count == 1:\r\n        sick_list.append(int(td.text))\r\n    elif count == 2:\r\n        new_list.append(int(td.text))\r\n    elif count == 3:\r\n        cases_list.append(int(td.text))\r\n    elif count == 4:\r\n        healed_list.append(int(td.text))\r\n    elif count == 5:\r\n        died_list.append(int(td.text))\r\n        count = 0\r\n    count += 1<\/code><\/pre><p>The next step is to extract the region names from the table, they are stored within the col-region class. We also need to clean up the data by eliminating extra white spaces and line breaks.<\/p>\n<pre class=\"e2-text-code\"><code>divs_region_names = soup.find_all('th', {'class':'col-region'})\r\nregion_names_list = []\r\nfor i in range(1, len(divs_region_names)):\r\n    region_name = divs_region_names[i].text\r\n    region_name = region_name.replace('\\n', '').replace('  ', '')\r\n    region_names_list.append(region_name)<\/code><\/pre><p>Create a DataFrame:<\/p>\n<pre class=\"e2-text-code\"><code>df = pd.DataFrame()\r\ndf['region_name'] = region_names_list\r\ndf['sick'] = sick_list\r\ndf['new'] = new_list\r\ndf['cases'] = cases_list\r\ndf['healed'] = healed_list\r\ndf['died'] = died_list<\/code><\/pre><p>After reviewing our data once again we detected white space under the index 10. This should be fixed immediately, otherwise, we may run into problems.<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/5-8.png\" width=\"310\" height=\"128\" alt=\"\" \/>\n<\/div>\n<pre class=\"e2-text-code\"><code>df.loc[10, 'region_name'] = df[df.region_name == 'Челябинская область '].region_name.item().strip(' ')<\/code><\/pre><p>Finally,  we can merge our DataFrame on the region_name column, so that the resulted table will include a column with region id, which is required to make a choropleth map.<\/p>\n<pre class=\"e2-text-code\"><code>df = df.merge(df_regions, on='region_name')<\/code><\/pre><h2>Creating a choropleth map with Plotly<\/h2>\n<p>Let’s create a new figure and pass a choroplethmapbox object to it. The geojson parameter will accept the counties variable with the GeoJSON file, assign the region_id to locations. The z parameter represents the data to be color-coded, in this example we’re passing the number of new cases for each region. Assign the region names to text. The colorscale parameter accepts lists with values ranging from 0 to 1 and RGB color codes. Here, the palette changes from green to yellow and then red, depending on the number of active cases. By passing the values stored in customdata we can change our hovertemplate.<\/p>\n<pre class=\"e2-text-code\"><code>fig = go.Figure(go.Choroplethmapbox(geojson=counties,\r\n                           locations=df['region_id'],\r\n                           z=df['new'],\r\n                           text=df['region_name'],\r\n                           colorscale=[[0, 'rgb(34, 150, 79)'],\r\n                                       [0.2, 'rgb(249, 247, 174)'],\r\n                                       [0.8, 'rgb(253, 172, 99)'],\r\n                                       [1, 'rgb(212, 50, 44)']],\r\n                           colorbar_thickness=20,\r\n                           customdata=np.stack([df['cases'], df['died'], df['sick'], df['healed']], axis=-1),\r\n                           hovertemplate='&lt;b&gt;%{text}&lt;\/b&gt;'+ '&lt;br&gt;' +\r\n                                         'New cases: %{z}' + '&lt;br&gt;' +\r\n                                         'Active cases: %{customdata[0]}' + '&lt;br&gt;' +\r\n                                         'Deaths: %{customdata[1]}' + '&lt;br&gt;' +\r\n                                         'Total cases: %{customdata[2]}' + '&lt;br&gt;' +\r\n                                         'Recovered: %{customdata[3]}' +\r\n                                         '&lt;extra&gt;&lt;\/extra&gt;',\r\n                           hoverinfo='text, z'))<\/code><\/pre><p>Let’s customize the map, we will use a ready-to-go neutral template, called carto-positron. Set the parameters and display the map:<br \/>\nmapbox_zoom: responsible for zooming;<br \/>\nmapbox_center: centers the map;<br \/>\nmarker_line_width: border width (we removed the borders by setting this parameter to 0);<br \/>\nmargin: usually accepts 0 values to make the map wider.<\/p>\n<pre class=\"e2-text-code\"><code>fig.update_layout(mapbox_style=&quot;carto-positron&quot;,\r\n                  mapbox_zoom=1, mapbox_center = {&quot;lat&quot;: 66, &quot;lon&quot;: 94})\r\nfig.update_traces(marker_line_width=0)\r\nfig.update_layout(margin={&quot;r&quot;:0,&quot;t&quot;:0,&quot;l&quot;:0,&quot;b&quot;:0})\r\nfig.show()<\/code><\/pre><p>And here is our map. According to the plot, we can say that the highest number of cases per day is happening in Moscow – 608 new cases. It’s really high compared to the other regions, and especially to Nenets Autonomous Okrug, where this number is surprisingly low.<\/p>\n<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"http:\/\/map-env.eba-qra4m2aq.us-east-2.elasticbeanstalk.com\/\"  height=\"500\" width=\"800\"><\/iframe>\n<p><a href=\"https:\/\/github.com\/valiotti\/leftjoin\/tree\/master\/plotly_russian_map\">View the code on GitHub<\/a><\/p>\n",
            "date_published": "2020-08-13T09:30:00+03:00",
            "date_modified": "2020-08-13T09:36:55+03:00",
            "image": "https:\/\/en.leftjoin.ru\/pictures\/1-13.png",
            "_date_published_rfc2822": "Thu, 13 Aug 2020 09:30:00 +0300",
            "_rss_guid_is_permalink": "false",
            "_rss_guid": "41",
            "_e2_data": {
                "is_favourite": false,
                "links_required": [
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css"
                ],
                "og_images": [
                    "https:\/\/en.leftjoin.ru\/pictures\/1-13.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/2-13.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/3-12.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/4-10.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/5-8.png"
                ]
            }
        },
        {
            "id": "38",
            "url": "https:\/\/en.leftjoin.ru\/all\/building-a-plotly-dashboard-with-dynamic-sliders-in-python\/",
            "title": "Building a Plotly Dashboard with dynamic sliders in Python",
            "content_html": "<p>Recently we discussed how to use Plotly and built a scatter plot to display the ratio between the number of reviews and the average rating for Russian Breweries registered on Untappd. Each marker on the plot has two properties, the registration period and the beer range. And today we are going to introduce you to Dash, a Python framework for building analytical web applications. First, create a new file name app.py with a get_scatter_plot(n_days, top_n) function from the previous article.<\/p>\n<pre class=\"e2-text-code\"><code>import dash\r\nimport dash_core_components as dcc\r\nimport dash_html_components as html\r\nfrom get_plots import get_scatter_plot<\/code><\/pre><p>After importing  the necessary libraries we need to load CSS styles and initiate our web app:<\/p>\n<pre class=\"e2-text-code\"><code>external_stylesheets = ['https:\/\/codepen.io\/chriddyp\/pen\/bWLwgP.css']\r\napp = dash.Dash(__name__, external_stylesheets=external_stylesheets)<\/code><\/pre><p>Create a dashboard structure:<\/p>\n<pre class=\"e2-text-code\"><code>app.layout = html.Div(children=[\r\n       html.Div([\r\n           dcc.Graph(id='fig1'),\r\n       ]) ,\r\n       html.Div([\r\n           html.H6('Time period (days)'),\r\n           dcc.Slider(\r\n               id='slider-day1',\r\n               min=0,\r\n               max=100,\r\n               step=1,\r\n               value=30,\r\n               marks={i: str(i) for i in range(0, 100, 10)}\r\n           ),\r\n           html.H6('Number of breweries from the top'),\r\n           dcc.Slider(\r\n               id='slider-top1',\r\n               min=0,\r\n               max=500,\r\n               step=50,\r\n               value=500,\r\n               marks={i: str(i) for i in range(0, 500, 50)})\r\n       ])\r\n])<\/code><\/pre><p>Now we have a plot and two sliders, each with its id and parameters: minimum value, maximum value, step, and initial value. Since the sliders data will be displayed in the plot we need to create a callback. Output is the first argument that displays our plot, the following Input parameters accept values on which the plot depends.<\/p>\n<pre class=\"e2-text-code\"><code>@app.callback(\r\n   dash.dependencies.Output('fig1', 'figure'),\r\n   [dash.dependencies.Input('slider-day1', 'value'),\r\n    dash.dependencies.Input('slider-top1', 'value')])\r\ndef output_fig(n_days, top_n):\r\n    get_scatter_plot(n_days, top_n)<\/code><\/pre><p>At the end of our script we will add the following line to run our code :<\/p>\n<pre class=\"e2-text-code\"><code>if __name__ == '__main__':\r\n   app.run_server(debug=True)<\/code><\/pre><p>Now, whenever the script is running our local IP address will be displayed in the terminal. Let’s open it in a web browser to view our interactive dashboard, it’s updated automatically when moving the sliders.<\/p>\n<div class=\"embed-responsive embed-responsive-4by3\" style=\"min-width:800\"><iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"http:\/\/dasheng-env.eba-ueep9ck7.us-east-2.elasticbeanstalk.com\" height=\"1100\" width=\"800\" ><\/iframe>\n<\/div>",
            "date_published": "2020-08-03T09:05:31+03:00",
            "date_modified": "2020-08-07T14:43:05+03:00",
            "_date_published_rfc2822": "Mon, 03 Aug 2020 09:05:31 +0300",
            "_rss_guid_is_permalink": "false",
            "_rss_guid": "38",
            "_e2_data": {
                "is_favourite": false,
                "links_required": [
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css"
                ],
                "og_images": []
            }
        },
        {
            "id": "37",
            "url": "https:\/\/en.leftjoin.ru\/all\/building-a-scatter-plot-for-untappd-breweries\/",
            "title": "Building a scatter plot for Untappd Breweries",
            "content_html": "<p>Today we are going to build a scatter plot for Russian Breweries that would display the ratio between the number of reviews and their average ratings for the past 30 days. Data will be taken from check-ins left by Untappd users who rated beers. To make a plot we need markers with specified color and size. The color will depend on a brewery registration date, thus displaying it’s registration period on Untappd, while the size of a marker correlates with the range of beers represented. This article is the first part of our series dedicated to building dashboards with Plotly.<\/p>\n<h2>Writing a Clickhouse query<\/h2>\n<p>First, we need to process the data before using it in our dashboard. Here, we are using public data collected from Untappd. You can find more about this in our previous articles: <a href=\"https:\/\/www.valiotti.com\/leftjoin\/all\/handling-website-buttons-in-selenium\/\" class=\"nu\">“<u>Handling website buttons in Selenium<\/u>”<\/a> and <a href=\"https:\/\/www.valiotti.com\/leftjoin\/all\/example-of-using-dictionaries-in-clickhouse-with-untappd\/\" class=\"nu\">“<u>Example of using dictionaries in Clickhouse with Untappd<\/u>”<\/a>.<\/p>\n<pre class=\"e2-text-code\"><code>from datetime import datetime, timedelta\r\nfrom clickhouse_driver import Client\r\nimport plotly.graph_objects as go\r\nimport pandas as pd\r\nimport numpy as np\r\nclient = Client(host='ec1-2-34-567-89.us-east-2.compute.amazonaws.com', user='default', password='', port='9000', database='default')<\/code><\/pre><p>Our scatter plot will depend on the  <span class=\"inline-code\">get_scatter_plot(n_days, top_n)<\/span> function, which takes two arguments denoting a time span and a number of breweries to display. Let’s write a SQL query to calculate the Brewery Pure Average. It can be presented <a href=\"https:\/\/help.untappd.com\/hc\/en-us\/articles\/360034136372-How-are-ratings-determined-on-Untappd-\">the following<\/a>: multiply the beer rating by the total number of ratings and divide it by the number of brewery reviews. We will also pass a brewery name and its beer range to the query, these parameters can be fetched from our dictionary using the  <span class=\"inline-code\">dictGet<\/span> function. We are only interested in those breweries that have Brewery Pure Average > 0 and the number of reviews > 100.<\/p>\n<pre class=\"e2-text-code\"><code>brewery_pure_average = client.execute(f&quot;&quot;&quot;\r\nSELECT\r\n       t1.brewery_id,\r\n       sum(t1.beer_pure_average_mult_count \/ t2.count_for_that_brewery) AS brewery_pure_average,\r\n       t2.count_for_that_brewery,\r\n       dictGet('breweries', 'brewery_name', toUInt64(t1.brewery_id)),\r\n       dictGet('breweries', 'beer_count', toUInt64(t1.brewery_id)),\r\n       t3.stats_age_on_service \/ 365\r\n   FROM\r\n   (\r\n       SELECT\r\n           beer_id,\r\n           brewery_id,\r\n           sum(rating_score) AS beer_pure_average_mult_count\r\n       FROM beer_reviews\r\n       WHERE created_at &gt;= today()-{n_days}\r\n       GROUP BY\r\n           beer_id,\r\n           brewery_id\r\n   ) AS t1\r\n   ANY LEFT JOIN\r\n   (\r\n       SELECT\r\n           brewery_id,\r\n           count(rating_score) AS count_for_that_brewery\r\n       FROM beer_reviews\r\n       WHERE created_at &gt;= today()-{n_days}\r\n       GROUP BY brewery_id\r\n   ) AS t2 ON t1.brewery_id = t2.brewery_id\r\n   ANY LEFT JOIN\r\n   (\r\n       SELECT\r\n           brewery_id,\r\n           stats_age_on_service\r\n       FROM brewery_info\r\n   ) AS t3 ON t1.brewery_id = t3.brewery_id\r\n   GROUP BY\r\n       t1.brewery_id,\r\n       t2.count_for_that_brewery,\r\n       t3.stats_age_on_service\r\n   HAVING t2.count_for_that_brewery &gt;= 150\r\n   ORDER BY brewery_pure_average\r\n   LIMIT {top_n}\r\n    &quot;&quot;&quot;)\r\n\r\nscatter_plot_df_with_age = pd.DataFrame(brewery_pure_average, columns=['brewery_id', 'brewery_pure_average', 'rating_count', 'brewery_name', 'beer_count'])<\/code><\/pre><h2>Working with a DataFrame<\/h2>\n<p>Add two dotted lines that will pass through the median values of each axis. That way we can find out which breweries are above average, the best ones will be in the upper right area.<\/p>\n<pre class=\"e2-text-code\"><code>dict_list = []\r\ndict_list.append(dict(type=&quot;line&quot;,\r\n                     line=dict(\r\n                         color=&quot;#666666&quot;,\r\n                         dash=&quot;dot&quot;),\r\n                     x0=0,\r\n                     y0=np.median(scatter_plot_df_with_age.brewery_pure_average),\r\n                     x1=7000,\r\n                     y1=np.median(scatter_plot_df_with_age.brewery_pure_average),\r\n                     line_width=1,\r\n                     layer=&quot;below&quot;))\r\ndict_list.append(dict(type=&quot;line&quot;,\r\n                     line=dict(\r\n                         color=&quot;#666666&quot;,\r\n                         dash=&quot;dot&quot;),\r\n                     x0=np.median(scatter_plot_df_with_age.rating_count),\r\n                     y0=0,\r\n                     x1=np.median(scatter_plot_df_with_age.rating_count),\r\n                     y1=5,\r\n                     line_width=1,\r\n                     layer=&quot;below&quot;))<\/code><\/pre><p>Add annotations to display median values by hovering:<\/p>\n<pre class=\"e2-text-code\"><code>annotations_list = []\r\nannotations_list.append(\r\n    dict(\r\n        x=8000,\r\n        y=np.median(scatter_plot_df_with_age.brewery_pure_average) - 0.1,\r\n        xref=&quot;x&quot;,\r\n        yref=&quot;y&quot;,\r\n        text=f&quot;Median value: {round(np.median(scatter_plot_df_with_age.brewery_pure_average), 2)}&quot;,\r\n        showarrow=False,\r\n        font={\r\n            'family':'Roboto, light',\r\n            'color':'#666666',\r\n            'size':12\r\n        }\r\n    )\r\n)\r\nannotations_list.append(\r\n    dict(\r\n        x=np.median(scatter_plot_df_with_age.rating_count) + 180,\r\n        y=0.8,\r\n        xref=&quot;x&quot;,\r\n        yref=&quot;y&quot;,\r\n        text=f&quot;Median value: {round(np.median(scatter_plot_df_with_age.rating_count), 2)}&quot;,\r\n        showarrow=False,\r\n        font={\r\n            'family':'Roboto, light',\r\n            'color':'#666666',\r\n            'size':12\r\n        },\r\n        textangle=-90\r\n    )\r\n)<\/code><\/pre><p>Let’s make our plot more informative by splitting breweries into 4 groups according to the beer range. The first group will include breweries with less than 10 brands, the second group for those holding 10-30 brands, the third one for 30-50 brands, and the last one for large breweries with >50 brands. We stored marker sizes in the <span class=\"inline-code\">bucket_beer_count<\/span> list.<\/p>\n<pre class=\"e2-text-code\"><code>bucket_beer_count = []\r\nfor beer_count in scatter_plot_df_with_age.beer_count:\r\n   if beer_count &lt; 10:\r\n       bucket_beer_count.append(7)\r\n   elif 10 &lt;= beer_count &lt;= 30:\r\n       bucket_beer_count.append(9)\r\n   elif 31 &lt;= beer_count &lt;= 50:\r\n       bucket_beer_count.append(11)\r\n   else:\r\n       bucket_beer_count.append(13)\r\nscatter_plot_df_with_age['bucket_beer_count'] = bucket_beer_count<\/code><\/pre><p>Next step is to perform age-based splitting<\/p>\n<pre class=\"e2-text-code\"><code>bucket_age = []\r\nfor age in scatter_plot_df_with_age.age_on_service:\r\n   if age &lt; 4:\r\n       bucket_age.append(0)\r\n   elif 4 &lt;= age &lt;= 6:\r\n       bucket_age.append(1)\r\n   elif 6 &lt; age &lt; 8:\r\n       bucket_age.append(2)\r\n   else:\r\n       bucket_age.append(3)\r\nscatter_plot_df_with_age['bucket_age'] = bucket_age<\/code><\/pre><p>Let’s divide our DataFrame into 4 parts to build separate scatter plots with its own color and size.<\/p>\n<pre class=\"e2-text-code\"><code>scatter_plot_df_0 = scatter_plot_df[scatter_plot_df.bucket == 0]\r\nscatter_plot_df_1 = scatter_plot_df[scatter_plot_df.bucket == 1]\r\nscatter_plot_df_2 = scatter_plot_df[scatter_plot_df.bucket == 2]\r\nscatter_plot_df_3 = scatter_plot_df[scatter_plot_df.bucket == 3]<\/code><\/pre><h2>Plotting<\/h2>\n<p>Now we are ready to build the plot, add our 4 brewery groups one by one, setting its key parameters: name, marker color, annotation transparency and text.<\/p>\n<pre class=\"e2-text-code\"><code>fig = go.Figure()\r\nfig.add_trace(go.Scatter(\r\n    x=scatter_plot_df_0.rating_count,\r\n    y=scatter_plot_df_0.brewery_pure_average,\r\n    name='&lt; 4',\r\n    mode='markers',\r\n    opacity=0.85,\r\n    text=scatter_plot_df_0.name_count,\r\n    marker_color='rgb(114, 183, 178)',\r\n    marker_size=scatter_plot_df_0.bucket_beer_count,\r\n    textfont={&quot;family&quot;:&quot;Roboto, light&quot;,\r\n              &quot;color&quot;:&quot;black&quot;\r\n             }\r\n))\r\n\r\nfig.add_trace(go.Scatter(\r\n    x=scatter_plot_df_1.rating_count,\r\n    y=scatter_plot_df_1.brewery_pure_average,\r\n    name='4 – 6',\r\n    mode='markers',\r\n    opacity=0.85,\r\n    marker_color='rgb(76, 120, 168)',\r\n    text=scatter_plot_df_1.name_count,\r\n    marker_size=scatter_plot_df_1.bucket_beer_count,\r\n    textfont={&quot;family&quot;:&quot;Roboto, light&quot;,\r\n              &quot;color&quot;:&quot;black&quot;\r\n             }\r\n))\r\n\r\nfig.add_trace(go.Scatter(\r\n    x=scatter_plot_df_2.rating_count,\r\n    y=scatter_plot_df_2.brewery_pure_average,\r\n    name='6 – 8',\r\n    mode='markers',\r\n    opacity=0.85,\r\n    marker_color='rgb(245, 133, 23)',\r\n    text=scatter_plot_df_2.name_count,\r\n    marker_size=scatter_plot_df_2.bucket_beer_count,\r\n    textfont={&quot;family&quot;:&quot;Roboto, light&quot;,\r\n              &quot;color&quot;:&quot;black&quot;\r\n             }\r\n))\r\n\r\nfig.add_trace(go.Scatter(\r\n    x=scatter_plot_df_3.rating_count,\r\n    y=scatter_plot_df_3.brewery_pure_average,\r\n    name='8+',\r\n    mode='markers',\r\n    opacity=0.85,\r\n    marker_color='rgb(228, 87, 86)',\r\n    text=scatter_plot_df_3.name_count,\r\n    marker_size=scatter_plot_df_3.bucket_beer_count,\r\n    textfont={&quot;family&quot;:&quot;Roboto, light&quot;,\r\n              &quot;color&quot;:&quot;black&quot;\r\n             }\r\n))\r\n\r\nfig.update_layout(\r\n    title=f&quot;The ratio between the number of reviews and the average brewery rating for the past &lt;br&gt; {n_days} days, top {top_n} breweries&quot;,\r\n    font={\r\n            'family':'Roboto, light',\r\n            'color':'black',\r\n            'size':14\r\n        },\r\n    plot_bgcolor='rgba(0,0,0,0)',\r\n    yaxis_title=&quot;Average rating&quot;,\r\n    xaxis_title=&quot;Number of reviews&quot;,\r\n    legend_title_text='Registration period&lt;br&gt; on Untappd in years:',\r\n    height=750,\r\n    shapes=dict_list,\r\n    annotations=annotations_list\r\n)<\/code><\/pre><p>Voila, the scatter plot is done! Each point is a separate brewery. The color shows the brewery beer range and when hovering we will see a summary including the average rating for the past 30 days, number of reviews, brewery name, and beer range. The dotted lines are passing through the median values we calculated with NumPy, they’re showing us the best breweries in the upper right. In our next article, we are going to create a breweries dashboard with dynamic parameters.<\/p>\n<div class=\"embed-responsive embed-responsive-4by3\" style=\"min-width:500\"><iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"\/\/plotly.com\/~i-bond\/20.embed?showlink=false\" height=\"800\" width=\"900\"><\/iframe>\n<\/div>",
            "date_published": "2020-07-15T15:32:22+03:00",
            "date_modified": "2020-07-15T15:37:39+03:00",
            "_date_published_rfc2822": "Wed, 15 Jul 2020 15:32:22 +0300",
            "_rss_guid_is_permalink": "false",
            "_rss_guid": "37",
            "_e2_data": {
                "is_favourite": false,
                "links_required": [
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css"
                ],
                "og_images": []
            }
        },
        {
            "id": "35",
            "url": "https:\/\/en.leftjoin.ru\/all\/sentiment-analysis-of-russians-on-constitutional-amendments\/",
            "title": "Sentiment analysis of Russians on Constitutional Amendments",
            "content_html": "<p>In today’s article, we are going to use public data from vk.com to interpret and classify users’ attitudes about the 2020 amendments to the Constitution of Russia.<\/p>\n<h2>API Overview<\/h2>\n<p>First off, we need to receive data using the <a href=\"https:\/\/vk.com\/dev\/newsfeed.search\">newsfeed.search<\/a> method, this method allows us to get up to one thousand of the latest posts from the news feed by keyword.<br \/>\nThe response data contains different fields, like post ids,  user or community ids, text data, likes count, comments,  apps, geolocation, and many more. We are only needed ids and text data.<br \/>\nSome expanded information about the author will also be useful for our analysis, this includes city, gender, age, and can be received with the <a href=\"https:\/\/vk.com\/dev\/users.get\">users.get<\/a> method.<\/p>\n<h2>Create Clickhouse Tables<\/h2>\n<p>The received data should be stored somewhere, we chose to use ClickHouse, an open-source column-oriented DBMS. Let’s create two tables to store users and their posts. The first table will be populated with ids and text data, the second one will hold user data, such as their ids, age, and city. The ReplacingMergeTree () engine will remove duplicates in our tables.<\/p>\n<p class=\"note\">The article assumes that you’re familiar with how to <a href=\"https:\/\/www.valiotti.com\/leftjoin\/all\/installing-clickhouse-on-aws\/\">install ClickHouse on AWS<\/a>, <a href=\"https:\/\/www.valiotti.com\/leftjoin\/all\/example-of-using-dictionaries-in-clickhouse-with-untappd\/\">create external dictionaries<\/a> and <a href=\"https:\/\/www.valiotti.com\/leftjoin\/all\/working-with-materialized-views-in-clickhouse\/\"> materialized views<\/a><\/p>\n<pre class=\"e2-text-code\"><code>CREATE TABLE vk_posts(\r\n   post_id UInt64,\r\n   post_date DateTime,\r\n   owner_id UInt64,\r\n   from_id UInt64,\r\n   text String\r\n) ENGINE ReplacingMergeTree()\r\nORDER BY post_date\r\n\r\nCREATE TABLE vk_users(\r\n   user_id UInt64,\r\n   user_sex Nullable(UInt8),\r\n   user_city String,\r\n   user_age Nullable(UInt16)\r\n) ENGINE ReplacingMergeTree()\r\nORDER BY user_id<\/code><\/pre><h2>Collecting user posts with the VK API<\/h2>\n<p>Let’s get to writing our script, import the libraries, and create several variables with constant values:<\/p>\n<p class=\"note\">If you don’t have an access token yet and want to create one, refer to this step by step guide:<a href=\"https:\/\/www.valiotti.com\/leftjoin\/all\/collecting-data-on-ad-campaigns-from-vkontakte\/\"> “Collecting Data on Ad Campaigns from VK.com”<\/a><\/p>\n<pre class=\"e2-text-code\"><code>from clickhouse_driver import Client\r\nfrom datetime import datetime\r\nimport requests\r\nimport pandas as pd\r\nimport time\r\n\r\ntoken = 'your_token'\r\nversion = 5.103\r\nclient = Client(host='ec1-23-456-789-1011.us-east-2.compute.amazonaws.com', user='default', password='', port='9000', database='default')      \r\ndata_list = []\r\nstart_from = 0\r\nquery_string = 'конституция' #constitution<\/code><\/pre><p>Define the get_and_insert_info_by_user function that will receive a list of user ids and expanded information about them, and send it to the vk_users table. Since the user_ids parameter takes a list as a string object, we need to change the structure and omit the square brackets.<br \/>\nMost users prefer to conceal their gender, age, and city. In such cases, we need to use Nullable values. To obtain user age we need to subtract the birth year from the current year, if the birth year is missing we can check it using the regular expression.<\/p>\n<p><details><br \/>\n<summary><span style=\"color:#7ea9b8\">get_and_insert_info_by_user() function<\/span><\/summary><\/p>\n<pre class=\"e2-text-code\"><code>def get_and_insert_info_by_user(users):\r\n    try:\r\n        r = requests.get('https:\/\/api.vk.com\/method\/users.get', params={\r\n            'access_token':token,\r\n            'v':version,\r\n            'user_ids':str(users)[1:-2],\r\n            'fields':'sex, city, bdate'\r\n        }).json()['response']\r\n        for user in r:\r\n            user_list = []\r\n            user_list.append(user['id'])\r\n            if client.execute(f&quot;SELECT count(1) FROM vk_users where user_id={user['id']}&quot;)[0][0] == 0:\r\n                print(user['id'])\r\n                try:\r\n                    user_list.append(user['sex'])\r\n                except Exception:\r\n                    user_list.append('cast(Null as Nullable(UInt8))')\r\n                try:\r\n                    user_list.append(user['city']['title'])\r\n                except Exception:\r\n                    user_list.append('')\r\n                try:\r\n                    now = datetime.now()\r\n    \t\t\t    year = item.split('.')[-1]\r\n    \t\t\t    if re.match(r'\\d\\d\\d\\d', year):\r\n        \t\t        age = now.year - int(year)\r\n\t\t\t    \t   user_list.append(age)\r\n                except Exception:\r\n                    user_list.append('cast(Null as Nullable(UInt16))')\r\n                user_insert_tuple = tuple(user_list)\r\n                client.execute(f'INSERT INTO vk_users VALUES {user_insert_tuple}')\r\n    except KeyError:\r\n        pass<\/code><\/pre><p><\/details><br \/>\nOur script will work in a while loop to constantly update data, as we can only receive a thousand of the latest data points.The newsfeed.search method returns 200 posts per call, so we need to invoke it five times to collect all the posts.<\/p>\n<p><details><br \/>\n<summary><span style=\"color:#7ea9b8\">While loop to collect new posts<\/span><\/summary><\/p>\n<pre class=\"e2-text-code\"><code>while True:\r\n    for i in range(5):\r\n        r = requests.get('https:\/\/api.vk.com\/method\/newsfeed.search', params={\r\n            'access_token':token,\r\n            'v':version,\r\n            'q':query_string,\r\n            'count':200,\r\n            'start_from': start_from\r\n        })\r\n        data_list.append(r.json()['response'])\r\n        try:\r\n            start_from = r.json()['response']['next_from']\r\n        except KeyError:\r\n            pass<\/code><\/pre><p><\/details><\/p>\n<p>The data we received can be parsed, VK users always have a positive id, while for communities it’s negative. We need only users data for our analysis, where from_id  > 0. The next step is to check whether a post contains any text data or not. Finally, we will collect and store unique entries by user id. Pause the script after each iteration for 180 seconds to wait for new user posts and not violate the VK API rules.<\/p>\n<p><details><br \/>\n<summary><span style=\"color:#7ea9b8\">Adding new data to Clickhouse<\/span><\/summary><\/p>\n<pre class=\"e2-text-code\"><code>user_ids = []\r\n    for data in data_list:\r\n        for data_item in data['items']:\r\n            if data_item['from_id'] &gt; 0:\r\n                post_list = []\r\n                if not data_item['text']:\r\n                    continue\r\n                if client.execute(f&quot;SELECT count(1) FROM vk_posts WHERE post_id={data_item['id']} AND from_id={data_item['from_id']}&quot;)[0][0] == 0:\r\n                    user_ids.append(data_item['from_id'])\r\n                    date = datetime.fromtimestamp(data_item['date'])\r\n                    date = datetime.strftime(date, '%Y-%m-%d %H:%M:%S')\r\n                    post_list.append(date)\r\n                    post_list.append(data_item['id'])\r\n                    post_list.append(data_item['owner_id'])\r\n                    post_list.append(data_item['from_id'])\r\npost_list.append(data_item['text'].replace(&quot;'&quot;,&quot;&quot;).replace('&quot;','').replace(&quot;\\n&quot;,&quot;&quot;))\r\n                    post_list.append(query_string)\r\n                    post_tuple = tuple(post_list)\r\n                    print(post_list)\r\n                    try:\r\n                        client.execute(f'INSERT INTO vk_posts VALUES {post_tuple}')\r\n                    except Exception as E:\r\n                        print('!!!!! try to insert into vk_post but got', E)\r\n    try:\r\n        get_and_insert_info_by_user(user_ids)\r\n    except Exception as E:\r\n        print(&quot;Try to insert user list:&quot;, user_ids, &quot;but got:&quot;, E)\r\n    time.sleep(180)<\/code><\/pre><p><\/details><\/p>\n<h2>Dostoevsky for sentiment analysis<\/h2>\n<p>For one week our script collected almost 20000 posts from VK users that mention the keyword  “constitution” (or “конституция” in Russian). It’s time to write our second script for data analysis and visualization. First, create a DataFrame with the data received, and evaluate the sentiment of each post, identifying whether it’s positive, negative, or neutral. We are going to use the <a href=\"https:\/\/pypi.org\/project\/dostoevsky\/\">Dostoevsky<\/a> library to analyze the emotion behind a text.<\/p>\n<pre class=\"e2-text-code\"><code>from dostoevsky.tokenization import RegexTokenizer\r\nfrom dostoevsky.models import FastTextSocialNetworkModel\r\nfrom clickhouse_driver import Client\r\nimport pandas as pd\r\nclient = Client(host='ec1-23-456-789-1011.us-east-2.compute.amazonaws.com', user='default', password='', port='9000', database='default')<\/code><\/pre><p>Assign all the contents of our table to the vk_posts variable with a simple query. Iterate through all the posts, select those with text data and populate our DataFrame.<\/p>\n<pre class=\"e2-text-code\"><code>vk_posts = client.execute('SELECT * FROM vk_posts')\r\nlist_of_posts = []\r\nlist_of_ids = []\r\nfor post in vk_posts:\r\n    if str(post[-2]).replace(&quot; &quot;, &quot;&quot;):\r\n        list_of_posts.append(str(post[-2]).replace(&quot;\\n&quot;,&quot;&quot;))\r\n        list_of_ids.append(int(post[2]))\r\ndf_posts = pd.DataFrame()\r\ndf_posts['post'] = list_of_posts\r\ndf_posts['id'] = list_of_ids<\/code><\/pre><p>Instantiate our model and iterate through the posts to evaluate the sentiment of each entry.<\/p>\n<pre class=\"e2-text-code\"><code>tokenizer = RegexTokenizer()\r\nmodel = FastTextSocialNetworkModel(tokenizer=tokenizer)\r\nsentiment_list = []\r\nresults = model.predict(list_of_posts, k=2)\r\nfor sentiment in results:\r\n    sentiment_list.append(sentiment)<\/code><\/pre><p>Add several boolean columns to our DataFrame that will reflect whether it’s a  positive, negative, or neutral post.<\/p>\n<pre class=\"e2-text-code\"><code>neutral_list = []\r\nnegative_list = []\r\npositive_list = []\r\nspeech_list = []\r\nskip_list = []\r\nfor sentiment in sentiment_list:\r\n    neutral = sentiment.get('neutral')\r\n    negative = sentiment.get('negative')\r\n    positive = sentiment.get('positive')\r\n    if neutral is None:\r\n        neutral_list.append(0)\r\n    else:\r\n        neutral_list.append(sentiment.get('neutral'))\r\n    if negative is None:\r\n        negative_list.append(0)\r\n    else:\r\n        negative_list.append(sentiment.get('negative'))\r\n    if positive is None:\r\n        positive_list.append(0)\r\n    else:\r\n        positive_list.append(sentiment.get('positive'))\r\ndf_posts['neutral'] = neutral_list\r\ndf_posts['negative'] = negative_list\r\ndf_posts['positive'] = positive_list<\/code><\/pre><p>That’s how the DataFrame looks now:<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/1-9.png\" width=\"669\" height=\"187\" alt=\"\" \/>\n<\/div>\n<p>Let’s examine the most negative posts:<\/p>\n<pre class=\"e2-text-code\"><code>df_posts[df_posts.negative &gt; 0.9]<\/code><\/pre><div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/2-9.png\" width=\"631\" height=\"289\" alt=\"\" \/>\n<\/div>\n<p>Now, let’s add data about the authors of these posts by merging two tables together on the id column.<\/p>\n<pre class=\"e2-text-code\"><code>vk_users = client.execute('SELECT * FROM vk_users')\r\nvk_user_ids_list = []\r\nvk_user_sex_list = []\r\nvk_user_city_list = []\r\nvk_user_age_list = []\r\nfor user in vk_users:\r\n    vk_user_ids_list.append(user[0])\r\n    vk_user_sex_list.append(user[1])\r\n    vk_user_city_list.append(user[2])\r\n    vk_user_age_list.append(user[3])\r\ndf_users = pd.DataFrame()\r\ndf_users['id'] = vk_user_ids_list\r\ndf_users['sex'] = vk_user_sex_list\r\ndf_users['city'] = vk_user_city_list\r\ndf_users['age'] = vk_user_age_list\r\ndf = df_posts.merge(df_users, on='id')<\/code><\/pre><p>And the table now looks the following:<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/3-10.png\" width=\"819\" height=\"346\" alt=\"\" \/>\n<\/div>\n<h2>Analysing data with Plotly<\/h2>\n<p class=\"note\">Check out our previous article on data visualization with Plotly: <a href=\"https:\/\/www.valiotti.com\/leftjoin\/all\/building-an-interactive-waterfall-chart-in-python\/\" class=\"nu\">“<u>Building an interactive waterfall chart in Python<\/u>”<\/a><\/p>\n<p>Let’s find the percentage of posts for each group: positive, negative, neutral. Iterate through these three columns and calculate the values more than zero for each data point. Then do the same for different age categories and gender.<\/p>\n<div class=\"embed-responsive embed-responsive-4by3\" style=\"min-width:500\"><iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"https:\/\/plotly.com\/~i-bond\/9.embed?showlink=false\" height=\"500\" width=\"500\" ><\/iframe>\n<\/div><p>According to our chart, 45% of recent user posts relevant to the keyword “constitution” have a negative meaning, while the other 52% are neutral. Later it’ll be known how different the Internet opinions from the voting results.<\/p>\n<div class=\"embed-responsive embed-responsive-4by3\" style=\"min-width:800\"><iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"https:\/\/plotly.com\/~i-bond\/5.embed?showlink=false\" height=\"500\" width=\"800\"><\/iframe>\n<\/div><p>It’s noticeable that among the men audience the proportion of positive posts is less than 2%, while for women it’s 3.5%. However, the number of negative posts for each group is almost the same, 47% and 43% respectively.<\/p>\n<div class=\"embed-responsive embed-responsive-4by3\" style=\"min-width:800\"><iframe id=\"igraph\" scrolling=\"no\" style=\"border:none; position:relative;\"seamless=\"seamless\" src=\"https:\/\/plotly.com\/~i-bond\/7.embed?showlink=false\" height=\"500\" width=\"800\"><\/iframe>\n<\/div><p>According to our analysis,  posts made by younger audiences between 18-25 years have more positive sentiment, which is 6%. While users under 18 years leave mostly negative posts, this may be because most users under the age of 18 prefer to hide their real age, this makes it difficult to obtain accurate data for such a group.<br \/>\nThe proportion of negative posts is almost equal for all groups and accounts for 44%.<br \/>\nAs you can see, the data is distributed equally in all three charts. This means that half of all posts relevant to the keyword “constitution” and made by VK users over the past week mostly have a negative sentiment.<\/p>\n",
            "date_published": "2020-07-08T12:53:42+03:00",
            "date_modified": "2020-07-08T12:49:06+03:00",
            "image": "https:\/\/en.leftjoin.ru\/pictures\/1-9.png",
            "_date_published_rfc2822": "Wed, 08 Jul 2020 12:53:42 +0300",
            "_rss_guid_is_permalink": "false",
            "_rss_guid": "35",
            "_e2_data": {
                "is_favourite": false,
                "links_required": [
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css"
                ],
                "og_images": [
                    "https:\/\/en.leftjoin.ru\/pictures\/1-9.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/2-9.png",
                    "https:\/\/en.leftjoin.ru\/pictures\/3-10.png"
                ]
            }
        },
        {
            "id": "34",
            "url": "https:\/\/en.leftjoin.ru\/all\/building-an-interactive-waterfall-chart-in-python\/",
            "title": "Building an interactive waterfall chart in Python",
            "content_html": "<p>Back in 2014, we built a waterfall chart in Excel, widely known in the consulting world, for one of our presentations about the e-commerce market in Ulmart. It’s been a while and today we are going to draw one in Python and <a href=\"https:\/\/plotly.com\">the Plotly library<\/a>. This type of charts is oftentimes used to illustrate changes with the appearance of a new positive or negative factor. In the latter article about data visualization, we explained <a href=\"https:\/\/www.valiotti.com\/leftjoin\/all\/beautiful-bar-charts-with-python-and-matplotlib\/\">how to build a beautiful Bar Chart with bars that resemble thermometers<\/a>, it’s especially useful when we want to compare planned targets with actual values.<\/p>\n<div class=\"e2-text-picture\">\n<img src=\"https:\/\/en.leftjoin.ru\/pictures\/ecommerce-wf@2x.png\" width=\"592.5\" height=\"440.5\" alt=\"\" \/>\n<\/div>\n<p>We are using the Ulmart data on the e-commerce market growth from 2013 to 2014. Data on the X-axis is chart captions,  on the Y-axis we displayed the initial and final values,  as well as their change. With the sum() function calculate the total and add it to the end of our list. The &lt;br&gt tag in the <span class=\"inline-code\">x_list<\/span>  shows a line break in text.<\/p>\n<pre class=\"e2-text-code\"><code>import plotly.graph_objects as go\r\n\r\nx_list = ['2013','The Russian &lt;br&gt;Macroeconomy', 'Decline in working age&lt;br&gt;population','Internet usage growth','Development of&lt;br&gt;cross-border trade', 'National companies', '2014']\r\ny_list = [738.5, 48.7, -7.4, 68.7, 99.7, 48.0]\r\ntotal = round(sum(y_list))\r\ny_list.append(total)<\/code><\/pre><p>Let’s create a list with column values, we called it  <span class=\"inline-code\">text_list<\/span>. The values will be taken from the <span class=\"inline-code\">y_list<\/span>,  but first we need to transform them. Convert all numerical values into strings and if it’s not the first or the last column, add a plus sign for clarity. In case it’s a positive change, the color will be green, otherwise red. Highlight the first and the last values with the &lt;b&gt tag;<\/p>\n<pre class=\"e2-text-code\"><code>text_list = []\r\nfor index, item in enumerate(y_list):\r\n    if item &gt; 0 and index != 0 and index != len(y_list) - 1:\r\n        text_list.append(f'+{str(y_list[index])}')\r\n    else:\r\n        text_list.append(str(y_list[index]))\r\nfor index, item in enumerate(text_list):\r\n    if item[0] == '+' and index != 0 and index != len(text_list) - 1:\r\n        text_list[index] = '&lt;span style=&quot;color:#2ca02c&quot;&gt;' + text_list[index] + '&lt;\/span&gt;'\r\n    elif item[0] == '-' and index != 0 and index != len(text_list) - 1:\r\n        text_list[index] = '&lt;span style=&quot;color:#d62728&quot;&gt;' + text_list[index] + '&lt;\/span&gt;'\r\n    if index == 0 or index == len(text_list) - 1:\r\n        text_list[index] = '&lt;b&gt;' + text_list[index] + '&lt;\/b&gt;'<\/code><\/pre><p>Let’s set parameters for the dashed lines we want to add. Create a list of dictionaries and fill it with light-gray dashed lines, passing the following:<\/p>\n<pre class=\"e2-text-code\"><code>dict_list = []\r\nfor i in range(0, 1200, 200):\r\n    dict_list.append(dict(\r\n            type=&quot;line&quot;,\r\n            line=dict(\r\n                 color=&quot;#666666&quot;,\r\n                 dash=&quot;dot&quot;\r\n            ),\r\n            x0=-0.5,\r\n            y0=i,\r\n            x1=6,\r\n            y1=i,\r\n            line_width=1,\r\n            layer=&quot;below&quot;))<\/code><\/pre><p>Now, create a graph object with the <span class=\"inline-code\">Waterfall()<\/span> method. Each column in our table can be of a certain type: <span class=\"inline-code\">total<\/span>, <span class=\"inline-code\">absolute<\/span> (both with final values) or <span class=\"inline-code\">relative<\/span> (holds intermediate values). Then we need to set colors, make the connecting line transparent, positive changes will be green, while negative ones are red, and the final columns are purple. Here we are using the Open Sans font.<\/p>\n<p class=\"note\">Learn more about how to choose the right fonts for your data visualization from this article:<a href=\"https:\/\/medium.com\/nightingale\/choosing-a-font-for-your-data-visualization-2ed37afea637\/\"> “Choosing Fonts for Your Data Visualization”<\/a><\/p>\n<pre class=\"e2-text-code\"><code>fig = go.Figure(go.Waterfall(\r\n    name = &quot;e-commerce&quot;, orientation = &quot;v&quot;,\r\n    measure = [&quot;absolute&quot;, &quot;relative&quot;, &quot;relative&quot;, &quot;relative&quot;, &quot;relative&quot;, &quot;relative&quot;, &quot;total&quot;],\r\n    x = x_list,\r\n    y = y_list,\r\n    text = text_list,\r\n    textposition = &quot;outside&quot;,\r\n    connector = {&quot;line&quot;:{&quot;color&quot;:'rgba(0,0,0,0)'}},\r\n    increasing = {&quot;marker&quot;:{&quot;color&quot;:&quot;#2ca02c&quot;}},\r\n    decreasing = {&quot;marker&quot;:{&quot;color&quot;:&quot;#d62728&quot;}},\r\n    totals={'marker':{&quot;color&quot;:&quot;#9467bd&quot;}},\r\n    textfont={&quot;family&quot;:&quot;Open Sans, light&quot;,\r\n              &quot;color&quot;:&quot;black&quot;\r\n             }\r\n))<\/code><\/pre><p>Finally, add the title with the description, hide the legend, set the Y label and add dashed lines to our chart.<\/p>\n<pre class=\"e2-text-code\"><code>fig.update_layout(\r\n    title = \r\n        {'text':'&lt;b&gt;Waterfall chart&lt;\/b&gt;&lt;br&gt;&lt;span style=&quot;color:#666666&quot;&gt;E-commerce market growth from 2013 to 2014&lt;\/span&gt;'},\r\n    showlegend = False,\r\n    height=650,\r\n    font={\r\n        'family':'Open Sans, light',\r\n        'color':'black',\r\n        'size':14\r\n    },\r\n    plot_bgcolor='rgba(0,0,0,0)',\r\n    yaxis_title=&quot;млрд руб.&quot;,\r\n    shapes=dict_list\r\n)\r\nfig.update_xaxes(tickangle=-45, tickfont=dict(family='Open Sans, light', color='black', size=14))\r\nfig.update_yaxes(tickangle=0, tickfont=dict(family='Open Sans, light', color='black', size=14))\r\n\r\nfig.show()<\/code><\/pre><p>And here it is:<\/p>\n<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\" seamless=\"seamless\" src=\"https:\/\/chart-studio.plotly.com\/~i-bond\/1.embed?showlink=false\" height=\"650\" width=\"100%\"><\/iframe>\n",
            "date_published": "2020-06-24T15:53:26+03:00",
            "date_modified": "2020-06-26T06:40:33+03:00",
            "image": "https:\/\/en.leftjoin.ru\/pictures\/ecommerce-wf@2x.png",
            "_date_published_rfc2822": "Wed, 24 Jun 2020 15:53:26 +0300",
            "_rss_guid_is_permalink": "false",
            "_rss_guid": "34",
            "_e2_data": {
                "is_favourite": false,
                "links_required": [
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css",
                    "system\/library\/highlight\/highlight.js",
                    "system\/library\/highlight\/highlight.css"
                ],
                "og_images": [
                    "https:\/\/en.leftjoin.ru\/pictures\/ecommerce-wf@2x.png"
                ]
            }
        }
    ],
    "_e2_version": 3386,
    "_e2_ua_string": "E2 (v3386; Aegea)"
}