How to build a dashboard with Bootstrap 4 from scratch (Part 2)

⏱ Время чтения текста – 12 минут

Previously we shared how to use Bootstrap components in building dashboard layout and designed a simple yet flexible dashboard with a scatter plot and Russian map. In today’s material, we will continue adding more information, explore how to make Bootstrap tables responsive, and cover some complex callbacks for data acquisition.

Constructing Data Tables

All the code for populating our tables with data will be stored in get_tables.py , while the layout components areoutlined in application.py. This article will cover the process of creating the table with top Russian Breweries, however, you can find the code for creating the other three on Github.

Data in the Top Breweries table can be filtered by city name in the dropdown menu, but the data collected in Untappd is not equally structured. Some city names are written in Latin, others in Cyrillic. So the challenge is to make the names equal for SQL queries, and here is where Google Translate comes to the rescue. Though we sill have to manually create a dictionary of city names, since for example “Москва” can be written as “Moskva” and not “Moscow”. This dictionary will be used later for mapping our DataFrame before transforming it into a Bootstrap table.

import pandas as pd
import dash_bootstrap_components as dbc
from clickhouse_driver import Client
import numpy as np
from googletrans import Translator

translator = Translator()

client = Client(host='12.34.56.78', user='default', password='', port='9000', database='')

city_names = {
   'Moskva': 'Москва',
   'Moscow': 'Москва',
   'СПБ': 'Санкт-Петербург',
   'Saint Petersburg': 'Санкт-Петербург',
   'St Petersburg': 'Санкт-Петербург',
   'Nizhnij Novgorod': 'Нижний Новгород',
   'Tula': 'Тула',
   'Nizhniy Novgorod': 'Нижний Новгород',
}

Top Breweries Table

This table displays top 10 Russian breweries and their position change according to the rating. Simply put, we need to compare data for two periods, that’s [30 days ago; today] and [60 days ago; 30 days ago]. With this in mind, we will need the following headers: ranking, brewery name, position change, and number of check-ins.
Create the get_top_russian_breweries function that would make queries to the Clickhouse DB, sort the data and return a refined Pandas DataFrame. Let’s send the following queries to obtain data for the past 30 and 60 days, ordering the results by the number of check-ins.

Querying data from the Database

def get_top_russian_breweries(checkins_n=250):
   top_n_brewery_today = client.execute(f'''
      SELECT  rt.brewery_id,
              rt.brewery_name,
              beer_pure_average_mult_count/count_for_that_brewery as avg_rating,
              count_for_that_brewery as checkins FROM (
      SELECT           
              brewery_id,
              dictGet('breweries', 'brewery_name', toUInt64(brewery_id)) as brewery_name,
              sum(rating_score) AS beer_pure_average_mult_count,
              count(rating_score) AS count_for_that_brewery
          FROM beer_reviews t1
          ANY LEFT JOIN venues AS t2 ON t1.venue_id = t2.venue_id
          WHERE isNotNull(venue_id) AND (created_at >= (today() - 30)) AND (venue_country = 'Россия') 
          GROUP BY           
              brewery_id,
              brewery_name) rt
      WHERE (checkins>={checkins_n})
      ORDER BY avg_rating DESC
      LIMIT 10
      '''
   )

top_n_brewery_n_days = client.execute(f'''
  SELECT  rt.brewery_id,
          rt.brewery_name,
          beer_pure_average_mult_count/count_for_that_brewery as avg_rating,
          count_for_that_brewery as checkins FROM (
  SELECT           
          brewery_id,
          dictGet('breweries', 'brewery_name', toUInt64(brewery_id)) as brewery_name,
          sum(rating_score) AS beer_pure_average_mult_count,
          count(rating_score) AS count_for_that_brewery
      FROM beer_reviews t1
      ANY LEFT JOIN venues AS t2 ON t1.venue_id = t2.venue_id
      WHERE isNotNull(venue_id) AND (created_at >= (today() - 60) AND created_at <= (today() - 30)) AND (venue_country = 'Россия')
      GROUP BY           
          brewery_id,
          brewery_name) rt
  WHERE (checkins>={checkins_n})
  ORDER BY avg_rating DESC
  LIMIT 10
  '''
)

Creating two DataFrames with the received data:

top_n = len(top_n_brewery_today)
column_names = ['brewery_id', 'brewery_name', 'avg_rating', 'checkins']

top_n_brewery_today_df = pd.DataFrame(top_n_brewery_today, columns=column_names).replace(np.nan, 0)
top_n_brewery_today_df['brewery_pure_average'] = round(top_n_brewery_today_df.avg_rating, 2)
top_n_brewery_today_df['brewery_rank'] = list(range(1, top_n + 1))

top_n_brewery_n_days = pd.DataFrame(top_n_brewery_n_days, columns=column_names).replace(np.nan, 0)
top_n_brewery_n_days['brewery_pure_average'] = round(top_n_brewery_n_days.avg_rating, 2)
top_n_brewery_n_days['brewery_rank'] = list(range(1, len(top_n_brewery_n_days) + 1))

And then calculate the position change over the period of time for each brewery received. With the try-except block, we will handle exceptions, in case, if a brewery was not yet in our database 60 days ago.

rank_was_list = []
for brewery_id in top_n_brewery_today_df.brewery_id:
   try:
       rank_was_list.append(
           top_n_brewery_n_days[top_n_brewery_n_days.brewery_id == brewery_id].brewery_rank.item())
   except ValueError:
       rank_was_list.append('–')
top_n_brewery_today_df['rank_was'] = rank_was_list

Now we iterate over the columns with current and former positions. If there is no hyphen contained in, we will append an up or down arrow depending on the change.

diff_rank_list = []
for rank_was, rank_now in zip(top_n_brewery_today_df['rank_was'], top_n_brewery_today_df['brewery_rank']):
   if rank_was != '–':
       difference = rank_was - rank_now
       if difference > 0:
           diff_rank_list.append(f'↑ +{difference}')
       elif difference < 0:
           diff_rank_list.append(f'↓ {difference}')
       else:
           diff_rank_list.append('–')
   else:
       diff_rank_list.append(rank_was)

Finally, replace DataFrame headers, inserting the column with current ranking positions, where the top 3 will be displayed with the trophy emoji.

df = top_n_brewery_today_df[['brewery_name', 'avg_rating', 'checkins']].round(2)
df.insert(2, 'Position change', diff_rank_list)
df.columns = ['NAME', 'RATING', 'POSITION CHANGE', 'CHECK-INS']
df.insert(0, 'RANKING', list('🏆 ' + str(i) if i in [1, 2, 3] else str(i) for i in range(1, len(df) + 1)))

return df

Filtering data by city name

One of the main tasks we set before creating this dashboard was to find out what are the most liked breweries in a certain city. The user chooses a city in the dropdown menu and gets the results. Sound pretty simple, but is it that easy?
Our next step is to write a script that would update data for each city and store it in separate CSV files. As we mentioned earlier, the city names are not equally structured, so we need to use Google Translator within the if-else block, and since it may not convert some names to Cyrillic we need to explicitly specify such cases:

en_city = venue_city
if en_city == 'Nizhnij Novgorod':
      ru_city = 'Нижний Новгород'
elif en_city == 'Perm':
      ru_city = 'Пермь'
elif en_city == 'Sergiev Posad':
      ru_city = 'Сергиев Посад'
elif en_city == 'Vladimir':
      ru_city = 'Владимир'
elif en_city == 'Yaroslavl':
      ru_city = 'Ярославль'
else:
      ru_city = translator.translate(en_city, dest='ru').text

Then we need to add both city names in English and Russian to the SQL query, to receive all check-ins sent from this city.

WHERE (rt.venue_city='{ru_city}' OR rt.venue_city='{en_city}')

Finally, we export received data into a CSV file in the following directory – data/cities.

df = top_n_brewery_today_df[['brewery_name', 'venue_city', 'avg_rating', 'checkins']].round(2)
df.insert(3, 'Position Change', diff_rank_list)
df.columns = ['NAME', 'CITY', 'RATING', 'POSITION CHANGE', 'CHECK-INS']
# MAPPING
df['CITY'] = df['CITY'].map(lambda x: city_names[x] if (x in city_names) else x)
# TRANSLATING
df['CITY'] = df['CITY'].map(lambda x: translator.translate(x, dest='en').text)
df.to_csv(f'data/cities/{en_city}.csv', index=False)
print(f'{en_city}.csv updated!')

Scheduling Updates

We will use the apscheduler library to automatically run the script and refresh data for each city in all_cities every day at 10:30 am (UTC).

from apscheduler.schedulers.background import BackgroundScheduler
from get_tables import update_best_breweries

all_cities = sorted(['Vladimir', 'Voronezh', 'Ekaterinburg', 'Kazan', 'Red Pakhra', 'Krasnodar',
             'Kursk', 'Moscow', 'Nizhnij Novgorod', 'Perm', 'Rostov-on-Don', 'Saint Petersburg',
             'Sergiev Posad', 'Tula', 'Yaroslavl'])

scheduler = BackgroundScheduler()
@scheduler.scheduled_job('cron', hour=10, misfire_grace_time=30)
def update_data():
   for city in all_cities:
       update_best_breweries(city)
scheduler.start()

Table from DataFrame

get_top_russian_breweries_table(venue_city, checkins_n=250) will accept venue_city and checkins_n generating a Bootstrap Table with the top breweries. The second parameter value, checkins_n can be changed with the slider. If the city name is not specified, the function will return top Russian breweries table.

if venue_city == None: 
      selected_df = get_top_russian_breweries(checkins_n)
else: 
      en_city = venue_city

In other case the DataFrame will be constructed from a CSV file stored in data/cities/. Since the city column still may contain different names we should apply mapping and use a lambda expression with the map() method. The lambda function will compare values in the column against keys in city_names and if there is a match, the column value will be overwritten.
For instance, if df[‘CITY’] contains “СПБ”, a frequent acronym for Saint Petersburg, the value will be replaced, while for “Воронеж” it will remain unchanged.
And last but not least, we need to remove all duplicate rows from the table, add a column with a ranking position and return the first 10 rows. These would be the most liked breweries in a selected city.

df = pd.read_csv(f'data/cities/{en_city}.csv')     
df = df.loc[df['CHECK-INS'] >= checkins_n]
df.drop_duplicates(subset=['NAME', 'CITY'], keep='first', inplace=True)  
df.insert(0, 'RANKING', list('🏆 ' + str(i) if i in [1, 2, 3] else str(i) for i in range(1, len(df) + 1)))
selected_df = df.head(10)

After all DataFrame manipulations, the function returns a simply styled Bootstrap table of top breweries.

Bootstrap table layout in DBC

table = dbc.Table.from_dataframe(selected_df, striped=False,
                                bordered=False, hover=True,
                                size='sm',
                                style={'background-color': '#ffffff',
                                       'font-family': 'Proxima Nova Regular',
                                       'text-align':'center',
                                       'fontSize': '12px'},
                                className='table borderless'
                                )

return table

Layout structure

Add a Slider and a Dropdown menu with city names in application.py

To learn more about the Dashboard layout structure, please refer to our previous guide

checkins_slider_tab_1 = dbc.CardBody(
                           dbc.FormGroup(
                               [
                                   html.H6('Number of check-ins', style={'text-align': 'center'})),
                                   dcc.Slider(
                                       id='checkin_n_tab_1',
                                       min=0,
                                       max=250,
                                       step=25,
                                       value=250,  
                                       loading_state={'is_loading': True},
                                       marks={i: i for i in list(range(0, 251, 25))}
                                   ),
                               ],
                           ),
                           style={'max-height': '80px', 
                                  'padding-top': '25px'
                                  }
                       )

top_breweries = dbc.Card(
       [
           dbc.CardBody(
               [
                   dbc.FormGroup(
                       [
                           html.H6('Filter by city', style={'text-align': 'center'}),
                           dcc.Dropdown(
                               id='city_menu',
                               options=[{'label': i, 'value': i} for i in all_cities],
                               multi=False,
                               placeholder='Select city',
                               style={'font-family': 'Proxima Nova Regular'}
                           ),
                       ],
                   ),
                   html.P(id="tab-1-content", className="card-text"),
               ],
           ),
   ],
)

We’ll also need to add a callback function to update the table by dropdown menu and slider values:

@app.callback(
   Output("tab-1-content", "children"), [Input("city_menu", "value"),
                                         Input("checkin_n_tab_1", "value")]
)
def table_content(city, checkin_n):
   return get_top_russian_breweries_table(city, checkin_n)

Tada, the main table is ready! The dashboard can be used to receive up-to-date info about best Russian breweries, beers, and its rating across different regions, and help to make a better choice for an enjoyable tasting experience.

View the code on GitHub

Your password

LEFT JOIN: blog on analytics, visualisation & data science