<?xml version="1.0" encoding="utf-8"?> 
<rss version="2.0">

<channel>

<title>LEFT JOIN: blog on analytics, visualisation &amp; data science, posts tagged: Amazon Web Services</title>
<link>https://en.leftjoin.ru/tags/amazon-web-services/</link>
<description></description>
<generator>E2 (v3386; Aegea)</generator>

<item>
<title>Deploying Analytical Web App with AWS Elastic Beanstalk</title>
<guid isPermaLink="false">42</guid>
<link>https://en.leftjoin.ru/all/deploying-analytical-web-app-with-aws-elastic-beanstalk/</link>
<comments>https://en.leftjoin.ru/all/deploying-analytical-web-app-with-aws-elastic-beanstalk/</comments>
<description>
&lt;p&gt;If you need to deploy a web application and there’s an AWS EC2 Instance at hand, why not use Elastic Beanstalk? This is an AWS service that allows us to orchestrate many other ones, including EC2, S3, Simple Notification Service, CloudWatch, etc.&lt;/p&gt;
&lt;h2&gt;Setting things up&lt;/h2&gt;
&lt;p&gt;Previously, in our article &lt;a href="https://www.valiotti.com/leftjoin/all/building-a-plotly-dashboard-with-dynamic-sliders-in-python/"&gt;“Building a Plotly Dashboard with dynamic sliders in Python” &lt;/a&gt; we created a project with two scripts: application.py – creates a dashboard on a local server, and get_plots.py – returns a scatter plot with Untappd breweries from &lt;a href="https://www.valiotti.com/leftjoin/all/building-a-scatter-plot-for-untappd-breweries/" class="nu"&gt;“&lt;u&gt;Building a scatter plot for Untappd Breweries&lt;/u&gt;”&lt;/a&gt;. Let’s modify the application.py script a bit to make it run with Elastic Beanstalk. Assign app.server to the application variable,  it should look something like this:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;application = app.server

if __name__ == '__main__':
   application.run(debug=True, port=8080)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Before deploying our app we need to create a compressed archive. This archive should contain all the necessary files, including  requirements.txt that specifies what python packages are required to run the project. Just type pip freeze in your terminal window and save the output to a file:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;pip freeze &amp;gt; requirements.txt&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now we can create a compressed archive. Unix-based systems have a built-in zip command for archiving and compression:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;zip deploy_v0 application.py get_plots.py requirements.txt&lt;/code&gt;&lt;/pre&gt;&lt;h2&gt;Application and Environment&lt;/h2&gt;
&lt;p&gt;Navigate to  &lt;a href="https://us-east-2.console.aws.amazon.com/elasticbeanstalk/home?region=us-east-2"&gt;Elastic Beanstalk&lt;/a&gt;, click the “Applications” section and then “Create a new application”.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/1-14.png" width="915" height="170" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Fill in the necessary fields by specifying your app name and its description. After this, we are suggested to assign metadata and tag our app. The format of the tag is similar to a dictionary in Python, it’s a key-value pair, where the value of a key is unique. Once you’re ready to continue click the orange “Create’” button.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/2-14.png" width="907" height="358" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;After this step, you will see a list of environments available for your app, which is initially empty. Click “ Create a new environment”&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/3-13.png" width="912" height="326" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Since we are working with a web app, we need to select a web server environment:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/4-11.png" width="906" height="413" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;On the next step we need to specify our environment name and also choose a domain name, if available:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/5-9.png" width="897" height="546" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Next, we select the platform for our app, which is written in Python:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/6-7.png" width="898" height="413" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Now we can upload the file with our app, click  “ Upload your code” and attach the compressed file. Afterward, click “Create environment”.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/7-6.png" width="893" height="834" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;You will see a terminal window with event logs. We have a couple of minutes for a coffee break.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/8-5.png" width="912" height="326" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Now our app is up and running, if you need to upload a new version, just create a new archive with updated files and click the” Upload and deploy” button again. If everything’s done right, you will see something like this:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/9-5.png" width="914" height="388" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;We can switch to the site with our dashboard by following the link above. Using the  &amp;lt;iframe&amp;gt; tag our dashboard can be embedded into any other site.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;&amp;lt;iframe id=&amp;quot;igraph&amp;quot; scrolling=&amp;quot;no&amp;quot; style=&amp;quot;border:none;&amp;quot; seamless=&amp;quot;seamless&amp;quot; src=&amp;quot;http://dashboard1-env.eba-fvfdgmks.us-east-2.elasticbeanstalk.com/&amp;quot; height=&amp;quot;1100&amp;quot; width=&amp;quot;800&amp;quot;&amp;gt;&amp;lt;/iframe&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;As a result, you can see the following dashboard:&lt;/p&gt;
&lt;div class="embed-responsive embed-responsive-4by3" style="min-width:500"&gt;&lt;iframe id="igraph" scrolling="no" style="border:none;" seamless="seamless" src="http://webappsdashboard-env.eba-z4jtzsq2.sa-east-1.elasticbeanstalk.com/" height="1100" width="800"&gt;&lt;/iframe&gt;
&lt;/div&gt;&lt;p&gt;&lt;a href="https://github.com/valiotti/leftjoin/tree/master/dashboard_deployment"&gt;View the code on Github&lt;/a&gt;&lt;/p&gt;
</description>
<pubDate>Thu, 06 Aug 2020 09:44:07 +0300</pubDate>
</item>

<item>
<title>Handling website buttons in Selenium</title>
<guid isPermaLink="false">33</guid>
<link>https://en.leftjoin.ru/all/handling-website-buttons-in-selenium/</link>
<comments>https://en.leftjoin.ru/all/handling-website-buttons-in-selenium/</comments>
<description>
&lt;p&gt;In our previous article, &lt;a href="https://www.valiotti.com/leftjoin/all/parse-website-with-python-p2/" class="nu"&gt;“&lt;u&gt;Parsing the data of site’s catalogue, using Beautiful Soup and Selenium&lt;/u&gt;”&lt;/a&gt; we have addressed the problem of working with dynamic pages, but sometimes this method doesn’t work,  as with “Show more” buttons. Today we will show how you can imitate button click with Selenium to load a whole page, collect beer IDs, ratings, and send the data to Clickhouse.&lt;/p&gt;
&lt;h2&gt;Webpage structure&lt;/h2&gt;
&lt;p&gt;Let’s take a random brewery that has 105 check-ins, or customer feedbacks. One page with check-ins displays up to 25 records and looks like this:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/1-8.png" width="1186" height="735" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;If we try to scroll down to the bottom, we will encounter the same button that prevents us from getting all 105 records at once:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/2-8.png" width="350" height="65" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;First off, to address this task, let’s find out the button class and just click it until it works. Since Selenium launches the browser and the next “Show more”  button may not be loaded in time, that’s why we set  2-second intervals between the clicks. As soon as the page is loaded we will take its content and parse the relevant data.&lt;br /&gt;
Let’s view the source code and  find the button, it’s assigned to the &lt;span class="inline-code"&gt;more_checkins&lt;/span&gt; class.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/3-9.png" width="849" height="94" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;The button has style attributes, such as &lt;span class="inline-code"&gt;display&lt;/span&gt;. When the button is displayed this attribute takes the &lt;span class="inline-code"&gt;block&lt;/span&gt; value. But when we scroll the page to the buttom and there is nothing left to display, the attribute takes the &lt;span class="inline-code"&gt;none&lt;/span&gt; value and we can stop clicking.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/4-8.png" width="562" height="107" alt="" /&gt;
&lt;/div&gt;
&lt;h2&gt;Writing our code&lt;/h2&gt;
&lt;p&gt;Let’s import the necessary libraries&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;import time
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import re
from datetime import datetime
from clickhouse_driver import Client&lt;/code&gt;&lt;/pre&gt;&lt;p class="note"&gt;Chromedriver is used to run Selenium tests on Chrome and can be downloaded from &lt;a href="https://chromedriver.chromium.org/downloads"&gt;the official website&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Connect to the database and create cookies:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;client = Client(host='ec1-23-456-789-10.us-east-2.compute.amazonaws.com', user='', password='', port='9000', database='')
count = 0
cookies = {
    'domain':'untappd.com',
    'expiry':1594072726,
    'httpOnly':True,
    'name':'untappd_user_v3_e',
    'path':'/',
    'secure':False,
    'value':'your_value'
}&lt;/code&gt;&lt;/pre&gt;&lt;p class="note"&gt;You can find out more about working with cookies in Selenium from &lt;a href="https://www.valiotti.com/leftjoin/all/parse-website-with-python-p2/" class="nu"&gt;“&lt;u&gt;Parsing the data of site’s catalogue, using Beautiful Soup and Selenium&lt;/u&gt;”&lt;/a&gt;. We will need the untappd_user_v3_e parameter.&lt;/p&gt;
&lt;p&gt;As we are going to work with pages that have more than hundreds of thousands of records,  it’s pretty heavy and our instance may be overloaded. To prevent this, we will shut down unnecessary parts and then enable authentication cookie:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;options = webdriver.ChromeOptions()
prefs = {'profile.default_content_setting_values': {'images': 2, 
                            'plugins': 2, 'fullscreen': 2}}
options.add_experimental_option('prefs', prefs)
options.add_argument(&amp;quot;start-maximized&amp;quot;)
options.add_argument(&amp;quot;disable-infobars&amp;quot;)
options.add_argument(&amp;quot;--disable-extensions&amp;quot;)
driver = webdriver.Chrome(options=options)
driver.get('https://untappd.com/TooSunnyBrewery')
driver.add_cookie(cookies)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We  will need a function that would take a link, open it in the browser, load a whole page and return a soup object to be parsed. Get the  &lt;span class="inline-code"&gt;display&lt;/span&gt; attribute, assign it to the &lt;span class="inline-code"&gt;more_checkins&lt;/span&gt;: variable and click the button until the attribute is &lt;span class="inline-code"&gt;none&lt;/span&gt;. Let’s set  2-second intervals between the clicks, to wait for the page to load. As soon as we received the page, converth it into a &lt;span class="inline-code"&gt;soup&lt;/span&gt; object using the &lt;span class="inline-code"&gt;bs4&lt;/span&gt; library.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;def get_html_page(url):
    driver.get(url)
    driver.maximize_window()
    more_checkins = driver.execute_script(&amp;quot;var more_checkins=document.getElementsByClassName('more_checkins_logged')[0].style.display;return more_checkins;&amp;quot;)
    print(more_checkins)
    while more_checkins != &amp;quot;none&amp;quot;:
        driver.execute_script(&amp;quot;document.getElementsByClassName('more_checkins_logged')[0].click()&amp;quot;)
        time.sleep(2)
        more_checkins = driver.execute_script(&amp;quot;var more_checkins=document.getElementsByClassName('more_checkins_logged')[0].style.display;return more_checkins;&amp;quot;)
        print(more_checkins)
    source_data = driver.page_source
    soup = bs(source_data, 'lxml')
    return soup&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Write the following function that will take a page &lt;span class="inline-code"&gt;url&lt;/span&gt;, pass it in the &lt;span class="inline-code"&gt;get_html_page&lt;/span&gt; and receive a soup object to parse. The function returns zipped lists with beer IDs and ratings.&lt;/p&gt;
&lt;p class="note"&gt; See how you can use &lt;a href="https://www.valiotti.com/leftjoin/all/parse-website-with-python-p1/"&gt;Beautiful Soup to retrieve data from a website catalogue&lt;/a&gt;&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;def parse_html_page(url):
    soup = get_html_page(url)
    brewery_id = soup.find_all('a', {'class':'label',
                                     'href':re.compile('https://untappd.com/brewery/*')})[0]['href'][28:]
    items = soup.find_all('div', {'class':'item',
                                  'id':re.compile('checkin_*')})
    checkin_rating_list = []
    beer_id_list = []
    count = 0
    print('Filling the lists')
    for checkin in items:
        print(count, '/', len(items))
        try:
            checkin_rating_list.append(float(checkin.find('div', {'class':'caps'})['data-rating']))
        except Exception:
            checkin_rating_list.append('cast(Null as Nullable(Float32))')
        try:
            beer_id_list.append(int(checkin.find('a', {'class':'label'})['href'][-7:]))
        except Exception:
            beer_id_list.append('cast(Null as Nullable(UInt64))')
        count += 1 
    return zip(checkin_rating_list, beer_id_list)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Finally, write a function call for the breweries. We’ve covered how to receive a list of Russian brewery IDs in this article: &lt;a href="https://www.valiotti.com/leftjoin/all/example-of-using-dictionaries-in-clickhouse-with-untappd/"&gt;Example of using dictionaries in Clickhouse with Untappd&lt;/a&gt;.&lt;br /&gt;
Let’s fetch it from the Clickhouse table.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;brewery_list = client.execute('SELECT brewery_id FROM brewery_info')&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If we print out the &lt;span class="inline-code"&gt;brewery_list&lt;/span&gt;,  we will find out that the data is stored in a list of tuples.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/5-6.png" width="378" height="141" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Let’s make it a bit prettier with the lambda expression:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;flatten = lambda lst: [item for sublist in lst for item in sublist]
brewery_list = flatten(brewery_list)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;That’s much better:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/6-5.png" width="252" height="139" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Create a &lt;span class="inline-code"&gt;url&lt;/span&gt; for each brewery in the list, it includes a standard link and a brewery ID in the end. Pass it to the &lt;span class="inline-code"&gt;parse_html_page&lt;/span&gt; function that fetches the &lt;span class="inline-code"&gt;get_html_page&lt;/span&gt; and return lists with &lt;span class="inline-code"&gt;beer_id&lt;/span&gt; and &lt;span class="inline-code"&gt;rating_score&lt;/span&gt;. Since the lists are zipped, we can iterate throught them, create a tuple and send it to Clickhouse.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;for brewery_id in brewery_list:
    print('Fetching the brewery with id', brewery_id, count, '/', len(brewery_list))
    url = 'https://untappd.com/brewery/' + str(brewery_id)
    returned_checkins = parse_html_page(url)
    for rating, beer_id in returned_checkins:
        tuple_to_insert = (rating, beer_id)
        try:
            client.execute(f'INSERT INTO beer_reviews VALUES {tuple_to_insert}')
        except errors.ServerException as E:
            print(E)
    count += 1&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;That’s it about the way we can handle “Show more” buttons. Over time we will form a large dataset for further analysis, to work with in our next series.&lt;/p&gt;
</description>
<pubDate>Mon, 22 Jun 2020 10:54:56 +0300</pubDate>
</item>

<item>
<title>Example of using dictionaries in Clickhouse with Untappd</title>
<guid isPermaLink="false">32</guid>
<link>https://en.leftjoin.ru/all/example-of-using-dictionaries-in-clickhouse-with-untappd/</link>
<comments>https://en.leftjoin.ru/all/example-of-using-dictionaries-in-clickhouse-with-untappd/</comments>
<description>
&lt;p&gt;In Clickhouse we can use internal dictionaries as well as external dictionaries, they can be an alternative to JSON that doesn’t always work fine. DIctionaries store information in memory and can be invoked with the dictGet method. Let’s review how we can create one in Clickhouse and use it for our queries.&lt;/p&gt;
&lt;p&gt;We will illustrate an example of data using the Untappd API. Untappd is a social network for everyone who loves craft beer. We are going to use сheck-ins of Russian-based craft breweries and start collecting information about them to analyze this data later on and to draw some conclusions. in today’s article, we will analyze how to receive metadata on Russian breweries with Untappd and store it in a Clickhouse dictionary.&lt;/p&gt;
&lt;h2&gt;Collecting data with Untappd&lt;/h2&gt;
&lt;p&gt;First off, we need to create a new app to receive &lt;span class="inline-code"&gt;client_id&lt;/span&gt; and  &lt;span class="inline-code"&gt;client_secret_key&lt;/span&gt; to make API calls. Follow  &lt;a href="https://untappd.com/api/register?register=new"&gt;this link&lt;/a&gt; and fill in the fields:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/1-7.png" width="734" height="478" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Usually, it takes about 1 to 3 weeks to wait for approval.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;import requests
import pandas as pd
import time&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We’ll be using the requests library to make API calls, view results in a Pandas DataFrame, and save them in a CSV file before sending it to a Clickhouse dictionary. Untappd has strict limits on the number of requests, prohibiting us to make more than 100 calls per hour. Therefore, we need to make our script wait for 38 seconds using the Python time module.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;client_id = 'your_client_id'
client_secret = 'your_client_secret'
all_brewery_of_russia = []&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We want to get data for one thousand Russian breweries. One request to the &lt;a href="https://untappd.com/api/docs#brewerysearch"&gt;Brewery Search&lt;/a&gt; method enables us to view up to 50 breweries. The website gave us 3369 breweries when searching the word “Russia” manually.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/2-7.png" width="728" height="223" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Let’s check this, scroll down to the bottom, and open the page code.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/3-8.png" width="405" height="199" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Each brewery received is stored in the &lt;span class="inline-code"&gt;beer-item&lt;/span&gt; class. This means we can the number of references to &lt;span class="inline-code"&gt;beer-item&lt;/span&gt;:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/4-7.png" width="395" height="21" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;And as it turned out, we have exactly 1000 breweries, not 3369. When searching the word “Russia” manually, the results also contain some American breweries. So, we need to make 20 calls, getting 50 breweries at a time:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;for offset in range(0, 1000, 50):
    try:
        print('offset = ', offset)
        print('remained:', 1000 - offset, '\n')
        response = requests.get(f'https://api.untappd.com/v4/search/brewery?client_id={client_id}&amp;amp;client_secret={client_secret}',
                               params={
                                   'q':'Russia',
                                   'offset':offset,
                                   'limit':50
                               })
        item = response.json()
        print(item, '\n')
        all_brewery_of_russia.append(item)
        time.sleep(37)
    except Exception:
        print(Exception)
        continue&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;span class="inline-code"&gt;Brewery Search&lt;/span&gt; method includes several parameters, q – a string with a country name (specify specify “Russia” to get all the breweries based in Russia),  offset – allows us to shift by 50 lines in the search to get the next list of breweries, limit – restricts the number of breweries received and can not be more than 50. Convert the answer to JSON and append data sotred in the &lt;span class="inline-code"&gt;item&lt;/span&gt; object to the  &lt;span class="inline-code"&gt;all_brewery_of_russia&lt;/span&gt;  list.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/5-5.png" width="911" height="323" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Our data may also include breweries from other countries. That’s why we need to filter the data. Iterate through the &lt;span class="inline-code"&gt;all_brewery_of_russia&lt;/span&gt;  list and keep only those breweires, which &lt;span class="inline-code"&gt;country_name&lt;/span&gt; is Russia.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;brew_list = []
for element in all_brewery_of_russia:
    brew = element['response']['brewery']
    for i in range(brew['count']):
        if brew['items'][i]['brewery']['country_name'] == 'Russia':
            brew_list.append(brew['items'][i])&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Print out the first element in our &lt;span class="inline-code"&gt;brew_list&lt;/span&gt;:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;print(brew_list[0])&lt;/code&gt;&lt;/pre&gt;&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/11.png" width="1009" height="79" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Create a DataFrame with the following columns: &lt;span class="inline-code"&gt;brewery_id&lt;/span&gt;,  &lt;span class="inline-code"&gt;beer_count&lt;/span&gt;,  &lt;span class="inline-code"&gt;brewery_name&lt;/span&gt;,  &lt;span class="inline-code"&gt;brewery_slug&lt;/span&gt;,  &lt;span class="inline-code"&gt;brewery_page_url&lt;/span&gt;,  &lt;span class="inline-code"&gt;brewery_city&lt;/span&gt;,  &lt;span class="inline-code"&gt;lat&lt;/span&gt; и  &lt;span class="inline-code"&gt;lng&lt;/span&gt;. And several lists to sort out the data stored in the &lt;span class="inline-code"&gt;brewery_list&lt;/span&gt;:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;df = pd.DataFrame()
brewery_id_list = []
beer_count_list = []
brewery_name_list = []
brewery_slug_list = []
brewery_page_url_list = []
brewery_location_city = []
brewery_location_lat = []
brewery_location_lng = []
for brewery in brew_list:
    brewery_id_list.append(brewery['brewery']['brewery_id'])
    beer_count_list.append(brewery['brewery']['beer_count'])
    brewery_name_list.append(brewery['brewery']['brewery_name'])
    brewery_slug_list.append(brewery['brewery']['brewery_slug'])
    brewery_page_url_list.append(brewery['brewery']['brewery_page_url'])
 brewery_location_city.append(brewery['brewery']['location']['brewery_city'])
    brewery_location_lat.append(brewery['brewery']['location']['lat'])
    brewery_location_lng.append(brewery['brewery']['location']['lng'])&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Assign them as column values:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;df['brewery_id'] = brewery_id_list
df['beer_count'] = beer_count_list
df['brewery_name'] = brewery_name_list
df['brewery_slug'] = brewery_slug_list
df['brewery_page_url'] = brewery_page_url_list
df['brewery_city'] = brewery_location_city
df['brewery_lat'] = brewery_location_lat
df['brewery_lng'] = brewery_location_lng&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And view our DataFrame:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;df.head()&lt;/code&gt;&lt;/pre&gt;&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/6-4.png" width="866" height="185" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Let’s sort the values by &lt;span class="inline-code"&gt;brewery_id&lt;/span&gt;  and store our DataFrame as a CSV file without index column and headings:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;df = df.sort_values(by='brewery_id')
df.to_csv('brewery_data.csv', index=False, header=False)&lt;/code&gt;&lt;/pre&gt;&lt;h2&gt;Creating a Clickhouse dictionary&lt;/h2&gt;
&lt;p&gt;You can create Clickouse dictionaries in many different ways. We will try to structure it in an XML file, configure the server files, and access it through our client. The XML file structure will be the following:&lt;/p&gt;
&lt;p class="note"&gt;Learn more about other ways you can create Clickhouse dictionaries &lt;a href="https://clickhouse.tech/docs/ru/engines/table-engines/special/dictionary/"&gt;in the documentation&lt;/a&gt;&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;&amp;lt;yandex&amp;gt;
&amp;lt;dictionary&amp;gt;
        &amp;lt;name&amp;gt;breweries&amp;lt;/name&amp;gt;
        &amp;lt;source&amp;gt;
                &amp;lt;file&amp;gt;
                        &amp;lt;path&amp;gt;/home/ubuntu/brewery_data.csv&amp;lt;/path&amp;gt;
                        &amp;lt;format&amp;gt;CSV&amp;lt;/format&amp;gt;
                &amp;lt;/file&amp;gt;
        &amp;lt;/source&amp;gt;
        &amp;lt;layout&amp;gt;
                &amp;lt;flat /&amp;gt;
        &amp;lt;/layout&amp;gt;
        &amp;lt;structure&amp;gt;
                &amp;lt;id&amp;gt;
                        &amp;lt;name&amp;gt;brewery_id&amp;lt;/name&amp;gt;
                &amp;lt;/id&amp;gt;
                &amp;lt;attribute&amp;gt;
                        &amp;lt;name&amp;gt;beer_count&amp;lt;/name&amp;gt;
                        &amp;lt;type&amp;gt;UInt64&amp;lt;/type&amp;gt;
                        &amp;lt;null_value&amp;gt;Null&amp;lt;/null_value&amp;gt;
                &amp;lt;/attribute&amp;gt;
                &amp;lt;attribute&amp;gt;
                        &amp;lt;name&amp;gt;brewery_name&amp;lt;/name&amp;gt;
                        &amp;lt;type&amp;gt;String&amp;lt;/type&amp;gt;
                        &amp;lt;null_value&amp;gt;Null&amp;lt;/null_value&amp;gt;
                &amp;lt;/attribute&amp;gt;
                &amp;lt;attribute&amp;gt;
                        &amp;lt;name&amp;gt;brewery_slug&amp;lt;/name&amp;gt;
                        &amp;lt;type&amp;gt;String&amp;lt;/type&amp;gt;
                        &amp;lt;null_value&amp;gt;Null&amp;lt;/null_value&amp;gt;
                &amp;lt;/attribute&amp;gt;
                &amp;lt;attribute&amp;gt;
                        &amp;lt;name&amp;gt;brewery_page_url&amp;lt;/name&amp;gt;
                        &amp;lt;type&amp;gt;String&amp;lt;/type&amp;gt;
                        &amp;lt;null_value&amp;gt;Null&amp;lt;/null_value&amp;gt;
                &amp;lt;/attribute&amp;gt;
                &amp;lt;attribute&amp;gt;
                        &amp;lt;name&amp;gt;brewery_city&amp;lt;/name&amp;gt;
                        &amp;lt;type&amp;gt;String&amp;lt;/type&amp;gt;
                        &amp;lt;null_value&amp;gt;Null&amp;lt;/null_value&amp;gt;
                &amp;lt;/attribute&amp;gt;
                &amp;lt;attribute&amp;gt;
                        &amp;lt;name&amp;gt;lat&amp;lt;/name&amp;gt;
                        &amp;lt;type&amp;gt;String&amp;lt;/type&amp;gt;
                        &amp;lt;null_value&amp;gt;Null&amp;lt;/null_value&amp;gt;
                &amp;lt;/attribute&amp;gt;
                &amp;lt;attribute&amp;gt;
                        &amp;lt;name&amp;gt;lng&amp;lt;/name&amp;gt;
                        &amp;lt;type&amp;gt;String&amp;lt;/type&amp;gt;
                        &amp;lt;null_value&amp;gt;Null&amp;lt;/null_value&amp;gt;
                &amp;lt;/attribute&amp;gt;
        &amp;lt;/structure&amp;gt;
        &amp;lt;lifetime&amp;gt;300&amp;lt;/lifetime&amp;gt;
&amp;lt;/dictionary&amp;gt;
&amp;lt;/yandex&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;span class="inline-code"&gt;name&lt;/span&gt; is a dictionary name,  &lt;span class="inline-code"&gt;attribute&lt;/span&gt;  holds the properties of the columns, &lt;span class="inline-code"&gt;id&lt;/span&gt;  is a key field,   &lt;span class="inline-code"&gt;file&lt;/span&gt; stores file path and format. We are going to store our file in this directory: /home/ubuntu.&lt;/p&gt;
&lt;p&gt;Let’s upload our CSV and XML files to the server, it can be done using an FTP like FileZilla. We explained how to deploy Clickhouse on an Amazon instance in our &lt;a href="http://leftjoin.ru/all/stavim-clickhouse-na-aws/"&gt;previous article&lt;/a&gt;, this time need to do the same. Open your FileZilla client and go to SFTP settings to add a private key:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/7-4.png" width="777" height="557" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Connect to your server address, it can be found in the EC2 management console. Specify SFTP as a protocol, your Host, and Ubuntu as a username.&lt;/p&gt;
&lt;p class="note"&gt;Your Public DNS may change in case of overload&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/8-3.png" width="452" height="451" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;After connecting we will wind up in this location /home/ubuntu. Let’s put the files in that folder and connect via SSH using Termius. Then we need to move the files to /etc/clickhouse-server to view them in Clickhouse:&lt;/p&gt;
&lt;p class="note"&gt;Learn how you can connect to an AWS server using SSH client from our previous material &lt;a href="http://leftjoin.ru/all/stavim-clickhouse-na-aws/" class="nu"&gt;“&lt;u&gt;Installing Clickhouse on AWS&lt;/u&gt;”&lt;/a&gt;&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;sudo mv breweries_dictionary.xml /etc/clickhouse server/&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Go to the config file:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;cd /etc/clickhouse-server
sudo nano config.xml&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We need the &lt;dictionaries_config&gt; tag, it’s the path to a file that describes the dictionaries structure. Specify the path to our XML file:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;&amp;lt;dictionaries_config&amp;gt;/etc/clickhouse-server/breweries_dictionary.xml&amp;lt;/dictionaries_config&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Save our file and run the Clickhouse client:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;clickhouse client&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Let’s check that the dictionary really loaded:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;SELECT * FROM system.dictionaries\G&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In case of success you will get the following:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/9-3.png" width="854" height="344" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Now, let’s write a query with the  &lt;a href="https://clickhouse.tech/docs/ru/sql-reference/functions/ext-dict-functions/#dictget"&gt;dictGet&lt;/a&gt; function to get the name of the brewery with ID 999. Pass in the dictionary name, as the first argument, then the filed name and ID.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;SELECT dictGet('breweries', 'brewery_name', toUInt64(999))&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And our query returns this:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/10.png" width="400" height="60" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Similarly, we could use this function to get a beer name, when the table contains only IDs.&lt;/p&gt;
</description>
<pubDate>Tue, 16 Jun 2020 10:25:07 +0300</pubDate>
</item>

<item>
<title>Working with Materialized Views in Clickhouse</title>
<guid isPermaLink="false">26</guid>
<link>https://en.leftjoin.ru/all/working-with-materialized-views-in-clickhouse/</link>
<comments>https://en.leftjoin.ru/all/working-with-materialized-views-in-clickhouse/</comments>
<description>
&lt;p&gt;This time we’ll illustrate how you can pass data on Facebook ad campaigns to Clickhouse tables with Python and implement Materialized Views. What is materialized views, you may ask. Oftentimes Clickhouse is used to handle large amounts of data and the time spent waiting for a response from a table with raw data is constantly increasing. Usually, we would use ETL-process to address this task efficiently or create aggregate tables, which are not that useful because we have to regularly update them. Clickhouse system offers a new way to meet the challenge using materialized views.&lt;br /&gt;
Materialized Views allow us to store and update data on a hard drive in line with the &lt;span class="inline-code"&gt;SELECT&lt;/span&gt; query that was used to get a view. When we need to insert data into a table, the &lt;span class="inline-code"&gt;SELECT&lt;/span&gt; method transforms our data and populates a materialized view.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Setting Up Amazon EC2 instance&lt;/b&gt;&lt;br /&gt;
We need to connect our Python script that we created &lt;a href="https://www.valiotti.com/leftjoin/all/collecting-data-on-facebook-ad-campaigns//"&gt;in this article&lt;/a&gt;  to Cickhouse. The script will make queries, so let’s open several ports. In your AWS Dashboard go to Network &amp; Security — Security Groups. Our instance belongs to the launch-wizard-1 group. Сlick it and pay attention to the Inbound rules, you need to set them as shown in this screenshot:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/1-22.png" width="969" height="293" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;&lt;b&gt;Setting up Clickhouse&lt;/b&gt;&lt;br /&gt;
It’s time to set up Clickhouse. Let’s edit the config.xml file using nano text editor:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;cd /etc/clickhouse-server
sudo nano config.xml&lt;/code&gt;&lt;/pre&gt;&lt;p class="note"&gt; Learn more about &lt;a href="https://linuxize.com/post/how-to-use-nano-text-editor/"&gt;the shortcuts here &lt;/a&gt; if you didn’t get how to exit nano too :)&lt;/p&gt;
&lt;p&gt;Uncomment this line:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;&amp;lt;listen_host&amp;gt;0.0.0.0&amp;lt;/listen_host&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;to access your database from any IP-address:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/2-3.png" width="479" height="194" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;&lt;b&gt;Create a table and its materialized view&lt;/b&gt;&lt;br /&gt;
Open a terminal window to create our database with tables:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;CREATE DATABASE db1
USE db1&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We’ll refer to the same example of data collection from Facebook. The data on Ad Campaigns may often change and be updated, with this in mind we want to create a materialized view that would automatically update aggregate tables containing the costs data. Our Clickhouse table will look almost the same as the &lt;span class="inline-code"&gt;DataFrame&lt;/span&gt; used in the previous post. We picked &lt;span class="inline-code"&gt;ReplacingMergeTree &lt;/span&gt; as an engine for our table, it will remove duplicates by sorting key:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;CREATE TABLE facebook_insights(
	campaign_id UInt64,
	clicks UInt32,
	spend Float32,
	impressions UInt32,
	date_start Date,
	date_stop	 Date,
	sign Int8
) ENGINE = ReplacingMergeTree
ORDER BY (date_start, date_stop)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And then,  create a materialized view:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;CREATE MATERIALIZED VIEW fb_aggregated
ENGINE = SummingMergeTree()
ORDER BY date_start
	AS
	SELECT campaign_id,
		      date_start,
		      sum(spend * sign) as spent,
		      sum(impressions * sign) as impressions,
		      sum(clicks * sign) as clicks
	FROM facebook_insights
	GROUP BY date_start, campaign_id&lt;/code&gt;&lt;/pre&gt;&lt;p class="note"&gt;More details are available in &lt;a href="https://clickhouse.yandex/blog/en/how-to-update-data-in-clickhouse"&gt;the Clickhouse blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Unfortunately for us, Clikhouse system doesn’t include a familiar &lt;span class="inline-code"&gt;UPDATE&lt;/span&gt; method. So we need to find a workaround. Thanks to the Yandex team, these guys offered to insert rows with a negative &lt;span class="inline-code"&gt;sign&lt;/span&gt; first,  and then use &lt;span class="inline-code"&gt;sign&lt;/span&gt; for reversing. According to this principle, the old data will be ignored when summing.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Script&lt;/b&gt;&lt;br /&gt;
Let’s start writing the script and import a new library, which is called clickhouse_driver. It allows to make queries to Clickhouse in Python:&lt;/p&gt;
&lt;p class="note"&gt;We are using the updated version of the script from &lt;a href="https://www.valiotti.com/leftjoin/all/collecting-data-on-facebook-ad-campaigns/"&gt; “Collecting Data on Facebook Ad Campaigns”&lt;/a&gt;. But it will work fine if you just combine this code with the previous one.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;from datetime import datetime, timedelta
from clickhouse_driver import Client
from clickhouse_driver import errors&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;An object of the &lt;span class="inline-code"&gt;Client&lt;/span&gt;class enables us to make queries with the &lt;span class="inline-code"&gt;execute()&lt;/span&gt; method. Type in your public DNS in the &lt;span class="inline-code"&gt;host&lt;/span&gt; field,  &lt;span class="inline-code"&gt;port&lt;/span&gt; – 9000, specify default as a &lt;span class="inline-code"&gt;user&lt;/span&gt;, and a &lt;span class="inline-code"&gt;database&lt;/span&gt;for the connection.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;client = Client(host='ec1-2-34-56-78.us-east-2.compute.amazonaws.com', user='default', password=' ', port='9000', database='db1')&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;To ensure that everything works as expected, we need to write the following query that will print out names of all databases stored on the server:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;client.execute('SHOW DATABASES')&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In case of success the query will return this list:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;[('_temporary_and_external_tables',), ('db1',), ('default',), ('system',)]&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For example, we want to get data for the past three days. Create several &lt;span class="inline-code"&gt;datetime objects &lt;/span&gt; with the datetime library and convert them to strings using the&lt;span class="inline-code"&gt;strftime()&lt;/span&gt;&lt;br /&gt;
method:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;date_start = datetime.now() - timedelta(days=3)
date_end = datetime.now() - timedelta(days=1)
date_start_str = date_start.strftime(&amp;quot;%Y-%m-%d&amp;quot;)
date_end_str = date_end.strftime(&amp;quot;%Y-%m-%d&amp;quot;)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This query returns all table columns for a certain period:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;SQL_select = f&amp;quot;select campaign_id, clicks, spend, impressions, date_start, date_stop, sign from facebook_insights where date_start &amp;gt; '{date_start_str}' AND date_start &amp;lt; '{date_end_str}'&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Make a query and pass the data to the &lt;span class="inline-code"&gt;old_data_list&lt;/span&gt;.  And then, replace their &lt;span class="inline-code"&gt;sign&lt;/span&gt;for -1 and append elements to the &lt;span class="inline-code"&gt;new_data_list&lt;/span&gt;:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;new_data_list = []
old_data_list = []
old_data_list = client.execute(SQL_select)

for elem in old_data_list:
    elem = list(elem)
    elem[len(elem) - 1] = -1
    new_data_list.append(elem)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Finally,  write our algorithm: insert the data with the &lt;span class="inline-code"&gt;sign =-1&lt;/span&gt;, optimize it with &lt;span class="inline-code"&gt;ReplacingMergeTree&lt;/span&gt;, remove duplicates, and &lt;span class="inline-code"&gt;INSERT&lt;/span&gt; new data with the &lt;span class="inline-code"&gt;sign = 1&lt;/span&gt;.&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;SQL_query = 'INSERT INTO facebook_insights VALUES'
client.execute(SQL_query, new_data_list)
SQL_optimize = &amp;quot;OPTIMIZE TABLE facebook_insights&amp;quot;
client.execute(SQL_optimize)
for i in range(len(insight_campaign_id_list)):
    client.execute(SQL_query, [[insight_campaign_id_list[i],
                                insight_clicks_list[i],
                                insight_spend_list[i],
                                insight_impressions_list[i],
                                datetime.strptime(insight_date_start_list[i], '%Y-%m-%d').date(),
                                datetime.strptime(insight_date_start_list[i], '%Y-%m-%d').date(),
                                1]])
    client.execute(SQL_optimize)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Get back to Clickhouse and make the next query to view the first 20 rows:&lt;br /&gt;
&lt;span class="inline-code"&gt;SELECT * FROM facebook_insights LIMIT 20&lt;/span&gt;&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/3-4.png" width="754" height="375" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;And  &lt;span class="inline-code"&gt;SELECT * FROM fb_aggregated LIMIT 20&lt;/span&gt; to compare our materialized view:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/4-4.png" width="748" height="385" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Nice work! Now we have a materialized view that will be updated each time when the data in the &lt;span class="inline-code"&gt;facebook_insights &lt;/span&gt; table changes. The trick with the &lt;span class="inline-code"&gt;sign&lt;/span&gt; operator allows to differ already processed data and prevent its summation, while &lt;span class="inline-code"&gt;ReplacingMergeTree &lt;/span&gt;engine helps us to remove duplicates.&lt;/p&gt;
</description>
<pubDate>Wed, 20 May 2020 15:06:48 +0300</pubDate>
</item>

<item>
<title>Installing Clickhouse on AWS</title>
<guid isPermaLink="false">25</guid>
<link>https://en.leftjoin.ru/all/installing-clickhouse-on-aws/</link>
<comments>https://en.leftjoin.ru/all/installing-clickhouse-on-aws/</comments>
<description>
&lt;p&gt;In today’s article, we’ll work with Clickhouse and install it on a free Amazon EC2 instance.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;AWS account and Ubuntu Instance&lt;/b&gt;&lt;br /&gt;
The easiest way to install Clickouse on a virtual Ubuntu server is to use .deb packages. There is no need to worry if you don’t have one – Amazon Web Services provide Free Tier offers that you can enjoy for 12 months. Just go to &lt;a href="https://aws.amazon.com"&gt;https://aws.amazon.com&lt;/a&gt; and sign up.&lt;br /&gt;
Once registered, go to your Dashboard, find the “Build a solution” option and click «Launch a virtual machine», and choose one that comes with Ubuntu Server pre-installed.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;div class="fotorama" data-width="730" data-ratio="2.5"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/1-2.png" width="730" height="292" alt="" /&gt;
&lt;img src="https://en.leftjoin.ru/pictures/2-2.png" width="986" height="146" alt="" /&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Create a key pair – one is a public key and another is a private key that you need to store locally, it secures our connection.&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/3-3.png" width="683" height="500" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;After this step, we’ll see the EC2 Management Console with our EC2 instance up and running. It has a public DNS that we need to save.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Connect with Termius&lt;/b&gt;&lt;br /&gt;
We connect to our virtual server via SSH protocol. The majority of clients support this protocol, and for our case, we’ll be using Termius. Click «+ NEW HOST» and complete the fields.&lt;br /&gt;
Type your public DNS in the address field,  «ubuntu» as a Username and leave the password field empty. Now,  in order to complete the Key field, we need to specify a file with the .pem extension, the one that was received after creating an Instance. Your result should be much the same:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/4-3.png" width="451" height="419" alt="" /&gt;
&lt;/div&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/5-2.png" width="450" height="673" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Connect to our Instance after authentication and we’ll get a new console screen:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/6-2.png" width="677" height="403" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Now we can install Clickhouse. Run the following command to add the Clickhouse repository:&lt;/p&gt;
&lt;p class="note"&gt;Learn more about other ways you can install Clickhouse in the &lt;a href="https://clickhouse.tech/docs/en/getting-started/install/"&gt;documentation&lt;/a&gt;&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;echo &amp;quot;deb http://repo.yandex.ru/clickhouse/deb/stable/ main/&amp;quot; | sudo tee /etc/apt/sources.list.d/clickhouse.list&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Make sure to update the packages:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;sudo apt-get update&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Finally, install our client and server by running:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;sudo apt-get install -y clickhouse-server clickhouse-client&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And it’s done! The client and Clickhouse server were installed on our instance. Run the server:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;sudo service clickhouse-server start&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Test our Clickhouse server to ensure that everything works:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;sudo service clickhouse-server status:&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And if everything works fine, we’ll get the following output:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/7-2.png" width="881" height="125" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Type in the next command to connect to our client:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;clickhouse client&lt;/code&gt;&lt;/pre&gt;&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/8-2.png" width="505" height="122" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;Run another check as suggested in the documentation:&lt;/p&gt;
&lt;pre class="e2-text-code"&gt;&lt;code&gt;SELECT 1&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If everything was done right we’ll get the following:&lt;/p&gt;
&lt;div class="e2-text-picture"&gt;
&lt;img src="https://en.leftjoin.ru/pictures/9-2.png" width="455" height="218" alt="" /&gt;
&lt;/div&gt;
&lt;p&gt;This is it! Next time we’ll share how to work with Python and  Clickhouse,  return to our script that retrieves data on Ad Campaigns and push it into a table to visualize after.&lt;/p&gt;
</description>
<pubDate>Mon, 18 May 2020 14:28:02 +0300</pubDate>
</item>


</channel>
</rss>