How and why should you export reports from Jupyter Notebook to PDF
If you are a data analyst who needs to present a report to a client, a job seeker wondering how to format a test assignment so that it gets noticed, or someone with many educational projects in data analytics and visualization, this post will be very useful to you.
Reading someone else's code in a Jupyter Notebook can be hard, because the result often gets lost between lines of data preparation, library imports, and a series of attempts to implement the idea. That is why exporting the results to a PDF file typeset with LaTeX is a great option for the final presentation. It saves time and looks presentable. In scientific circles, articles and reports are very often formatted using LaTeX, since it has a number of advantages:
- Math equations and formulas look neater.
- The bibliography is automatically generated based on all references used in the document.
- The author can focus on the content (not on the appearance of the document), since the layout of the text and other data is set automatically by specifying the necessary parameters in the code.
Today we will talk in detail about how to export such beautiful reports from Jupyter Notebook to PDF using LaTeX.
Installing LaTeX
The most important step in generating a report from a Jupyter Notebook in Python is exporting it to the final file. The main library you need is nbconvert, which converts your notebook into any convenient document format: pdf (as in our case), html, latex, etc. Installing the library alone is not enough: several other packages have to be installed first, namely Pandoc, TeX, and Chromium. The nbconvert documentation describes the installation of each of them in detail, so we will not dwell on it here.
Once you have completed all the preliminary steps, you need to install and import the library into your Jupyter Notebook.
!pip install nbconvert
import nbconvert
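Before exporting, it can help to confirm that the external tools are actually visible from the notebook's environment. The following is a minimal sketch using only the standard library; the helper name find_missing_tools and the tool list are assumptions made for illustration (PDF export is assumed to need Pandoc and a TeX distribution that provides xelatex), not part of nbconvert itself.

```python
import shutil

# Executables assumed to be needed for PDF export: Pandoc plus a TeX
# distribution providing xelatex. This list is an illustrative assumption.
REQUIRED_TOOLS = ("pandoc", "xelatex")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All prerequisites found.")
```

If the list comes back non-empty, the export will most likely fail until those tools are installed and added to the PATH.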
Export tables to Markdown format
Tables often look a bit odd in reports, since they are hard to read quickly, but sometimes a small table still has to go into the final document. For the table to look neat, it needs to be saved in Markdown format. This can be done manually, but if the table contains a lot of data, a more convenient method is better. We suggest the following simple pandas_df_to_markdown_table() function, which converts any dataframe to a Markdown table. Note: the indices disappear after the conversion, so if they are important (as in our example), it is worth saving them as the first column of the dataframe.
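import pandas as pd
import plotly.express as px
from IPython.display import Markdown, display

# Load the Gapminder dataset that ships with Plotly Express
data_g = px.data.gapminder()
# Summary statistics, rounded to two decimal places
summary = round(data_g.describe(), 2)
# Keep the index (statistic names) as the first column, since it is dropped on conversion
summary.insert(0, 'metric', summary.index)

# Function to convert dataframe to Markdown Table
def pandas_df_to_markdown_table(df):
    # Separator row ('---' per column) that goes under the Markdown header
    fmt = ['---' for i in range(len(df.columns))]
    df_fmt = pd.DataFrame([fmt], columns=df.columns)
    df_formatted = pd.concat([df_fmt, df])
    # Pipe-separated CSV text is rendered by Jupyter as a Markdown table
    display(Markdown(df_formatted.to_csv(sep="|", index=False)))

pandas_df_to_markdown_table(summary)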
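To see what text the function above hands to Markdown, here is a small sketch of the same idea without pandas: a header line, a separator line of `---`, and one pipe-separated line per row. The function name rows_to_markdown_table and the sample values are placeholders for illustration, not real statistics from the dataset.

```python
def rows_to_markdown_table(header, rows):
    """Build a pipe-delimited Markdown table as a single string."""
    # Header line, e.g. "metric|value"
    lines = ["|".join(str(cell) for cell in header)]
    # Separator line with one '---' per column
    lines.append("|".join("---" for _ in header))
    # One line per data row
    for row in rows:
        lines.append("|".join(str(cell) for cell in row))
    return "\n".join(lines)


# Placeholder data, used only to show the layout of the result
table = rows_to_markdown_table(
    ["metric", "value"],
    [["count", 3], ["mean", 2.5]],
)
print(table)
```

Passing a string like this to Markdown() should give the same kind of table as the pandas version, which is handy when the data is not in a dataframe.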
Export image to report
In this example, we will build a bubble chart, the construction of which was described in a recent post. Previously we used the Seaborn library, which showed that encoding data as the size of the circles on the graph works correctly. The same graphs can be created using the Plotly library.
Displaying the plot in the final report requires an additional step: calling plt.show() is not enough for the graph to appear in the exported file. Instead, save the graph to the working directory and then display it with the Image() function from the IPython.display module.
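from IPython.display import Image
import plotly.express as px

# Bubble chart for 2007: bubble size is population, color is continent
fig = px.scatter(data_g.query("year == 2007"), x="gdpPercap", y="lifeExp",
                 size="pop", color="continent",
                 log_x=True, size_max=70)
# Save a static image to the working directory (static export needs the kaleido package)
fig.write_image('figure_1.jpg')
# Display the saved file so that it is included in the exported report
Image(filename='figure_1.jpg', width=1000)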
Formation and export of the report
When all stages of data analysis are completed, the report can be exported. If you need headings or text in the report, write them in notebook cells, changing the cell type from Code to Markdown. To export, you can either run the nbconvert command below in the terminal, without the leading exclamation mark, or run the code below in a cell of the Jupyter Notebook. We advise you not to overload the report with code: use the TemplateExporter.exclude_input=True parameter so that the code cells are not exported. Also, running this cell in your notebook produces standard output, so write %%capture at the beginning of the cell to keep that output out of the report.
%%capture
!jupyter nbconvert --to pdf --TemplateExporter.exclude_input=True ~/Desktop/VALIOTTI/Reports/Sample\ LaTeX\ Report.ipynb
!open ~/Desktop/VALIOTTI/Reports/Sample\ LaTeX\ Report.pdf
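The same export can also be started from plain Python, which avoids escaping the spaces in the file name. This is only a sketch under stated assumptions: the helper names build_nbconvert_command and export_report are hypothetical, the flags are the ones used in the command above, and the PDF is assumed to be written next to the notebook.

```python
import subprocess
from pathlib import Path


def build_nbconvert_command(notebook_path, hide_code=True):
    """Return the nbconvert command as a list of arguments (no shell escaping needed)."""
    command = ["jupyter", "nbconvert", "--to", "pdf"]
    if hide_code:
        # Same parameter as in the cell above: leave code cells out of the report
        command.append("--TemplateExporter.exclude_input=True")
    command.append(str(Path(notebook_path).expanduser()))
    return command


def export_report(notebook_path, hide_code=True):
    """Run nbconvert and return the path where the PDF is expected to appear."""
    subprocess.run(build_nbconvert_command(notebook_path, hide_code), check=True)
    # Assumption: the PDF is written next to the notebook with the same base name
    return Path(notebook_path).expanduser().with_suffix(".pdf")


# Example call (the file name is a placeholder):
# export_report("Sample LaTeX Report.ipynb")
```

Passing the arguments as a list means a path such as "Sample LaTeX Report.ipynb" is handed over as a single argument, and check=True raises an error if the conversion fails instead of failing silently.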
If you did everything correctly and methodically, then you will end up with a report similar to this one!
Present your data nicely :)