How to connect Plotly Dash to a SQL database

Hattie
5 min read · May 13, 2021

There are lots of brilliant tutorials explaining how to build a Plotly Dash dashboard connected to a static Pandas DataFrame, but very few explaining how to connect to a SQL database (for example BigQuery, Snowflake or AWS). By connecting directly to the database we can enable end users to become more self-sufficient and pull the data they need, when they need it.

In this example we are going to connect JupyterDash to a BigQuery database, specifically the public StackOverflow dataset. If you don't have a BigQuery account, you can follow these instructions, which show you how to connect to BigQuery and its sample datasets.
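If you want to check what the public dataset contains before building anything, you can list its tables with the BigQuery client. This is a minimal sketch; it assumes you have already authenticated and swapped in your own project name (both covered below):

from google.cloud import bigquery

# Assumes your Google credentials are already set up (see the authentication step below)
client = bigquery.Client(project='your-bigquery-project-name')

# List the tables available in the public StackOverflow dataset
for table in client.list_tables('bigquery-public-data.stackoverflow'):
    print(table.table_id)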

This is the final result of this blog. It’s pretty basic since we’re mainly focusing on connecting Dash to a database:

Important thing to note:

There will be a charge for running this dashboard in BigQuery. BigQuery charges $5 per TB scanned (although you get $300 of free credit if you're new to BigQuery). Querying the whole StackOverflow dataset scans about 25 GB of data each time, which works out at roughly 12 cents per run. You can create a smaller sample of the dataset as per the instructions above; in this blog, however, I query the entire dataset (25 GB).
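If you'd like to check how much data a query will scan before you pay for it, BigQuery supports dry runs, which validate a query and report the bytes it would process without actually running it. This is a minimal sketch, assuming the same project and credentials that are set up later in this post:

from google.cloud import bigquery

client = bigquery.Client(project='your-bigquery-project-name')

# A dry run reports bytes scanned without running (or charging for) the query
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query_job = client.query(
    "SELECT body FROM `bigquery-public-data.stackoverflow.stackoverflow_posts`",
    job_config=job_config,
)
print(f"This query would scan {query_job.total_bytes_processed / 1e9:.1f} GB")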

Assumed knowledge:

It's beyond the scope of this article to explain everything that Dash is doing in the code, so it's assumed you have some basic Dash knowledge. Click here for some excellent Plotly Dash tutorials, including a similar video version of this blog.

Start here

Firstly you want to import your libraries:

from jupyter_dash import JupyterDash
import pydata_google_auth
from google.cloud import bigquery
import dash_core_components as dcc
import dash_html_components as html
import dash_bootstrap_components as dbc
from dash.dependencies import Output, Input, State
import plotly.express as px
import plotly.graph_objects as go  # used for the table figure below
import pandas as pd
from datetime import date  # used for the date picker defaults

Next you want to authorise your BigQuery account. The get_user_credentials function will ask you to log in to your BigQuery account, and these details are cached so you won't need to log in again.

credentials = pydata_google_auth.get_user_credentials(['https://www.googleapis.com/auth/bigquery'],)
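The interactive login above is ideal in a notebook. If you later deploy the dashboard somewhere non-interactive, one common alternative (not used in this post) is a service account key file, roughly like this:

from google.oauth2 import service_account

# Hypothetical path to a service account key created in the Google Cloud console
credentials = service_account.Credentials.from_service_account_file(
    'path/to/service-account-key.json',
    scopes=['https://www.googleapis.com/auth/bigquery'],
)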

This line of code tells BigQuery which Project you want to connect to, so add your BigQuery project name here:

bigquery_client = bigquery.Client(project='your-bigquery-project-name')

Here we define an external CSS stylesheet which can set the look of your Dash app. We could create our own CSS stylesheet, but ain't nobody got time for that. (In the app below we actually pass the Bootstrap theme from dash-bootstrap-components instead; the variable is here if you'd rather use the CodePen stylesheet.)

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

Next you want to initialise the JupyterDash app (here passing the Bootstrap theme as its stylesheet) and expose its underlying Flask server:

app = JupyterDash(external_stylesheets=[dbc.themes.BOOTSTRAP])
server = app.server

Then we want to describe what we want the layout of the app to look like. Note that we are specifying the min and max dates that exist within our BigQuery table:

app.layout = html.Div([
    html.H1("Keyword Analysis on Stack Overflow Posts", style={'textAlign': 'center'}),
    html.P(children="Insert your keyword and dates and click 'Submit'", style={'textAlign': 'center'}),
    html.Br(),
    dcc.Input(id='text', placeholder="Insert keyword or phrase"),
    dcc.DatePickerRange(id='date-point',
                        min_date_allowed=date(2015, 1, 1),
                        max_date_allowed=date(2015, 12, 31),
                        start_date=date(2015, 1, 1),
                        end_date=date(2015, 1, 31)
                        ),
    html.Button(id='enter', children=['Submit']),
    html.Div(id='graph-content'),
])
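The min and max dates above are hard-coded to match the 2015 date range used in this example. If you'd rather pull them from BigQuery itself, a rough sketch using the bigquery_client created earlier might look like this (note that this query scans the creation_date column, so it isn't free):

# Look up the earliest and latest post dates so the date picker always matches the table
date_bounds = bigquery_client.query("""
    SELECT min(date(creation_date)) as min_date,
           max(date(creation_date)) as max_date
    FROM `bigquery-public-data.stackoverflow.stackoverflow_posts`
""").to_dataframe()

min_date = date_bounds['min_date'][0]
max_date = date_bounds['max_date'][0]

# These values could then be passed to min_date_allowed / max_date_allowed in dcc.DatePickerRange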

Next we want to define the callback function. This is where the 'inputs' and 'outputs' of our app's interface are described. In our case the Input is the Submit button's n_clicks, while text, start_date and end_date are passed as State, so the query only runs when the button is clicked:

@app.callback(
    Output('graph-content', 'children'),
    [Input('enter', 'n_clicks'),
     State('text', 'value'),
     State('date-point', 'start_date'),
     State('date-point', 'end_date')]
)
def create_graph(n, text, startdate, enddate):
    print(text)
    print(startdate)
    print(enddate)
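One thing worth knowing: Dash fires every callback once when the app first loads, before anyone has clicked Submit, which here would mean running a paid BigQuery query with empty inputs. A small guard avoids that; this is an optional addition rather than part of the original code:

from dash.exceptions import PreventUpdate

# Inside create_graph, before the SQL below: skip the callback until Submit
# has been clicked and a keyword has been entered
if not n or not text:
    raise PreventUpdate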

Now for the main part: we build the SQL for BigQuery as an f-string, with placeholders (text, startdate, enddate) that the end user can change from the dashboard. We then run this query with pandas' read_gbq function and assign the result to a DataFrame. I also performed some additional formatting on the 'body' column to strip out HTML tags:

df_sql =  f""" SELECT 
date(creation_date) as date,
title,
body ,
tags,
count(distinct id) as num_posts
FROM `bigquery-public-data.stackoverflow.stackoverflow_posts`
WHERE REGEXP_CONTAINS(body,'(?i)({text})')
and date(creation_date) >= '{startdate}'
and date(creation_date) <= '{enddate}'
group by 1,2,3,4"""
project_id = 'your-project-id' df= pd.read_gbq(df_sql, project_id=project_id, dialect='standard', credentials=credentials)
df.columns = df.columns.str.lower()
df['body'] = df['body'].str.replace('<[^<]+?>', '')
df_filtered = df
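One caveat with building the SQL as an f-string is that whatever the user types in the keyword box goes straight into the query. BigQuery supports query parameters, which are a safer way to pass user input. This isn't what the dashboard above does, but a sketch of the same query using parameters with the bigquery_client from earlier would look roughly like this:

    # Safer alternative: pass the keyword and dates as query parameters
    # rather than formatting them into the SQL string
    job_config = bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter('keyword', 'STRING', text),
        bigquery.ScalarQueryParameter('startdate', 'DATE', startdate),
        bigquery.ScalarQueryParameter('enddate', 'DATE', enddate),
    ])
    df = bigquery_client.query("""
        SELECT date(creation_date) as date, title, body, tags, count(distinct id) as num_posts
        FROM `bigquery-public-data.stackoverflow.stackoverflow_posts`
        WHERE REGEXP_CONTAINS(body, CONCAT('(?i)(', @keyword, ')'))
          AND date(creation_date) BETWEEN @startdate AND @enddate
        GROUP BY 1, 2, 3, 4
    """, job_config=job_config).to_dataframe()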

Next we create the first of 2 Plotly graphs. Here I’m creating a datatable:

    # FIGURE 1: a table of the matching posts
    fig_1_df = df_filtered[['title', 'body']]
    fig_1_df.reset_index(inplace=True)
    fig_1 = go.Figure(data=[go.Table(
        header=dict(values=list(fig_1_df[['title', 'body']].drop_duplicates()),
                    fill_color='paleturquoise',
                    align='left'),
        cells=dict(values=[fig_1_df.title, fig_1_df.body],
                   fill_color='lightblue',
                   align='left'))
    ])
    fig_1.update_layout(
        autosize=True,
        title="Stackoverflow posts that contain your chosen keyword")

Which looks like this:

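As an aside, the same data could also be rendered with Dash's own DataTable component instead of a Plotly go.Table. It's not what I used here, but a quick sketch would be:

import dash_table

# Render the same columns as an interactive, paginated Dash DataTable
table = dash_table.DataTable(
    columns=[{'name': c, 'id': c} for c in ['title', 'body']],
    data=fig_1_df[['title', 'body']].to_dict('records'),
    page_size=10,
    style_cell={'textAlign': 'left'},
)

This would then be returned from the callback in place of dcc.Graph(figure=fig_1).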
Next we create the second Plotly graph — a line graph:

    # FIGURE 2: posts containing the keyword, by date
    fig_2_df = df_filtered.groupby(['date']).agg({'body': 'nunique'})
    fig_2_df.reset_index(inplace=True)
    fig_2_df.columns = ['date', 'messages_containing_keyword']
    fig_2_df['date'] = pd.to_datetime(fig_2_df['date'])
    fig_2 = px.line(fig_2_df, x='date', y='messages_containing_keyword')
    fig_2.update_layout(
        autosize=True,
        title="Stackoverflow posts that contain your chosen keyword by date")

Which looks like this:

Next we complete the function by calling the dcc.Graph objects:

    return dcc.Graph(figure=fig_1), dcc.Graph(figure=fig_2)

Finally we call the app:

if __name__ == '__main__':
    app.run_server(debug=True)
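Since we're using JupyterDash, you can also choose where the app renders by passing a mode to run_server, for example inline in the notebook itself:

# Render the dashboard inline in the notebook
app.run_server(mode='inline', debug=True)

# Or open it in a separate browser tab
# app.run_server(mode='external', debug=True)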

Now let’s put all the code together:

from jupyter_dash import JupyterDash
import pydata_google_auth
from google.cloud import bigquery
import dash_core_components as dcc
import dash_html_components as html
import dash_bootstrap_components as dbc
from dash.dependencies import Output, Input, State
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
from datetime import date

credentials = pydata_google_auth.get_user_credentials(['https://www.googleapis.com/auth/bigquery'],)
bigquery_client = bigquery.Client(project='your-project-id')
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = JupyterDash(external_stylesheets=[dbc.themes.BOOTSTRAP])
server = app.server

############## APP LAYOUT ###################
app.layout = html.Div([
    html.H1("Keyword Analysis on Stack Overflow Posts", style={'textAlign': 'center'}),
    html.P(children="Insert your keyword and dates and click 'Submit'", style={'textAlign': 'center'}),
    html.Br(),
    dcc.Input(id='text', placeholder="Insert keyword or phrase"),
    dcc.DatePickerRange(id='date-point',
                        min_date_allowed=date(2015, 1, 1),
                        max_date_allowed=date(2015, 12, 31),
                        start_date=date(2015, 1, 1),
                        end_date=date(2015, 1, 31)
                        ),
    html.Button(id='enter', children=['Submit']),
    html.Div(id='graph-content'),
])

############## CALLBACK ###################
@app.callback(
    Output('graph-content', 'children'),
    [Input('enter', 'n_clicks'),
     State('text', 'value'),
     State('date-point', 'start_date'),
     State('date-point', 'end_date')]
)
def create_graph(n, text, startdate, enddate):
    print(text)
    print(startdate)
    print(enddate)

    ############## SQL ###################
    df_sql = f""" SELECT
        date(creation_date) as date,
        title,
        body,
        tags,
        count(distinct id) as num_posts
    FROM `bigquery-public-data.stackoverflow.stackoverflow_posts`
    WHERE REGEXP_CONTAINS(body, '(?i)({text})')
        and date(creation_date) >= '{startdate}'
        and date(creation_date) <= '{enddate}'
    group by 1,2,3,4"""
    project_id = 'your-project-id'
    df = pd.read_gbq(df_sql, project_id=project_id, dialect='standard', credentials=credentials)
    df.columns = df.columns.str.lower()
    df['body'] = df['body'].str.replace('<[^<]+?>', '', regex=True)  # strip HTML tags
    df_filtered = df

    ############## GRAPHS ###################
    # FIGURE 1: a table of the matching posts
    fig_1_df = df_filtered[['title', 'body']]
    fig_1_df.reset_index(inplace=True)
    fig_1 = go.Figure(data=[go.Table(
        header=dict(values=list(fig_1_df[['title', 'body']].drop_duplicates()),
                    fill_color='paleturquoise',
                    align='left'),
        cells=dict(values=[fig_1_df.title, fig_1_df.body],
                   fill_color='lightblue',
                   align='left'))
    ])
    fig_1.update_layout(
        autosize=True,
        title="Stackoverflow posts that contain your chosen keyword")

    # FIGURE 2: posts containing the keyword, by date
    fig_2_df = df_filtered.groupby(['date']).agg({'body': 'nunique'})
    fig_2_df.reset_index(inplace=True)
    fig_2_df.columns = ['date', 'messages_containing_keyword']
    fig_2_df['date'] = pd.to_datetime(fig_2_df['date'])
    fig_2 = px.line(fig_2_df, x='date', y='messages_containing_keyword')
    fig_2.update_layout(
        autosize=True,
        title="Stackoverflow posts that contain your chosen keyword by date")

    return dcc.Graph(figure=fig_1), dcc.Graph(figure=fig_2)


if __name__ == '__main__':
    app.run_server(debug=True)

I hope you found this helpful. Please message me with any questions. Happy coding!

