Hey y'all! With the presidential election heating up, I thought it would be great to build a Presidential Candidate Polling Dashboard. This project will hone your web scraping, web development, and styling skills as you build a dashboard with a daily feed of the latest polls for the top Democratic and Republican candidates.

Here is a roadmap for this project:

  1. We’ll use Beautiful Soup in Python to scrape poll information from https://projects.fivethirtyeight.com/polls/president-general/ .
  2. Then we’ll create an HTML web page to display this poll information and a server using Python Flask to serve this page.
  3. Then we’ll style this page using CSS.

Before we begin, I want to mention that this tutorial is also available in video form on our website. You can find more free courses and projects for mastering Python on my website, The Codex. All the code for this project is in my GitHub repo. This tutorial was made by Darren Zhang.


Here is what our final project looks like:


Installation

Make sure you have Python 3 and pip installed.

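We will be using Chrome’s developer tools to inspect the page in our example, so make sure you have Chrome installed.

Chrome download page: https://www.google.com/chrome/

Chrome driver binary (only needed for browser automation with Selenium, which this tutorial does not use): https://sites.google.com/a/chromium.org/chromedriver/downloads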

Let’s install the Beautiful Soup package: pip install beautifulsoup4

Now, let’s begin coding!

Web Scraping With BeautifulSoup:

First, you’ll want to get the site’s HTML code into your Python script so that you can interact with it. For this task, you’ll use Python’s requests library. Type the following in your terminal to install it:

pip install requests

Next, open up a new file in your editor called data.py. Retrieving the HTML from the website is quite simple:

import requests
URL = 'https://projects.fivethirtyeight.com/polls/president-general/'
page = requests.get(URL)

This code performs an HTTP request to the given URL. It retrieves the HTML data that the server sends back and stores that data in a Python object.
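Before parsing, it can help to confirm that the request actually succeeded. The following is a minimal sketch, assuming the requests package is installed. The helper name fetch_html and the 10-second timeout are hypothetical choices for illustration and are not part of the original project.

```python
import requests

URL = 'https://projects.fivethirtyeight.com/polls/president-general/'


def fetch_html(url, timeout=10):
    """Return the raw HTML bytes for url, raising on HTTP errors.

    fetch_html and the timeout value are illustrative assumptions.
    """
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.content
```

Calling fetch_html(URL) in data.py would then give you the same bytes as page.content, but a failed request stops the script early instead of handing an error page to the parser.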

Beautiful Soup is a Python library for parsing structured data. It allows you to interact with HTML in a similar way to how you would interact with a web page using developer tools. Beautiful Soup exposes a couple of intuitive functions you can use to explore the HTML you received. To get started, install the library from your terminal with pip install beautifulsoup4 if you have not already, then import it and parse the page:

from bs4 import BeautifulSoup
import requests
URL = 'https://projects.fivethirtyeight.com/polls/president-general/'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

Here, we’re creating a BeautifulSoup object that parses the HTML page content we retrieved earlier.

Inspecting the page

Let’s inspect the page by opening Chrome’s developer tools (right-click the page and choose Inspect) and clicking the cursor icon to the left of the Elements tab:

Let’s hover over a random date. You’ll notice that each date on this website has a <td> tag and a date-wrapper class. Likewise, the pollster name is inside a class called pollster-container, the sample size is inside a class called sample, and the same goes for answers and pollmap, all of which have <td> tags.


All of these <td> tags, which are data columns, are children of the row’s <tr> tag.


We can use a for-loop to go through all of the rows and display the information that we want.

Locating elements

Locating data on the webpage can easily be done using BeautifulSoup.

To get an array of table rows, we do the following:
rows = soup.find_all(class_='visible-row')
for r in rows:
    print(r, '\n')

Here, we use the .find_all() method on the Beautiful Soup object, which returns a list-like ResultSet containing the HTML for every element with the class name ‘visible-row’.

Neat! Now our goal is to get the data inside each <td> tag. From our inspection earlier, we know that each date has a date-wrapper class.

rows = soup.find_all(class_='visible-row')
for r in rows:
    date = r.find(class_='date-wrapper')
    print(date, '\n')

To access the text inside of an element, use element.text.

rows = soup.find_all(class_='visible-row')
for r in rows:
    date = r.find(class_='date-wrapper').text
    print(date, '\n')


Note: Your dates may be different.
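If the printed dates come back padded with spaces or newlines, Beautiful Soup's get_text(strip=True) can trim them. The sketch below runs on a small, made-up HTML snippet (the markup and the date are hypothetical and only imitate the class names described above), so it works without hitting the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that mimics one row of the polls table
SAMPLE_HTML = """
<table>
  <tr class="visible-row">
    <td class="date-wrapper">
      Oct 1-3, 2020
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
cell = soup.find(class_='visible-row').find(class_='date-wrapper')

raw_text = cell.text                    # keeps surrounding newlines and spaces
clean_text = cell.get_text(strip=True)  # strips leading/trailing whitespace

print(repr(raw_text))
print(repr(clean_text))
```

Whether you need the stripped version depends on how the real page formats its cells, so treat this as an optional cleanup step.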

Now for a challenge: Retrieve and print the dates, pollsters, samples, and results of each row on your own.

Pollsters:

Our first step is to inspect the pollster cell. Each pollster cell can have two links, one for the grade and one for the pollster name.

However, there are cases when the grade isn’t available. In that case, there is only one link.


In both cases, we want the last link in the array of links.

Here is the solution:

rows = soup.find_all(class_='visible-row')
for r in rows:
    date = r.find(class_='date-wrapper').text
    # The pollster container can have up to two links. One for the grade, if it exists,
    # and one for the pollster name. We want the latter.
    pollster_container = r.find(class_="pollster-container")
    pollster_links = pollster_container.find_all("a")
    pollster_name = pollster_links[-1].text # Accesses last element of the link array
    print(pollster_name)

Here are some of the outputs:


Sample size, leader, and net are pretty straightforward:

rows = soup.find_all(class_='visible-row')
for r in rows:
    date = r.find(class_='date-wrapper').text
    # The pollster container can have up to two links. One for the grade, if it exists,
    # and one for the pollster name. We want the latter.
    pollster_container = r.find(class_="pollster-container")
    pollster_links = pollster_container.find_all("a")
    pollster_name = pollster_links[-1].text # Accesses last element of the link array
    sample_size = r.find(class_="sample").text
    leader = r.find(class_="leader").text
    net = r.find(class_="net").text
    print(sample_size, leader, net, '\n')

Let’s take a look at results.

It seems that, for most rows, each candidate’s percent favorability is a text node inside a div with the heat-map class, which is inside a td with the value class. Furthermore, most rows have two candidates (Trump and Biden).

Let’s code this up inside the for loop:

values = r.find_all(class_="value")
print('values', len(values), '\n')


And we get:

It seems that some rows only have one value. This is because for rows where there are more than two candidates, the other candidates are hidden away. Inspecting this further, it seems that these candidates are in a separate row with a class name of “expandable-row”.


We can handle this by stepping to the next <tr> after the current row and getting hold of its value class:

# Getting the percent favorable for Trump and Biden
values = r.find_all(class_="value")
# If the other value is hidden by the "more" button
if len(values) == 1:
    # find_next() is the current name for the older findNext() alias
    next_sibling = r.find_next("tr")
    value = next_sibling.find(class_="value")
    values.append(value)

And we have:

Great! Now the values array for each row should contain two elements, one of which is the div with the Trump percent favorable and the other one is a div with the Biden percent favorable. Now let’s get the actual numbers:

# Getting the percent favorable for Trump and Biden
values = r.find_all(class_="value")
# If the other value is hidden by the "more" button
if len(values) == 1:
    next_sibling = r.find_next("tr")
    value = next_sibling.find(class_="value")
    values.append(value)
# These lines sit outside the if block so they run for every row
trump_fav = values[0].find(class_="heat-map").text
biden_fav = values[1].find(class_="heat-map").text
print('trump: ', trump_fav, 'biden: ', biden_fav, '\n')
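The values scraped here are strings, so comparing or sorting them numerically needs a conversion step. Here is a minimal sketch, assuming the scraped text looks like '46%'. The helper name percent_to_int is hypothetical, and the real page may format some cells differently.

```python
def percent_to_int(text):
    """Convert a string such as '46%' to the integer 46.

    Assumes the text is a whole number followed by an optional percent sign.
    """
    return int(text.strip().rstrip('%'))


# Example with made-up values in the assumed format
trump_fav = percent_to_int('44%')
biden_fav = percent_to_int(' 50% ')
print(biden_fav - trump_fav)
```

Keeping the original strings for display and converting only when you need arithmetic avoids changing what the dashboard shows.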


Now, let’s put all of this code into a function and initialize an array at the beginning, which we will use to store each of the rows.

def scrape_poll_data(): #ADDED THIS
    pollster_data_array = [] #ADDED THIS

At the end of each loop iteration, let’s store all of the data we obtained in a dictionary called pollster_data and append that dictionary to the array. Once the loop has finished, we return the array:

        pollster_data = {
            "date": date,
            "pollster_name": pollster_name,
            "sample_size": sample_size,
            "leader": leader,
            "net": net,
            "trump_fav": trump_fav,
            "biden_fav": biden_fav
        }
        pollster_data_array.append(pollster_data)
    # After the for loop has finished
    return pollster_data_array


Here is what the final code should look like: https://github.com/darrenzhang2000/presidential-candidate-polling/tree/branch4
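For reference, here is a condensed sketch of how the pieces from this section fit together. It is not the repository code: the function name parse_poll_rows, the decision to accept an HTML string, and the sample markup (pollster names, dates, and numbers) are all hypothetical, chosen so the logic can be tried offline.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two polls. The second poll keeps its second
# candidate in an "expandable-row", as described earlier in this section.
SAMPLE_HTML = """
<table>
  <tr class="visible-row">
    <td class="date-wrapper">Oct 1-3, 2020</td>
    <td class="pollster-container"><a>B+</a><a>Example Polling</a></td>
    <td class="sample">1,000</td>
    <td class="value"><div class="heat-map">44%</div></td>
    <td class="value"><div class="heat-map">50%</div></td>
    <td class="leader">Biden</td>
    <td class="net">+6</td>
  </tr>
  <tr class="visible-row">
    <td class="date-wrapper">Oct 2-4, 2020</td>
    <td class="pollster-container"><a>Other Polling</a></td>
    <td class="sample">800</td>
    <td class="value"><div class="heat-map">45%</div></td>
    <td class="leader">Biden</td>
    <td class="net">+3</td>
  </tr>
  <tr class="expandable-row">
    <td class="value"><div class="heat-map">48%</div></td>
  </tr>
</table>
"""


def parse_poll_rows(html):
    """Parse poll rows from an HTML string into a list of dictionaries."""
    soup = BeautifulSoup(html, 'html.parser')
    pollster_data_array = []
    for r in soup.find_all(class_='visible-row'):
        # The last link is the pollster name, whether or not a grade exists
        pollster_links = r.find(class_='pollster-container').find_all('a')
        values = r.find_all(class_='value')
        if len(values) == 1:
            # The second candidate is hidden in the following row
            values.append(r.find_next('tr').find(class_='value'))
        pollster_data_array.append({
            'date': r.find(class_='date-wrapper').text,
            'pollster_name': pollster_links[-1].text,
            'sample_size': r.find(class_='sample').text,
            'leader': r.find(class_='leader').text,
            'net': r.find(class_='net').text,
            'trump_fav': values[0].find(class_='heat-map').text,
            'biden_fav': values[1].find(class_='heat-map').text,
        })
    return pollster_data_array


for row in parse_poll_rows(SAMPLE_HTML):
    print(row)
```

Separating parsing from downloading like this is a design choice rather than something the tutorial requires. It simply makes the scraping logic easy to try on a saved or hand-written snippet.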

Building Our Web Page With Flask:

In this section, we will create our Flask application to serve our home page, which will display the general election information about Biden and Trump in a table format.

Now that we have all the data we need, let’s build our web app! We will be using Flask. Start by running pip install Flask in the terminal. Then create a file called app.py and, at the top, write the following:

from flask import Flask
app = Flask(__name__) #four underscores total
if __name__ == '__main__':
    app.run()

Here, we’re importing Flask from the flask library and creating an app, which is an instance of the Flask class. The if __name__ == '__main__' check makes sure that the development server only starts when app.py is run directly, not when it is imported by another module.

We can create our first route by doing the following:

from flask import Flask, render_template
app = Flask(__name__) #four underscores total
@app.route('/')
def presidential_poll_dashboard():
    return 'Hello, world!'
if __name__ == '__main__':
    app.run()

Typing python app.py gives us the following:

By following that link, or going to http://localhost:5000/, we get:
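The route can also be checked without opening a browser. The sketch below uses Flask's built-in test client, which calls the route in-process. It rebuilds the same tiny app so that it runs on its own, and it is an optional check rather than a step in the original project.

```python
from flask import Flask

app = Flask(__name__)


@app.route('/')
def presidential_poll_dashboard():
    return 'Hello, world!'


# test_client() lets us call the route in-process, no running server required
client = app.test_client()
response = client.get('/')
print(response.status_code, response.get_data(as_text=True))
```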

Great! Our route is working. Now we want to create our home page. Create a directory called templates, and inside it create a file called home.html.

Copy and paste the basic HTML boilerplate:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head>
<body>
</body>
</html>

If you’re using VSCode, you can just type ! and have it automatically generate the boilerplate for you.

Let’s add the title and table header:

<body>
<h1>Presidential Candidate Polling Dashboard</h1>
<table>
<tr>
<th>Dates</th>
<th>Pollster</th>
<th>Sample</th>
<th>Result</th>
<th>Net</th>
</tr>
</table>
</body>


To render this page, we need to import render_template and pass in ‘home.html’.

from flask import Flask, render_template
app = Flask(__name__) #four underscores total

@app.route('/')
def presidential_poll_dashboard():
    return render_template('home.html')

if __name__ == '__main__':
    app.run()

This should give us a minimalistic title and table heading:

Now we need to call the scrape_poll_data() function we defined earlier. To do that, we must import scrape_poll_data from data and pass the array it returns to home.html.

from flask import Flask, render_template
from data import scrape_poll_data
app = Flask(__name__) #four underscores total
@app.route('/')
def presidential_poll_dashboard():
    pollster_data_array = scrape_poll_data()
    # The pollster_data_array on the left is a variable which can be accessed inside home.html
    # The pollster_data_array on the right is the array created by the scrape_poll_data function
    return render_template('home.html', pollster_data_array = pollster_data_array)
if __name__ == '__main__':
    app.run()
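As written, the route scrapes the polling site every time someone loads the page. One possible refinement is to reuse the last result for a while. The sketch below uses only the standard library. The name get_cached_poll_data and the one-hour interval are hypothetical, and this is just one way to do it, not part of the original project.

```python
import time

CACHE_SECONDS = 60 * 60  # hypothetical refresh interval: one hour
_cache = {'data': None, 'fetched_at': 0.0}


def get_cached_poll_data(scrape_func, now=time.time):
    """Return cached rows, calling scrape_func only when the cache is stale.

    scrape_func is any zero-argument function that returns the poll rows,
    for example scrape_poll_data. now is injectable to make testing easier.
    """
    age = now() - _cache['fetched_at']
    if _cache['data'] is None or age > CACHE_SECONDS:
        _cache['data'] = scrape_func()
        _cache['fetched_at'] = now()
    return _cache['data']
```

Inside the route you could then write pollster_data_array = get_cached_poll_data(scrape_poll_data). Note that this simple in-memory cache is reset whenever the server restarts.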

Our next step is to loop through this array of pollster_data and display the data for each row in home.html. However, this can’t be done using HTML alone. We will use Jinja2, the template engine that Flask uses under the hood. Jinja2 is installed automatically as a Flask dependency, but you can also install it explicitly with pip install jinja2. As for the code:

<body>
<h1>Presidential Candidate Polling Dashboard</h1>
<table>
    <tr>
        <th>Dates</th>
        <th>Pollster</th>
        <th>Sample</th>
        <th>Result</th>
        <th>Net</th>
    </tr>
    {% for row in pollster_data_array %}
        <tr>
            <td>{{ row.date }}</td>
            <td>{{ row.pollster_name }}</td>
            <td>{{ row.sample_size }}</td>
            <td>Trump: {{ row.trump_fav }} Biden: {{ row.biden_fav }}</td>
            <td>{{ row.leader }} {{ row.net }} </td>
        </tr>
    {% endfor %}
</table>
</body>


{% ... %} delimiters are for statements, and {{ ... }} delimiters are for expressions to print to the template output. More information about Jinja delimiters can be found here: https://jinja.palletsprojects.com/en/2.11.x/templates/#synopsis
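To see the two kinds of delimiters in isolation, you can render a tiny template straight from Python. This is a minimal sketch, assuming the jinja2 package is installed. The sample row is made up and only mirrors the shape of the dictionaries built in data.py.

```python
from jinja2 import Template

# Hypothetical sample row in the same shape as the scraped dictionaries
pollster_data_array = [
    {'date': 'Oct 1-3, 2020', 'pollster_name': 'Example Polling'},
]

# {% ... %} runs the loop; {{ ... }} prints each value into the output
template = Template(
    "{% for row in pollster_data_array %}"
    "<td>{{ row.date }}</td><td>{{ row.pollster_name }}</td>"
    "{% endfor %}"
)
html = template.render(pollster_data_array=pollster_data_array)
print(html)
```

Notice that row.date works even though row is a dictionary: Jinja falls back to item lookup when the attribute does not exist, which is why home.html can use dot notation.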

This gives us a webpage with absolutely no styling. We will proceed with styling this page in the next section.


The code up to this point can be found in branch2 of my GitHub repo: https://github.com/darrenzhang2000/presidential-candidate-polling/tree/branch2

Styling Our Home Page

Create a directory called static. Inside this directory, create a directory called styles. Inside styles, create a file named home.css.

Add a reference to home.css by updating the <head> of home.html as follows:

<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" type="text/css" href="{{ url_for('static', filename='styles/home.css') }}">
<title>Presidential Candidate Polling Dashboard</title>
</head>
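If you are curious what url_for('static', ...) turns into, you can ask Flask directly. The sketch below builds a throwaway app and uses test_request_context(), since url_for needs a request context outside of a real request. It is only a way to inspect the generated path, not something the project needs.

```python
from flask import Flask, url_for

app = Flask(__name__)

# url_for() needs a request context; test_request_context() provides one
with app.test_request_context():
    css_url = url_for('static', filename='styles/home.css')

print(css_url)
```

With Flask's default settings the result is a path under /static/, which is why the stylesheet has to live inside the static directory.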

Let’s add some CSS to our code:

https://github.com/darrenzhang2000/presidential-candidate-polling/blob/master/static/styles/home.css

And here is our final webpage:


That's it, folks! You just built a Presidential Candidate Polling Dashboard with Flask and Python. You can find all the code for this project in our GitHub repo. As always, if you face any trouble building this project, join our Discord and The Codex community can help!


For those of you interested in more project walkthroughs: Every Tuesday, I release a new Python/Data Science Project tutorial. I was honestly just tired of watching webcasted lectures and YouTube videos of instructors droning on with robotic voices teaching pure theory, so I started recording my own fun and practical projects.
