Another Tuesday, another free project tutorial. Today, we'll be building a sentiment analysis tool for stock trading headlines. This project will let you hone your web scraping, data analysis and manipulation, and visualization skills while building a complete sentiment analysis tool.
Here's a roadmap for today's project:
- We'll use BeautifulSoup in Python to scrape article headlines from FinViz
- Then, we'll use Pandas (Python Data Analysis Library) to organize the article headlines and NLTK to run sentiment analysis on them
- Finally, we'll use Matplotlib for visualization of our results
Before we begin, I want to mention that the guide below is an abridged version of the free video tutorial, which you can find here. You can find more free courses and projects on my website, TheCodex, where you can learn how to design and build applications. You can find all the code for this project at my GitHub Repo here.
Step 1: Gathering and Parsing FinViz Data
FinViz is a free website that makes stock data easily accessible to traders and investors. We'll gather the stock data from FinViz for a specific stock ticker. For example, here's what the webpage for AMZN's ticker on FinViz looks like:
If you scroll down, you'll see the stock articles that we're trying to parse. View Source shows the exact HTML code that contains the stock article name and the date it was published:
Let's go ahead and write some BeautifulSoup code to save this Article Table into a dataset. In a new Python Project, create a file main.py and write the following code:
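As a rough sketch of this gathering step (not necessarily identical to the code in the video or the repo), the snippet below builds a quote URL for each ticker, requests the page, and keeps the element with id 'news-table'. The URL pattern, the ticker list, the user-agent string, and the helper names extract_news_table and fetch_news_tables are assumptions made for illustration.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumed quote-page URL pattern; the ticker symbol is appended to it.
FINVIZ_URL = "https://finviz.com/quote.ashx?t="
# Example tickers, chosen only for illustration.
TICKERS = ["AMZN", "GOOG", "FB"]


def extract_news_table(html):
    """Parse an HTML page and return the element with id 'news-table' (or None)."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(id="news-table")


def fetch_news_tables(tickers):
    """Download each ticker's page and map ticker -> news-table element."""
    news_tables = {}
    for ticker in tickers:
        # A user-agent header is sent because some sites reject header-less requests.
        req = Request(url=FINVIZ_URL + ticker, headers={"user-agent": "my-app"})
        with urlopen(req) as response:
            news_tables[ticker] = extract_news_table(response.read())
    return news_tables
```

Nothing is downloaded until you call the function, for example with news_tables = fetch_news_tables(TICKERS). Keeping the parsing in its own helper also lets you try it on a saved HTML file without hitting the website.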
Make sure you have all of the above modules installed via pip. If you're stuck on installation, definitely watch the step-by-step video series available for this project on TheCodex. (For installing NLTK's VADER lexicon, take a look at this helpful SO link.)
Here, we've created a list of tickers, and for each one we build the full FinViz URL to parse the data from. Using the Request module in Python, we get the HTML response from the website and pass it to BeautifulSoup so that we can easily parse it. The HTML element with id 'news-table' contains all of our news articles, so we save that BeautifulSoup element to a dictionary. Let's parse that dictionary right now:
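One way the parsing step might look is sketched below. It assumes that the first cell of each row holds either a date and a time or only a time (in which case the row belongs to the most recent date seen), and that dates look like Jan-02-24. Both are assumptions about the page markup, and the small hand-written table stands in for a real FinViz response. The helper names parse_news_tables and to_dataframe are illustrative.

```python
import pandas as pd
from bs4 import BeautifulSoup


def parse_news_tables(news_tables):
    """Turn {ticker: news-table element} into rows of [ticker, date, time, title]."""
    parsed_data = []
    for ticker, news_table in news_tables.items():
        date = None  # carried over to rows that only show a time
        for row in news_table.find_all("tr"):
            if row.a is None or row.td is None:
                continue  # skip rows without a headline link or a timestamp cell
            title = row.a.get_text(strip=True)
            stamp = row.td.get_text(strip=True).split()
            if not stamp:
                continue  # empty timestamp cell
            if len(stamp) == 1:
                time = stamp[0]
            else:
                date, time = stamp[0], stamp[1]
            parsed_data.append([ticker, date, time, title])
    return parsed_data


def to_dataframe(parsed_data):
    """Build the data frame and convert the date strings to date objects."""
    df = pd.DataFrame(parsed_data, columns=["ticker", "date", "time", "title"])
    # Assumed date format, e.g. "Jan-02-24".
    df["date"] = pd.to_datetime(df["date"], format="%b-%d-%y").dt.date
    return df


# Tiny hand-written table standing in for a real response.
sample_html = """
<table id="news-table">
  <tr><td>Jan-02-24 09:30AM</td><td><a href="#">Amazon shares climb</a></td></tr>
  <tr><td>08:15AM</td><td><a href="#">Analysts cut Amazon target</a></td></tr>
</table>
"""
sample_table = BeautifulSoup(sample_html, "html.parser").find(id="news-table")
df = to_dataframe(parse_news_tables({"AMZN": sample_table}))
print(df)
```

If the live page uses a different date format than the one assumed here, the format string passed to pd.to_datetime is the place to adjust.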
Our parsing code walks through the news-table elements we saved while gathering the results and pulls out the specific values we need. We look at every table row in the table of news articles and collect the title, date and time of each published article. Once we have those values, we append them as a list to our parsed_data list.
In the last 2 lines, we convert our parsed_data list to a Pandas data frame and convert the date column to Python's datetime format. This will allow us to easily apply sentiment analysis and visualize the data with Matplotlib.
Step 2: Applying Sentiment Analysis
Applying sentiment analysis on the titles is actually the easiest part of the entire project. NLTK (Natural Language Toolkit) comes with a beautiful submodule called vader that lets us pass a string to its function and get back a funky looking result like this:
We can see that the string "Very bad movie." gives back a response with 4 values: compound, negative, neutral and positive. The compound result ranges from -1 to 1, with -1 being overwhelmingly negative and +1 being overwhelmingly positive. This will be the result from which we deduce if a stock article is positive or negative.
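To make the four scores a bit more concrete, here is a minimal sketch of calling VADER on a single string and turning the compound score into a label. The helper names are illustrative, and the cutoff of 0.05 is a commonly used convention treated here as an assumption, not something this project depends on.

```python
def polarity_scores(text):
    """Return VADER's scores for one string as a dict with the keys
    'neg', 'neu', 'pos' and 'compound'."""
    # Imported here so the rest of this file loads before NLTK is set up.
    # The analyzer needs nltk.download("vader_lexicon") to have been run once.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer().polarity_scores(text)


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse label.

    The threshold is an assumed cutoff; pick whatever suits your data.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

For example, label_from_compound(polarity_scores("Very bad movie.")["compound"]) gives a one-word verdict for the sample string.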
Let's go ahead and apply the sentiment analysis on our data frame:
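A minimal sketch of this step, assuming the data frame has a 'title' column as described earlier. The helper names make_vader_scorer and add_compound_column are illustrative, and the scoring function is passed in as an argument so the column logic can be tried out without NLTK installed.

```python
import pandas as pd


def make_vader_scorer():
    """Build a function that maps a title to VADER's compound score."""
    # Imported here so this file still loads before NLTK is set up.
    # The analyzer needs nltk.download("vader_lexicon") to have been run once.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    vader = SentimentIntensityAnalyzer()
    return lambda title: vader.polarity_scores(title)["compound"]


def add_compound_column(df, score_title):
    """Return a copy of df with a 'compound' column computed from each title."""
    scored = df.copy()
    scored["compound"] = scored["title"].apply(score_title)
    return scored
```

With NLTK and the VADER lexicon installed, df = add_compound_column(df, make_vader_scorer()) adds the scores to the data frame.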
We initialize the SentimentIntensityAnalyzer, and then create a lambda function that takes in a title string, applies the vader.polarity_scores() function to it to get the results in the above image, and returns only the compound score. Using the apply function in Pandas, we can create a new 'compound' column in the data frame with the compound score of each title.
Step 3: Visualizing the Results in Matplotlib
Last but not least, we need to visualize this data frame in Matplotlib to see how public perception of our stocks in news articles changed from day to day.
Let's visualize the results in a bar chart, by grouping the data based on the tickers and dates:
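# Average the numeric columns (here, the compound score) per ticker and day
mean_df = df.groupby(['ticker', 'date']).mean(numeric_only=True).unstack()
# Keep only the 'compound' columns, then flip so the dates become the index
mean_df = mean_df.xs('compound', axis="columns").transpose()
# Pass figsize to plot(), which creates its own figure
mean_df.plot(kind='bar', figsize=(10, 8))
plt.show()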
The visualization code above groups our dataset by the ticker and date of each row and averages the compound score for each day. We take the cross section of the 'compound' columns, transpose the data frame so that the dates run along the x-axis, and then plot it as a bar chart.
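The reshaping is easier to follow on a tiny example. The sketch below runs the same group, unstack, cross-section and transpose steps on a made-up data frame (the scores are invented purely for illustration) and leaves out the plotting call.

```python
import datetime as dt

import pandas as pd

# Made-up rows: two AMZN headlines on one day, one on the next, one GOOG headline.
toy = pd.DataFrame({
    "ticker": ["AMZN", "AMZN", "AMZN", "GOOG"],
    "date": [dt.date(2024, 1, 2), dt.date(2024, 1, 2),
             dt.date(2024, 1, 3), dt.date(2024, 1, 2)],
    "title": ["headline a", "headline b", "headline c", "headline d"],
    "compound": [0.4, -0.2, 0.5, 0.1],
})

# One row per (ticker, date) holding the mean compound score.
grouped = toy.groupby(["ticker", "date"]).mean(numeric_only=True)

# Move the dates into the columns: one row per ticker.
wide = grouped.unstack()

# Drop the outer 'compound' column level, then flip: dates as rows, tickers as columns.
mean_df = wide.xs("compound", axis="columns").transpose()
print(mean_df)
```

Days on which a ticker has no headlines show up as NaN, and the bar chart simply leaves those bars out.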
And that's a wrap. You just built a Sentiment Analysis Tool for Stock Trading! You can find all the code for this project at our GitHub Repo here. As always, if you face any trouble building this project, join our Discord and TheCodex community can help!
For those of you interested in more project walkthroughs: Every Tuesday, I release a new Python/Data Science Project tutorial. I was honestly just tired of watching webcast lectures and YouTube videos of instructors droning on with robotic voices teaching pure theory, so I started recording my own fun and practical projects. Next Tuesday, I'll be releasing a tutorial on how to build a speech recognition tool with Python and Flask.
Hey! I'm Avi - your new Python and data science teacher. I've taught over 500,000 students around the world not just how to code, but how to build real projects. I'm on a mission to help you jumpstart your career by helping you master Python and data science. Start your journey on TheCodex here: https://thecodex.me/