To the moon!
For this week's post we want to share a data analysis we ran over Reddit's Wall Street Bets (WSB) group, comparing GameStop ($GME) against other stocks using, yes, you guessed it, Artificial Intelligence. Past posts focused mostly on data analysis and visualization; this post uses a pre-trained AI model to analyze the text of Reddit posts. You can read this post and modify it directly in Hal9 by opening this pipeline.
GAMESTOP IN WALL STREET BETS
This post makes use of data import, transformations, prediction and visualization to compare GameStop ($GME) against other stocks discussed in the Wall Street Bets (WSB) group.
We will find that negative posts about GameStop draw far more engagement than negative posts about other stocks and topics. You can then read how we built this AI pipeline with Hal9.
Analyzing WSB with Hal9
To analyze GME in WSB, we start by importing our dataset from Kaggle; specifically, we use the Reddit WallStreetBets Posts dataset. This dataset can be downloaded as a 40MB CSV file and imported into Hal9 with a CSV block. However, to simplify this post, we preprocessed it in three steps: we imported the dataset into Hal9, kept only the posts with the most upvotes using a "Filter" block, selected a subset of the columns with a "Select" block, and saved the result with a "CSV" export block. This smaller dataset looks as follows:
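As a rough illustration of the Filter, Select, and CSV export steps outside of Hal9, here is a minimal Python sketch using only the standard library. The column names (`title`, `score`, `comms_num`), the upvote threshold, and the sample rows are assumptions made up for this example; they are not taken from the pipeline, and this is not how the Hal9 blocks are implemented.

```python
import csv
import io

# Invented sample standing in for the Kaggle CSV; column names are assumptions.
RAW_CSV = """title,score,comms_num,body,timestamp
GME to the moon,12000,900,some text,2021-01-28 10:00:00
Thoughts on AMC,300,45,more text,2021-01-28 11:00:00
Hold the line GME,54000,4100,,2021-01-29 09:30:00
Daily discussion,80,12,chat,2021-01-29 12:00:00
"""

def top_posts(csv_text, min_score, columns):
    """Keep rows with at least min_score upvotes, and only the given columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        if int(row["score"]) >= min_score:             # "Filter" step
            kept.append({c: row[c] for c in columns})  # "Select" step
    return kept

def to_csv(rows, columns):
    """Serialize the reduced rows back to CSV text (the "CSV" export step)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

COLUMNS = ["title", "score", "comms_num"]
small = top_posts(RAW_CSV, min_score=1000, columns=COLUMNS)
print(to_csv(small, COLUMNS))
```

Filtering before selecting keeps the threshold column available even if it is later dropped from the exported subset.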
The next step is to split the posts into GameStop posts and Other posts, which we can achieve with a "Map" block by adding a new "stock" column that is set to "GME" when the title contains "GME" and to "Other" when it doesn't. We can then add a "Scatter" plot and see if we find any peculiarities when we plot "Number of Comments" against "Total Upvotes". The following chart shows that there seem to be two types of posts in WSB: some with a lot of comments and upvotes proportional to comments, and others with fewer comments and many more upvotes. Our personal insight is that the latter might be posts with memes that, well, deserve an upvote with no questions asked.
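The labeling rule described above is simple enough to write out. The following Python sketch is only an illustration of that rule, not the code behind the "Map" block; the function names and the sample posts are hypothetical.

```python
def label_stock(title):
    """Return "GME" if the title mentions GME, otherwise "Other"."""
    # Case-sensitive match on the ticker, as described in the text. Whether the
    # Hal9 block also matches "GameStop" or lowercase "gme" is not stated.
    return "GME" if "GME" in title else "Other"

def add_stock_column(rows):
    """Return new rows with an added "stock" column (the "Map" step)."""
    return [{**row, "stock": label_stock(row["title"])} for row in rows]

# Invented sample posts for illustration only.
posts = [
    {"title": "GME to the moon", "score": 12000, "comms_num": 900},
    {"title": "Thoughts on AMC", "score": 300, "comms_num": 45},
]
labeled = add_stock_column(posts)
for post in labeled:
    print(post["stock"], "|", post["title"])
```

A plain substring test will also match longer words that happen to contain "GME"; a stricter version could match the ticker as a whole word.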
This is quite interesting but doesn't tell us much about how GME is different from other stocks. To find out, we use a pre-trained TensorFlow.js model to compute how positive or negative a piece of text is. This procedure is usually known as "Sentiment Analysis", and we can easily perform it in Hal9 by dragging the "Sentiment" block into the pipeline and selecting the "Title" field as the field to analyze. Specifically, this model uses a Convolutional Neural Network (CNN), but we don't need to worry about understanding those yet. Think of this model as a mathematical model that knows how to score text as positive or negative; that's about it.
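For readers curious about what happens before a model like this can score a title: text classifiers of this kind generally expect a fixed-length sequence of integer word ids rather than raw text. The sketch below shows that generic preprocessing step in Python. It is an assumption-laden illustration, not the Hal9 "Sentiment" block: the toy vocabulary, the reserved ids, and the sequence length are all hypothetical, and the trained model that turns the ids into a score is not reproduced here.

```python
import re

PAD, OOV = 0, 1  # assumed reserved ids for padding and unknown words

# Hypothetical toy vocabulary; a real model ships with its own word index.
VOCAB = {"gme": 2, "to": 3, "the": 4, "moon": 5, "is": 6, "overvalued": 7}

def encode(text, vocab, max_len):
    """Turn text into a fixed-length list of word ids for a text classifier."""
    words = re.findall(r"[a-z0-9$']+", text.lower())  # lowercase, drop punctuation
    ids = [vocab.get(word, OOV) for word in words]    # unknown words map to OOV
    ids = ids[-max_len:]                              # keep the last max_len tokens
    return [PAD] * (max_len - len(ids)) + ids         # left-pad short inputs

print(encode("GME to the moon!", VOCAB, max_len=6))
print(encode("GME is wildly overvalued", VOCAB, max_len=3))
```

Padding every title to the same length lets the model process many titles as one batch.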
Once we have the sentiment, we create another scatter plot with the sentiment on the x-axis, the total upvotes on the y-axis, and the size of each bubble set to the total number of comments. We also slightly modified the code behind the scatter plot, lowering the opacity so that overlapping bubbles don't hide data. From this visualization, it's clear that GME posts have bigger bubbles for both negative and positive posts. This is not the case for other stocks, where negative posts seem more likely to get ignored. One theory here is that the WSB community is more willing to engage with negative posts about GME and defend the stock, but that's perhaps something we can explore in a different post!
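One way to check the "bigger bubbles" impression numerically is to average the number of comments per stock and sentiment group. The Python sketch below shows the idea under stated assumptions: sentiment scores are assumed to lie between 0 and 1 with 0.5 as the cut-off, the field names are hypothetical, and the rows are made up for illustration, so the printed numbers say nothing about the real WSB data.

```python
from collections import defaultdict

def sentiment_bucket(score, threshold=0.5):
    """Bucket a score assumed to lie in [0, 1]; the 0.5 cut-off is an assumption."""
    return "positive" if score >= threshold else "negative"

def mean_comments(rows):
    """Average comment count per (stock, sentiment bucket)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of comments, post count]
    for row in rows:
        key = (row["stock"], sentiment_bucket(row["sentiment"]))
        totals[key][0] += row["comms_num"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up rows for illustration only; these are not values from the WSB dataset.
rows = [
    {"stock": "GME", "sentiment": 0.2, "comms_num": 400},
    {"stock": "GME", "sentiment": 0.1, "comms_num": 600},
    {"stock": "GME", "sentiment": 0.9, "comms_num": 500},
    {"stock": "Other", "sentiment": 0.3, "comms_num": 20},
    {"stock": "Other", "sentiment": 0.8, "comms_num": 300},
]
for key, average in sorted(mean_comments(rows).items()):
    print(key, average)
```

On real data, a median may be a better summary than a mean, since a handful of viral posts can dominate the average.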
Hal9's interface provides a variety of charts, transformations and ready-to-use AI models to analyze data with ease.
If you are interested in using AI models in your data analysis, please give hal9.ai a try and let us know what you think. If you’re ready for a bigger challenge, you can create entirely new data sources, transformations, visualizations or predictive models, and contribute them to our open source GitHub repository.
We also have a Hal9 Twitter account, which is worth following to learn more about Artificial Intelligence, visualizations, and data analysis.