Memetic Warfare 201: Using Public Data to Determine which Candidate is Winning Reddit
With the primaries behind us and national conventions around the corner, people around the world are sizing up the presumptive nominees for the 2016 Presidential Election. Between the news, the polls, and everybody screaming on social media, it seems safe to say the General Election is upon us. I was originally going to pull together some polling data to make my own predictions before I remembered polling data this far out from the election isn't very useful. Besides, we have enough wannabe pundits running around trying to be the next Nate Silver (maybe not Republican Primary Nate Silver); I feel something different is in order.
As you may have noticed, this year's election has been considerably different from any in recent memory. The last four candidates standing are:
- A former reality TV star who systematically dismantled the Republican Party en route to securing the Republican nomination;
- A religious senator who has figuratively fallen off the face of the earth since his campaign was very possibly meme-d to death;
- A Democratic front-runner viewed as heir to the throne who recently avoided an FBI indictment;
- A socialist Democratic runner-up bemoaning money in politics while smashing fundraising records.
Even more fascinating than the candidates themselves is how they got where they are today. I am convinced that without social media and various online communities, two out of these four candidates would never have contended in their respective primaries. From my perspective, the cults of personality on Facebook, Twitter, and Reddit have a far greater impact than traditional news sites. I find it strange that people get emotional over the occasional Daily Kos hit piece while ignoring the incredible impact the 9th most popular website in the United States* is having. Well, I have not been ignoring it. Today's study is about election coverage on Reddit; let's roll up our sleeves and get our hands dirty!
Enter Google BigQuery
Several weeks ago my friend and colleague Madds was helping me trawl publicly available data sources to help satiate my analytic cravings. It turns out that Google's BigQuery service has a large number of public data sets available for users to query at will. One data set immediately caught my attention: a month-by-month extract of Reddit posting data. All of Reddit's posting data from January to May 2016 was neatly packaged into structured tables, ready to be queried.
So I queried them...a lot.
Daddy, What Did You Do During the Meme War?
The first burning question I had was a simple one: which candidate has the biggest fan club on Reddit? If you are even a casual Reddit user, you've almost certainly heard of /r/SandersForPresident (S4P) and /r/The_Donald. These are the online communities I mentioned in the previous segment that effectively boosted their anti-establishment candidates into the stratosphere.
It's tough to say which community had a more significant impact on their candidate. While both subreddits do a great job of rallying supporters, the subreddits had different approaches to community outreach. While S4P organized fundraising and phonebanking sessions, The_Donald took a much... simpler approach to promoting their candidate. They sh...ucks-posted a lot. What's that? You want to know just how much they "shucks-posted"?
Political Subreddit Posting Metrics for May 2016
| SUBREDDIT | TOTAL POSTS | POSTS OVER 1,000 KARMA | TOTAL COMMENTS | TOTAL KARMA |
Not only did The_Donald's post count tower above other politically-oriented subreddits, they also managed to push an incredible 4,310 posts above 1,000 karma, which at the time virtually guaranteed the post would be shown on /r/all. Obviously, having a community this effective at promoting your brand is an asset. Even if you don't agree with the denizens of The_Donald, you have to admit their ability to Trump every other political subreddit is impressive.
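For anyone who wants to rebuild the four columns of the table above from raw post records, here is a minimal sketch. The sample rows are made up, and the field names (`subreddit`, `score`, `num_comments`) are assumptions for illustration rather than the confirmed schema of the public data set.

```python
from collections import defaultdict

# Hypothetical sample rows. Field names are assumptions, not the
# confirmed schema of the public Reddit data set.
SAMPLE_POSTS = [
    {"subreddit": "The_Donald", "score": 2500, "num_comments": 310},
    {"subreddit": "The_Donald", "score": 40, "num_comments": 5},
    {"subreddit": "SandersForPresident", "score": 1800, "num_comments": 220},
    {"subreddit": "SandersForPresident", "score": 12, "num_comments": 3},
    {"subreddit": "politics", "score": 950, "num_comments": 400},
]


def subreddit_metrics(posts, karma_threshold=1000):
    """Summarize posts per subreddit into the four table columns."""
    totals = defaultdict(lambda: {
        "total_posts": 0,
        "posts_over_threshold": 0,
        "total_comments": 0,
        "total_karma": 0,
    })
    for post in posts:
        row = totals[post["subreddit"]]
        row["total_posts"] += 1
        # Strictly greater than the threshold counts as "over 1,000 karma".
        if post["score"] > karma_threshold:
            row["posts_over_threshold"] += 1
        row["total_comments"] += post["num_comments"]
        row["total_karma"] += post["score"]
    return dict(totals)


metrics = subreddit_metrics(SAMPLE_POSTS)
for name, row in sorted(metrics.items()):
    print(name, row)
```

The same grouping could be done inside the query itself; doing it in Python here just keeps the sketch self-contained.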
You're probably thinking: "Okay Weems, The_Donald is probably just a really loud echo chamber that most people outside of hardcore Redditors won't read", and for the most part, you're right. However, the most popular content on The_Donald will consistently make its way onto other social media platforms. Reddit content spreading to other parts of the internet is such a common occurrence that there's an entire culture of memes surrounding it. So even if the majority of voters don't go to The_Donald, The_Donald will find its way to them. This same phenomenon is the reason Bernie Sanders news was being plastered all over your Facebook feed even though most of us assumed Clinton would be the Democratic nominee years ago. In fact, Bernie's Reddit fan club was so successful that they managed to essentially take over /r/politics, which brings us to my second observation.
The Biggest Battleground State is The Internet
After establishing the reach of each candidate's fan club, I decided to dig into where the meat of the political discussion was taking place: /r/politics. Specifically, I wanted to establish exactly how biased this supposedly neutral subreddit had become. So I fired off another query and pulled in all posts with the words "Donald", "Trump", "Hillary", "Clinton", "Bernie", or "Sanders" in the title and a net karma score over 1,000. The theory being that if most people don't troll /r/politics/new, these are the posts casual observers are most likely to see. After pulling down all 504 posts, I analyzed each and every headline.
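The filter described above can be sketched in a few lines of Python. This is an illustration, not the query actually used: the record fields (`title`, `score`) and the sample scores are hypothetical, and whole-word, case-insensitive matching is a choice made for this sketch.

```python
import re

# The six title keywords named in the text.
CANDIDATE_WORDS = ("Donald", "Trump", "Hillary", "Clinton", "Bernie", "Sanders")

# Assumption for this sketch: match whole words, ignoring case.
NAME_PATTERN = re.compile(
    r"\b(" + "|".join(CANDIDATE_WORDS) + r")\b", re.IGNORECASE
)


def visible_candidate_posts(posts, min_score=1000):
    """Keep posts that name a candidate and score strictly above min_score."""
    return [
        post for post in posts
        if post["score"] > min_score and NAME_PATTERN.search(post["title"])
    ]


# Hypothetical records: the scores are invented for illustration.
sample = [
    {"title": "20,000 people attend Bernie Sanders rally in Oakland", "score": 4200},
    {"title": "Interviews of Clinton aides in email case to begin this week", "score": 800},
    {"title": "Senate votes on budget amendment", "score": 5000},
    {"title": "Oregon Poll: Trump Leads Clinton in new General Election Poll", "score": 1000},
]

kept = visible_candidate_posts(sample)
for post in kept:
    print(post["score"], post["title"])
```

Note that a post sitting at exactly 1,000 is dropped, because the threshold is read as "over 1,000".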
The method behind the analysis is pretty simple: I read each headline and attempted to determine whether the post contained promotional, attacking, neutral, or mixed messages towards the candidate. I did my best to analyze from the perspective of a neutral observer with minimal knowledge of U.S. politics. I intentionally ignored all context in the posts because let's be honest: most people only read headlines anyway.
The rules I used to determine a category for each headline are as follows:
- If a title associates negative actions, events, or characteristics with a candidate, the post is Attacking.
  - Example: We Now Know Hillary Lied Multiple Times About Her Email Server
    - Translation: Hillary Clinton Lied (Attacking)
  - Example: Panama Papers Leak: Donald Trump, Vladimir Putin and Others on the List
    - Explanation: Donald Trump is linked with an information leak (Attacking)
- If a title associates positive actions, events, or characteristics with a candidate, the post is Promotional.
  - Example: Oregon Poll: Trump Leads Clinton in new General Election Poll
    - Translation: Trump Leads (Promotional), Clinton (Neutral)
  - Example: 20,000 people attend Bernie Sanders rally in Oakland
    - Translation: Many people supported Bernie Sanders (Promotional)
- If the wording is not obviously Promotional or Attacking, the article is marked as Neutral.
  - I acknowledge this category has the most potential for bias, because many are open to interpretation. I assume most will have a negative lean.
  - Example: Interviews of Clinton aides in email case to begin this week
  - Example: Sanders Says "I Look Forward to Debating Trump in California"
- If the candidate is associated with actions, events, or characteristics open to interpretation by the reader, it is considered Mixed.
  - Example: Hillary Clinton Holds $100,000-a-Head Fundraisers
    - Explanation: Supporters see a candidate raising large sums of money, detractors see an example of too much money in politics
- If the candidate is not in the title, they are considered Ignored.
  - A candidate may be implied in a title, but for the sake of consistency these are Ignored.
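The judgment calls in the rules above were made by hand, but the bookkeeping around them can be sketched in code. In this hedged example, the human-assigned labels are passed in, any candidate not named in the title is filled in as Ignored, and the results are tallied per candidate and intent. The helper names and the keyword lists are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword lists: which title words count as naming a candidate.
CANDIDATES = {
    "Trump": ("donald", "trump"),
    "Clinton": ("hillary", "clinton"),
    "Sanders": ("bernie", "sanders"),
}
MANUAL_INTENTS = ("Promotional", "Attacking", "Neutral", "Mixed")


def mentioned(title, keywords):
    """True if any keyword appears in the title as a whole word."""
    pattern = r"\b(" + "|".join(keywords) + r")\b"
    return re.search(pattern, title, re.IGNORECASE) is not None


def complete_labels(title, manual_labels):
    """Return one intent per candidate; unnamed candidates are Ignored."""
    labels = {}
    for candidate, keywords in CANDIDATES.items():
        if not mentioned(title, keywords):
            # Implied candidates are still Ignored, for consistency.
            labels[candidate] = "Ignored"
            continue
        intent = manual_labels.get(candidate)
        if intent not in MANUAL_INTENTS:
            raise ValueError(f"{candidate} is named but has no valid label")
        labels[candidate] = intent
    return labels


# Two headlines from the rules, with the labels given there.
labeled = [
    ("Oregon Poll: Trump Leads Clinton in new General Election Poll",
     {"Trump": "Promotional", "Clinton": "Neutral"}),
    ("Panama Papers Leak: Donald Trump, Vladimir Putin and Others on the List",
     {"Trump": "Attacking"}),
]

counts = Counter()
for title, manual in labeled:
    for candidate, intent in complete_labels(title, manual).items():
        counts[(candidate, intent)] += 1

for key, total in sorted(counts.items()):
    print(key, total)
```

Raising an error when a named candidate has no label is a design choice for the sketch: it catches headlines that were skipped during manual review instead of silently counting them as Ignored.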
Results of Reddit Headline Analysis
| CANDIDATE | POST INTENT | TOTAL POSTS | TOTAL COMMENTS | TOTAL KARMA |
The results of my analysis for the month of May ended up much as I expected. /r/politics voters think Bernie Sanders is a saint, Hillary is the next Hitler, and Trump is starting to come into the fold as the Republican Primary wraps up. Somewhat alarming is the fact that no Attacking posts against Bernie or Promotional posts for Hillary managed to crack 1,000 net votes. I expected at least some Bernie Attacking posts to make it past the threshold, but The Bernie Brigade seems to have /r/politics on lockdown. Although The_Donald put up some YUUUUUUGE numbers from a posting standpoint, S4P's ability to spur on their own impressive traffic while controlling the most visible political subreddit with Orwellian efficiency leads me to declare Bernie Sanders the Reddit winner in the month of May. Unfortunately for Mr. Sanders, upvotes don't impact super-delegates, so I expect this may be the last bit of good news his campaign hears.
Introducing The Memetic Index
Your next question is probably, "Why are you breaking down 1-2 month old data?" After all, we're more interested in what is going to happen than what has already happened. The main reason for the data lag is that May is the most current information I presently have access to. Should another data source present itself, or if I am able to develop a solution to pull the real-time data from the Reddit API, I'll publish my findings in another post. Unfortunately, between my full-time job, friends, and loved ones I do not have time to develop this solution in the near future. But I do have time to build something else!
Even though this information is a bit old, I think there are a lot of interesting insights to be gained from analyzing Reddit data like this. Using this BigQuery data, I have begun developing The Questionably Qualified Memetic Index. Over the next several weeks I will be replicating the analyses done for this article across all available 2016 data. Using this information, I will look for parallels between Reddit data and polling data and place everything in an easy-to-use dashboard.
Many of us on the QQ team believe there is a seismic shift happening under the political landscape. Individual users have unprecedented online reach to push a political agenda, and as technology continues to develop this reach will only increase. I welcome each and every one of you on my journey into political posting madness on the Questionably Qualified Subreddit. There I will be posting detailed information on how I pulled my data and my development plans for the future.
Until next time, happy posting everyone.
*Ranking as of 2016/07/05