Sexism through sentiment analysis: Looking at the media’s coverage of 2020 candidates
The 2020 election cycle has firmly begun. With Bernie Sanders throwing his hat into the ring last week, the Democrats now find themselves with a large (and likely still growing) pool of candidates from which to choose.
But, as promised in our introductory post last week, we aren’t going to worry too much about the horse race. Instead, we’ll be focusing on how the media is covering these candidates. After all, if the media serves as the watchdog of the election, then someone has to keep an eye on the watchdog, too.
For now, our coverage analysis will focus on the six most prominent candidates in the race thus far: Kamala Harris, Bernie Sanders, Amy Klobuchar, Elizabeth Warren, Kirsten Gillibrand, and Cory Booker. There are a few other names who have entered the competition (including a Republican challenger, former Massachusetts governor Bill Weld), but those six seem to have separated themselves for the time being.
Before moving into our first piece of analysis, a quick note on methodology:
This first update is focused on the initial coverage of each candidate’s campaign announcement, a time span that we have set at two weeks. Sixty articles have been collected so far, 10 per candidate. To get a sample of the “mainstream media,” our dataset is limited to the top five most-read American news outlets according to Alexa: CNN, The Washington Post, The New York Times, Fox News, and The Huffington Post. We took two articles from each outlet following a candidate’s announcement and scraped the text into a spreadsheet, which you can find here.
Most of the analysis was done using R’s tidytext package. When discussing sentiment, we used the “bing” lexicon, which sorts descriptive words into one of two categories: positive or negative.
Picking favorites: Sentiment of each candidate’s coverage
The news media generally strives for objectivity, with the goal being to cover each candidate on equal terms. Yet while that is the stated mission, the word choice of journalists can betray an underlying, perhaps even subconscious, attitude towards certain candidates. But when this is brought up in popular discourse, most judgements seem to be based on qualitative observations—President Trump’s repeated assault on the “crooked media” comes to mind.
To see if the news media is showing any bias to candidates already, we ran our dataset through the “bing” lexicon. Each descriptive word was classified as either positive or negative and, at the end, the total number of negative words was subtracted from the total number of positive ones to show net sentiment. The result, shown below, is not perfect by any means. But it is useful for getting a rough sense of how candidates are being talked about in the media.
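The arithmetic behind net sentiment can be sketched in a few lines. The analysis itself was done in R with tidytext; the Python below is only an illustration of the same positive-minus-negative count. The word list `TOY_LEXICON`, the function `net_sentiment`, and the sample sentences are all made up for this example and are not the real “bing” lexicon or our dataset.

```python
import re
from collections import Counter

# Hypothetical stand-in for the "bing" lexicon: a tiny hand-picked word list
# used only for illustration, NOT the real lexicon.
TOY_LEXICON = {
    "strong": "positive",
    "popular": "positive",
    "win": "positive",
    "controversy": "negative",
    "attack": "negative",
    "failed": "negative",
}


def net_sentiment(text, lexicon=TOY_LEXICON):
    """Return (# positive words) - (# negative words) found in text."""
    # Lowercase and split into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    # Count only the words that appear in the lexicon, grouped by label.
    counts = Counter(lexicon[w] for w in words if w in lexicon)
    return counts["positive"] - counts["negative"]


# Made-up sentences standing in for scraped article text.
articles = {
    "Candidate A": ["A strong and popular start.", "Critics attack the plan."],
    "Candidate B": ["The controversy continues after a failed rollout."],
}

# Sum the per-article scores to get one net score per candidate.
scores = {
    name: sum(net_sentiment(text) for text in texts)
    for name, texts in articles.items()
}
print(scores)
```

Words that are not in the lexicon are simply ignored, which is one reason a score like this gives only a rough sense of tone: negation, sarcasm, and quoted attacks on someone else all count the same as the reporter’s own wording.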
With the limited nature of our dataset thus far, we won’t jump to any rash conclusions. But it’s impossible to ignore the trend above. The two male Democratic candidates top the charts in terms of media sentiment, and the four women lag behind. Gillibrand, who has most closely associated her campaign with gender, is the only one with a negative overall sentiment.
During and after the 2016 cycle, much was made of possible sexism in the coverage of Hillary Clinton. Are the same mistakes being repeated? It is too early to tell conclusively, but Dan Kennedy, a journalism professor and nationally known media commentator, firmly believes that there is a gender bias in political coverage.
“Is there sexism in the way the media covers the candidates? Yes, yes, yes,” Kennedy wrote by email. Pointing to 2016, he mentioned that it’s possible this bias results less from personal opinions and more so from a lack of critical analysis.
“Maybe a better way of putting it would be that there was horrendous sexism in the way Trump went after Clinton, and the media covered [his] remarks with mindless regularity.”
The difference between Sanders and Warren in the graphic above serves as a valuable comparison point. In many ways, they share much of the same lane, with both in favor of anti-corruption measures and massive changes to the financial system. Both also generally let their policy do the talking, while fending off questions about their ability to play nicely with others.
Yet while Sanders enjoys some of the highest favorability ratings among Democratic politicians, Warren gets slammed with headlines like this one from Boston Magazine: “Why is Elizabeth Warren So Hard to Love?” Many, including The New York Times and NBC, have pointed to sexism as the culprit for this discrepancy, and this chart would seem to serve as further evidence.
It’s also worth noting how well these sentiment numbers line up with current polling. RealClearPolitics has been maintaining a composite poll, and of the candidates we’re looking at here, Sanders pulls in the most support at nearly 17 percent. Harris trails with 10 percent, followed by Warren at 7.5 percent, Booker at 5 percent, and Klobuchar at 3.7 percent.
Gillibrand, as in the media sentiment rankings, finishes last with barely one percent. The spread here isn’t big enough to make many observations about the rest of the field, but with the media portraying her in a negative light and her polling numbers painfully low, Gillibrand could find herself out of the race sooner rather than later.
Which words are associated with each candidate?
Another core criticism of the 2016 coverage was that the media fixated on certain issues while ignoring opportunities for more nuanced and substantial discussion. The most noticeable obsession was with Clinton’s emails, but there are plenty of examples to choose from.
For Kennedy, this reflects a fairly fundamental flaw in American news media: “We tend to do a lousy job of covering politics—it’s mostly horse race and fake controversies,” he said. “What’s really lacking is solid reporting on the candidates’ qualifications, experience, life stories, leadership ability, and temperament—in other words, what kind of president they would be.”
To see if this flaw has reared its head in the initial election coverage, we used a method called tf-idf (term frequency-inverse document frequency) to find the terms most distinctive to each candidate. The method down-weights words that are common across every candidate’s coverage, like “democratic,” and surfaces the words most closely associated with each person.
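For readers curious about the mechanics, here is a minimal sketch of tf-idf. It is written in Python rather than the R we used, and it assumes each candidate’s articles are pooled into one “document.” Term frequency is a word’s share of that document, and inverse document frequency is the natural log of (number of documents / number of documents containing the word), which is our understanding of how tidytext computes it. The functions `tf_idf` and `top_terms` and the sample text are hypothetical.

```python
import math
import re
from collections import Counter


def tf_idf(docs):
    """docs maps a name to its pooled text; returns name -> {term: score}."""
    # Word counts per document.
    counts = {
        name: Counter(re.findall(r"[a-z']+", text.lower()))
        for name, text in docs.items()
    }
    n_docs = len(docs)

    # For each term, the number of documents it appears in.
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())

    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            # tf = share of this document; idf = ln(N / docs containing term).
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in c.items()
        }
    return scores


def top_terms(scores, name, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores[name], key=lambda t: (-scores[name][t], t))
    return ranked[:k]


# Made-up text standing in for each candidate's pooled coverage.
docs = {
    "Candidate A": "democratic senator newark schools newark",
    "Candidate B": "democratic senator bail reform bail",
    "Candidate C": "democratic senator socialist economy socialist",
}
scores = tf_idf(docs)
print(top_terms(scores, "Candidate A", k=1))
```

A word that shows up in every candidate’s coverage gets an idf of ln(1) = 0, so its score is zero no matter how often it is used. That is why shared terms drop out and candidate-specific ones rise to the top.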
Warren, clearly, is plagued by the Native American ancestry controversy. The first four words all relate to her supposed Cherokee heritage (or, to be more accurate, lack thereof), and Trump’s “Pocahontas” moniker sneaks in there as well, just below “boston.”
“[T]he Pocahontas stuff is terrible,” said Kennedy. “The media need to find a way to stop rising to Trump’s bait…cable news drives so much of the campaign narrative, and it is uniformly awful.”
This fixation seems to be especially tied to the cable news and social media bubble. This article from The New York Times did an excellent job outlining how voters seem to mostly not care about the whole hubbub, despite what Twitter would lead you to believe. Will that change, though, if the media keeps beating this drum?
The words associated with Booker are mostly notable for their general boringness. The New Jersey senator is probably pleased with how closely he’s been tied to his Newark political roots, and his work on schools has been consistently highlighted. From a qualitative point of view, Booker’s coverage has mostly been unremarkable and fairly positive, so this feels right.
Like Warren’s, Gillibrand’s coverage has mostly focused on controversies she has become entangled in. With “franken” leading the way, it is clear that her role in pushing for the former senator’s resignation is being highlighted repeatedly. That was a formative moment in the growth of Gillibrand’s national profile, so to an extent it is understandable. But the fact that another man’s downfall is a defining theme of her coverage thus far is troubling.
Gillibrand has also been accused of being a bit of a political chameleon, shifting her views in order to gain popular support. Her attitude towards gun rights has been spotlighted here, with her relaxed attitude in the past coming into question.
Klobuchar has famously embraced the concept of “Minnesota nice,” and from the look of things, she has at least gotten the first part of that phrase to stick. The “nice” part, though, may be a little harder to come by. Her reputation as a bully boss has been discussed thoroughly in recent days, and although much of that coverage began to emerge after our article collection, you can see a glimpse of it with the inclusion of “binder” at the very bottom, which is apparently her projectile of choice. For our next update, “comb” may very well find itself on the list.
“Socialist” is the term most associated with Sanders, which is probably the least surprising finding in the whole candidate pool. Sanders has gained much of his popularity (and notoriety) thanks to his support of socialist economic practices, and the media has made that fact clearly known. But Sanders isn’t actually a total socialist, and opponents have already shown that they’re willing to seize on that label to take him down.
“Xenophobe,” “pathological,” and “liar” certainly raise some eyebrows at first glance, but in actuality, they’re evidence of his extremely consistent messaging; Sanders has seized on those words to describe President Trump, and has employed them repeatedly.
Having read through all 60 of the articles in our dataset, it appears that the media believe Harris to be the most formidable candidate thus far. But there have certainly been critiques of the California senator, many of which have focused on her time as a prosecutor in the Bay Area. In an era in which criminal justice reform is gaining serious momentum, she will have to address that history.
On the bright side, her platform seems to be gaining traction. Her bail reform push is highlighted, with “bail” being the fourth most used unique term, and “renters” shows her support of a renters’ tax break.
- Sexism through sentiment analysis: Looking at the media’s coverage of 2020 candidates - February 28, 2019
How did you select those 10 articles per candidate? Without formal criteria this could have a strong influence on the outcomes.
Hey Victor, for the 7 candidates, we’ve selected 26 articles from each of these outlets: CNN, The Washington Post, The New York Times, Fox News, and The Huffington Post. Here’s our dataset: https://docs.google.com/spreadsheets/d/1AkYoTeSK4-RvoTWJQZP-EdPTkGZ2GPCbef6M4WOKe3o/edit?usp=sharing
Thanks for releasing my question and answering it.
My question was not which articles were selected, but how you selected them. With bad faith in selecting articles you could get almost any desired answer, but even without bad faith the result can quickly become biased.