
Decoding the Debate: Analyzing the text of the Harris-Trump Face-off

From heated exchanges to subtle digs, the Harris-Trump debate on September 10 left plenty to unpack. The debate offered a window into the distinct communication strategies and policy priorities of both candidates.

Here at Storybench, we love a good data-driven debrief. Theatricality and noise aside – let’s dig into what the candidates actually said.

Are you talking to me?

More than one Storybench editor noticed that former President Donald Trump seemed to avoid addressing Vice President Kamala Harris by name. Was this the case? We decided to use text analysis to investigate.

Graphic by Namira Haris, Heather Wang, April Qian

Harris called Trump by his full name 27 times. Other times, she referred to him as “the former president” or “my opponent.” Trump referred to Harris exclusively in the third person, using “her” 113 times, to be exact. Alternatively, he called her “the worst vice president in the history of our country” twice, and explicitly equated Harris to Biden (whom he deemed “the worst president in the history of our country”) twice. Not once did he address her by name. In fact, the only time either candidate spoke the name “Kamala Harris” was when Harris introduced herself to Trump at the beginning of the debate.
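For readers who want to try this kind of tally themselves, here is a minimal sketch in Python of counting how often a speaker uses particular words or phrases. It is not the code behind our graphic: the function name `count_references` and the sample sentence are invented for illustration, and a real run would first need to split the transcript by speaker.

```python
import re
from collections import Counter

def count_references(text, phrases):
    """Count case-insensitive, whole-phrase matches for each phrase in `phrases`.

    `phrases` maps a label to the phrase to search for. Hypothetical helper,
    not part of the published analysis code.
    """
    counts = Counter()
    for label, phrase in phrases.items():
        # \b keeps "her" from matching inside words like "there" or "others".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[label] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Invented sample text, not a quote from either debate transcript.
sample = "She said it. I told her twice. Her plan is her plan."
refs = count_references(sample, {"her": "her", "she": "she"})
print(refs)
```

The word-boundary markers matter more than they look: without them, a search for “her” would also count “there,” “others” and “where.”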

Which words did the candidates use more than others?

Taking a more exploratory approach, we looked at what the candidates spoke about the most. The term frequency-inverse document frequency (TF-IDF) statistic is often used in text analysis to measure how important a word is to one document within a corpus, or body of text. Without getting overly technical, we’re interested in how frequently a word shows up in one candidate’s remarks, relative to how common it is in the overall body of text. Check out Julia Silge and David Robinson’s book, Text Mining with R: A Tidy Approach, for more on TF-IDF.


For example, while both candidates used the word “people,” Trump used the word at a higher relative frequency than Harris. The TF-IDF score, indicated through color in the heatmap below, is an indicator of just how often one candidate uses a certain word compared to the occurrence of that word throughout the overall body of text.
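To make the idea concrete, here is a minimal sketch of a TF-IDF calculation in plain Python, treating each speaker’s remarks as one document. It rests on a few assumptions that are ours for this example and may differ from the analysis code: term frequency is a word’s share of the speaker’s total words, and the inverse document frequency uses the smoothed form ln((1 + N) / (1 + df)) + 1, which keeps words used by both speakers from scoring zero. The speaker labels and sentences are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf(docs):
    """Return {doc_name: {word: score}} for a dict of {doc_name: text}.

    Assumed weighting for this sketch:
      tf  = word count / total words in that document
      idf = ln((1 + N) / (1 + df)) + 1   (smoothed, never zero)
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[name] = {
            word: (n / total) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, n in word_counts.items()
        }
    return scores

# Invented sample text, not quotes from the transcript.
docs = {
    "speaker_a": "people want a plan and people want jobs",
    "speaker_b": "we have a plan for the economy",
}
scores = tf_idf(docs)
for word in ("people", "plan"):
    print(word, round(scores["speaker_a"][word], 3))
```

In this toy example, “people” scores higher for speaker A than “plan” does, both because speaker A says it more often and because speaker B never says it at all.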

Word Trees can help us visualize context

Looking at the relative frequencies of certain words can be illuminating, but words pulled out of context can only get us so far. To take a deeper look at the specific ways each candidate constructed their language, we took their most frequently used words and displayed them in Word Trees.

The Word Tree is a visualization and information-retrieval technique invented by Martin Wattenberg and Fernanda Viégas in 2007. The technique can help us quickly explore bodies of text and is a graphical version of the traditional “keyword-in-context” method. True to its name, the Word Tree takes any given keyword or keywords and finds the most frequently repeated instances of words associated with that keyword. Relative font size represents the number of times a word pair occurs.
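The “keyword-in-context” idea underneath a Word Tree can be sketched in a few lines of Python. This is an illustration only, not the Word Tree tool’s implementation: the function names, the window size and the sample sentence are all ours. The first function lists each occurrence of a keyword with the words around it, and the second counts which word most often comes next, which is roughly what the font sizes in a Word Tree encode.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_in_context(text, keyword, window=3):
    """Return (left context, keyword, right context) for every occurrence."""
    tokens = tokenize(text)
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows

def next_word_counts(text, keyword):
    """Count the words that immediately follow the keyword."""
    tokens = tokenize(text)
    return Counter(b for a, b in zip(tokens, tokens[1:]) if a == keyword)

# Invented sample text, not a quote from the transcript.
sample = "The people want change and the people want jobs but people need help"
rows = keyword_in_context(sample, "people")
following = next_word_counts(sample, "people")
for left, key, right in rows:
    print(f"{left:>20} | {key} | {right}")
print(following.most_common(2))
```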

Using Jason Davies’ Word Tree tool, we compared how Harris and Trump used the word “people” during the debate.

Harris’ use of the word “people” was most frequently associated with “the dreams of the American people,” while Trump’s most frequent use of the word “people” was in the phrase “these millions and millions of people,” referring to immigrants coming into the United States.

Click on the word trees below to see them in detail.

Investigating Pronouns: we vs. they

During the previous analysis, we removed so-called “stop words.” These are words like “and,” “but,” “is” and “you,” which are often removed during text analyses because they tend to carry less meaning than words like “president” or “country.” Given the former president’s extensive use of the pronoun “her,” however, we decided to add pronouns back into our word-frequency analysis.
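Here is a minimal sketch of what adding pronouns back looks like in code. The stop-word list below is a tiny stand-in we made up for this example (real lists run to a hundred words or more), and the sample sentence is invented. The only change between the two runs is whether pronouns are subtracted from the stop-word set.

```python
import re
from collections import Counter

# Tiny illustrative lists; a real analysis would use a much fuller stop-word list.
STOP_WORDS = {"and", "but", "is", "the", "a", "to", "of",
              "we", "you", "they", "he", "she", "her"}
PRONOUNS = {"we", "you", "they", "he", "she", "her"}

def top_words(text, stop_words, n=3):
    """Return the n most frequent words that are not in stop_words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)

# Invented sample text, not a quote from the transcript.
sample = "We will build and we will lead, and they know we can"

without_pronouns = top_words(sample, STOP_WORDS)
with_pronouns = top_words(sample, STOP_WORDS - PRONOUNS)

print("pronouns removed:", without_pronouns)
print("pronouns kept:   ", with_pronouns)
```

With pronouns removed, the top word in the sample is “will.” With pronouns kept, “we” jumps to the top, which is the kind of shift that prompted us to look again.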

When we add pronouns back in, Harris’ top word is “we” and Trump’s top word is “they,” highlighting starkly different worldviews.

But what if we compared Harris’ and Trump’s language to that used in the first debate between Trump and President Joe Biden?  

When we compared the September 10 debate between Harris and Trump with the June 27 debate between Trump and then-presumed candidate Biden, another interesting pattern emerged. During the first debate, the most frequently used word for both Trump and Biden was “he.” Their word choices were strikingly similar: their top four words — “he,” “we,” “you” and “they” — were identical.

During the second debate, Trump shifted to more targeted language toward Harris, often referring to her with the third-person pronoun “she.” Such a change suggests that Trump is modifying his communication strategy to respond to his new opponent. By addressing Harris only in the third person, Trump appears not only to avoid direct confrontation but also to dismiss her presence on the debate stage.

Despite this shift, much of Trump’s vocabulary remained consistent with that of the first debate. His continued use of “they,” often in reference to immigrants, created an “us-versus-them” dynamic. Harris’ language, in contrast, appeared to focus more on direct appeals to voters, using words like “we,” “you,” and “let’s.” The word “they” was noticeably absent from her top ten most frequent words. 

Now, you might be thinking, that’s enough overanalyzing of word choice. Fair enough. Let’s look at the substance of what the candidates said.

We ran an analysis of bigrams and trigrams – strings of two or three consecutive words – for a comprehensive, unbiased tally of what the candidates said. As in our previous analysis, we removed stop words, so some of the results in the table below might look nonsensical. For context, we also included a list of frequently used phrases from both candidates, this time including stop words. We annotated terms that seemed to be direct references to policies and issues, as well as phrases that could potentially include indirect references to policy issues. Pink annotations are specific policy references, and orange stickers are implied references, with different shapes denoting different potential topics.
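A bigram or trigram count can be sketched with a sliding window over the list of words. The version below is a simplified illustration, not the published analysis code: the function names and the two-word stop list are ours, and the sample sentence is invented. It also shows why results can look nonsensical once stop words are removed, since words that were never adjacent in speech end up side by side.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(text, n, stop_words=frozenset(), k=5):
    """Return the k most common n-grams after dropping stop_words."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower())
              if t not in stop_words]
    return Counter(ngrams(tokens, n)).most_common(k)

# Invented sample text, not a quote from the transcript.
sample = "millions and millions of people and millions and millions of jobs"

# Stop words removed first: the leftover bigrams read oddly.
bigrams_no_stops = top_ngrams(sample, 2, stop_words={"and", "of"})
# Stop words kept: the trigrams read as natural phrases.
trigrams_with_stops = top_ngrams(sample, 3)

print(bigrams_no_stops)
print(trigrams_with_stops)
```

With “and” and “of” removed, the most common bigram in the sample is “millions millions,” a pair nobody actually said. Keeping the stop words recovers the spoken phrase “millions and millions.”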

These annotations made it apparent that Harris made many more direct references to specific policy than did Trump – it seems plausible that this is in response to calls for her to be more specific on policy stances. Trump made many indirect references to immigration, many times lamenting the “millions and millions of people pouring into our country.” Perhaps counterintuitively, he uttered the word “immigration” just once during the debate, though he did use the phrases “migrant crime” and “illegal immigrants” once each. In response to a question from the moderators on abortion, Trump said, “we’ll talk about immigration later.”

Because we love annotating debate transcripts, we performed the same analysis on the first presidential debate between Trump and then-presumed candidate Biden. Compared with his first debate against Biden three months earlier, Trump seemed much more concerned in the Harris debate with immigrants coming into the country, lamenting the “millions and millions coming into our country” more than 11 times during the 90-minute debate. He focused less on previous policy priorities such as Social Security and inflation: inflation received a cursory mention (“I had no inflation, virtually no inflation, they had the highest inflation, perhaps in the history of our country”) and Social Security saw virtually zero airtime from Trump this time around.

Closing arguments

Combing through the text of this year’s two presidential debates gave us a treasure trove of insights – some expected, some surprising. Did you glean other insights as you analyzed the text with us? Or do you have ideas for other analyses we could perform? Reach out to us on Instagram, X, or LinkedIn, and follow Storybench on social media to keep up with more in-depth data-driven analysis on the election.

Methodology

The code we used for text analysis can be found here. Our primary text sources were ABC News’ transcript of the Sept. 10, 2024 presidential debate between Harris and Trump and CNN’s transcript of the June 27, 2024 debate between Biden and Trump. Tutorials from Melanie Walsh helped provide the backbone for the code, and this analysis would not have been possible without the expertise and help of Lawrence Evalyn, text mining specialist at Northeastern University Library. His research centers on digital infrastructure and eighteenth-century women’s writing.

April Qian
