NodeXL’s Marc Smith on how mapping virtual crowd networks is relevant to journalism
As social scientists try to understand the online communities we form, journalists are missing out on stories that network theory could help unravel, according to Marc Smith, a sociologist and director of the Social Media Research Foundation, who helped build the mapping tool NodeXL.
Storybench sat down with Smith to chat about network theory, “betweenness centrality,” the structure of audiences and communities, and how mapping virtual crowds can help advance journalism.
You are a sociologist studying social media. Tell us about the networks you see online.
Virtual crowds clearly now matter for democracy. Legitimacy comes from democratic activity. When you claim majorities, you have legitimacy. Well, social media platforms built a series of legitimating platforms, and then made it easy to, subversively, access those positions. It’s easy to become a trend.
There are a few mechanisms on social media platforms for claiming that a certain thing is trending, is viral, or is a topic that’s at the top of some list. And a lot of people have learned the tools needed to manipulate that.
What was the thing 4chan did to Time Magazine? Person of the Year, right? Five or six years ago, 4chan decided “Marble Cake for the win.” And so they manipulated the results of the Time Person of the Year Award. No one took note of that and worried that all the other top lists would also be manipulable?
And what is the role of journalism in all this?
Too often we hear in the media “and then things went viral.” Today, right now, you can fairly easily pick the locks of journalism through the manipulation of social media.
In fact, some of the far-right-wing folks who have had such success have been interviewed. They report that they know how to make something trend on Twitter. They have 400 people in a chatroom and they all say “hashtagX is today’s hashtag.” They all begin to tweet hashtagX. Twitter’s algorithm responds, and now it trends.
This is the first step of a pathological process in the body journalistic. Reporters tend to look at Twitter. They see a trend and, now, they have a news hook.
The issue is why did Twitter present trending as if it was some enormous signal of legitimacy? Journalism and the media broadly accepted it as such because it was convenient. Most of the time it seemed legitimate, but it was undermined. It is no longer a costly signal. It’s actually a cheap signal.
Why is looking at networks important for journalism?
Network maps of social media make it clear that popularity and virality are two different things that sometimes overlap, but don’t have to. Virality is when information comes from multiple sources. Popularity is when information comes from a single source but is widely consumed.
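One rough way to make that distinction concrete is to ask how much of the sharing traces back to a single account. The sketch below is only an illustration: the `(sharer, source)` record format, the account names and the function `top_source_share` are all hypothetical, and this is not how NodeXL computes anything.

```python
from collections import Counter

def top_source_share(shares):
    """Return the biggest source and the fraction of shares it accounts for.

    `shares` is a list of (sharer, source) pairs, where `source` is the
    account the sharer got the item from (a hypothetical record format).
    A value near 1.0 looks like popularity: one source, widely consumed.
    A low value looks like virality: the item arrives from many sources.
    """
    if not shares:
        raise ValueError("need at least one share")
    counts = Counter(source for _, source in shares)
    top_source, top_count = counts.most_common(1)[0]
    return top_source, top_count / len(shares)

# Popularity: everyone retweets the same account.
popular = [(f"reader{i}", "@newsdesk") for i in range(6)]

# Virality: the item reaches people through many different accounts.
viral = [("r1", "@u1"), ("r2", "@u1"), ("r3", "@u2"),
         ("r4", "@u3"), ("r5", "@u4"), ("r6", "@u5")]

popular_source, popular_share = top_source_share(popular)
viral_source, viral_share = top_source_share(viral)
print(popular_source, popular_share)
print(viral_source, round(viral_share, 2))
```

Where to draw the line between the two is a judgment call. The ratio only describes the shape of the sharing, not whether the item is newsworthy.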
We have this process of deciding what’s newsworthy, and a lot of it is “Are you the only crackpot who thinks that or do you have other crackpots with you?” But how many other crackpots does it take to become newsworthy? What we need is better education for editors to say “A trend doesn’t mean anything. Give me a news hook.” A trend is a debased sort of social signal.
And part of the reason we are in this pickle is because the tools are not available to journalists. We want to outfit journalists with the digital camera that you can go out into cyberspace and take pictures with, such that never again will there be written in a newspaper or on any kind of news source “and then things went viral in cyberspace,” as if that was the news hook.
What is NodeXL and what stories can it help reveal?
NodeXL is a network overview, discovery and exploration add-in for Excel. It’s a way of taking pictures of connected things, things that are connected to other things. Collections of connections, or networks.
I like to use the metaphor that we’re talking about a kind of photography for digital media, a way of taking a photograph of a virtual crowd. If you have a picture, it can tell a story that you would otherwise have a very hard time capturing. Imagine journalists running around with colored pencils and pads trying to quickly sketch, get pictures of scenes on-the-go. Difficult.
Compare that to every journalist whipping out an iPhone and starting to shoot pictures. And those pictures tell stories. That’s what I want. I want every journalist to essentially have another kind of tool that they can whip out quickly and start shooting pictures, but these are virtual photos. We’re going to know something about these virtual collectivities. To understand them, we have to map them.
I would like to believe that when a journalist has just been told “Go do a story on X,” most of the time that story would now have a social media component. When they go do the reporting on the social media component, they should bring a digital camera, a virtual camera. That’s certainly going to make their reporting easier, because they can immediately figure out who the factions are, who the leaders of those factions are, what web resources those factions are linking to, and what their hashtags are. With ten maps, you can put them up on the wall and see the landscape of that topic. We can map them, and telling a story from that map is then a lot easier than reading 10,000 tweets.
Tell me about the concept of centrality in identifying the factions, leaders and other relevant people in virtual crowds.
We make it very easy to see who is already at the center of this network. Not the person who tweets the most, not the one who retweets the most, or gets all these things that are volume. We don’t care about that. What we care about is that, in a collection of connections, somebody’s got a set of connections to them that are unique, special. Those people have a property in network theory called centrality. And when we calculate it, we find these highly central people. We call those people the mayors of a topic.
There are flavors of centrality, and, by that, I mean there are just different mathematical algorithms calculating a number that tries to represent the in-the-middleness of something, and people disagree on what it means to be in the middle of something. We have one definition that is a really easy one. The person with the most connections is in the middle. We call it degree centrality. Another word for it is popularity.
But we have another definition of centrality that is called “betweenness” centrality, and it doesn’t count how many connections one has. It counts how unique his/her connections are.
We look for those people because they’re the bridges. They have high “betweenness” in the network. They’re the slender threads that sometimes connect two otherwise separate groups. Sometimes those groups hate each other, sometimes not, but bridges are really important.
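To see how the two flavors of centrality can disagree, here is a minimal sketch in plain Python. The toy network, the account names and the helper functions are invented for illustration, and the betweenness routine is a textbook version of Brandes' algorithm for unweighted, undirected graphs, not NodeXL's own implementation.

```python
from collections import deque

def build_adjacency(edges):
    """Turn a list of undirected (a, b) edges into a neighbor map."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_centrality(adj):
    """Raw number of connections per account (the 'popularity' flavor)."""
    return {node: len(neighbors) for node, neighbors in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness: how many shortest paths between other
    pairs of accounts pass through each account (Brandes' algorithm)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                      # nodes in order of distance from s
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                    # accumulate from the farthest node back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each end.
    return {v: c / 2 for v, c in score.items()}

# Two tight groups of four accounts, joined only through "bridge".
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
edges = [(x, y) for g in (group_a, group_b)
         for i, x in enumerate(g) for y in g[i + 1:]]
edges += [("a1", "bridge"), ("bridge", "b1")]

adj = build_adjacency(edges)
degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)

print(degree["bridge"], betweenness["bridge"])   # 2 16.0
print(degree["a1"], betweenness["a1"])           # 4 15.0
```

In this toy network the bridge account has the fewest connections of anyone, yet every shortest path between the two groups runs through it, so it scores highest on betweenness.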
What can the network’s structure tell us about the crowd?
With Pew a couple of years ago, we wrote a report that showed a limited number of network structures. We found six that are extremely different patterns of connections.
One of those patterns is an indicator of political division, the “polarized crowd,” when you find two clusters with no connection between them.
Another two are the “tight crowd” or the “community.” Instead of there being two bubbles with dense connections within them, there’s only one. There’s no “the other people,” the alt-group.
There is also the “brand cluster.” It’s kind of like confetti: the accounts connect to nobody, and nobody connects to them. There’s almost no connection. People are talking about a brand, about some public event, and they might say something like “I bought an iPhone,” but not “@so-and-so I bought an iPhone,” which means it’s not part of a conversation. It’s just a brand name.
Those four patterns are distinct from the final two. The final two are the ones where there’s one account, and all the other accounts connect to it. And there are two ways of doing that. There is connecting inward and connecting outward. The inward pattern is what journalists look like in social network maps. There’s this big ring of people pointing at him/her saying “retweet.” If you can find really dense cores who have retweeted the journalist, but also replied to and retweeted each other, now you’re talking about a community growing around your content, brand, persona.
We also believe there is a way for us to measure not just that we are polarized, but how much are we polarized. And if we were ever going to become less polarized, how would we know and how would we encourage things that reduce polarization?
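As a rough illustration of what putting a number on polarization could look like, the sketch below computes the share of connections that cross between two groups. This is an assumed, simplified measure chosen for the example, not the method Smith or the Pew report uses, and the function name, account names and group labels are hypothetical.

```python
def cross_group_share(edges, group_of):
    """Fraction of connections that link accounts in different groups.

    `edges` is a list of (account, account) pairs and `group_of` maps each
    account to a group label (both hypothetical inputs). A value of 0.0
    means no connection between the groups at all; higher values mean the
    groups talk to each other more.
    """
    if not edges:
        raise ValueError("need at least one connection")
    crossing = sum(1 for a, b in edges if group_of[a] != group_of[b])
    return crossing / len(edges)

group_of = {"l1": "left", "l2": "left", "l3": "left",
            "r1": "right", "r2": "right", "r3": "right"}

# Two clusters with no connection between them.
polarized = [("l1", "l2"), ("l2", "l3"), ("l1", "l3"),
             ("r1", "r2"), ("r2", "r3"), ("r1", "r3")]

# The same clusters after two cross-group replies appear.
less_polarized = polarized + [("l3", "r1"), ("l1", "r2")]

print(cross_group_share(polarized, group_of))       # 0.0
print(cross_group_share(less_polarized, group_of))  # 0.25
```

Tracking a figure like this over time is one way a newsroom could tell whether the groups around a topic are starting to talk to each other. It depends on how accounts are assigned to groups in the first place, which the sketch takes as given.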
With better network tools, I think journalists can probably help even the playing field. Right now, people have learned the secret that “small can seem large if we’re loud enough.” That secret is spreading and being used, and we’re all still being fooled by it. What we need are the tools to get aerial photographs of these crowds so that you could say “Look, it’s the small, loud crowd over in the corner. That’s where this message is coming from.” It doesn’t matter what side of the aisle you’re on, we just need to have a more empirical, quantifiable, more objective view of social media.