Does your next data story need a social media bot? Here’s how to build one.

The age of AI chatbots is upon us, but news innovators have been experimenting with the idea of getting your news by chatting for decades. Social media bots, accounts that post automatically, are an intriguing option to explore in an era of vibe-coding and social media fragmentation. Combining my interests in global soccer and data journalism, I decided to resurrect an old project and make a bot for Bluesky. I came away impressed that a non-coder could build a prototype with Claude Code, but getting to a deployable product took a fair amount of my “computer scientist” knowledge. In this post, I’m going to show you what it takes to build a bot.

Of course, more important than understanding how to build the bot is reflecting on why you’d want to: can a social media bot help your next story connect with new readers in a novel way? Can story-specific bots bring data journalism to potential readers who otherwise might not be connecting to your publication? 

Meet OurCupBot, a helpful Bluesky bot that suggests upcoming World Cup games for you to watch based on immigrant populations around where you live. Mention @ourcup.info in a post alongside a zip code, and it will get back to you in a minute or two with top picks.

An example of a short interaction with the OurCup bot. It recommended upcoming games with Haiti, Colombia and Brazil. There’s also a link to hear some traditional Haitian music.

What’s old is new again

OurCupBot is a bot of a specific type: software that listens for mentions on social media and automatically responds. Bots like this from the earlier Twitter era were some combination of informative, critical, and just plain silly. They served civic, entertainment, and news functions: @NYPDBot tweeted headlines about the police department from Google News, @ElizaRBarr tried to act as a third-party upstander in online GamerGate arguments, and @infinite_scream echoed the screams of random users. I found a wonderful academic paper from 2025 that jaunts down memory lane with more examples and ruminates on how bots can help create “accessible, sociable, and joyful communication.” That sounds like the right goal to me!

The recent growth of alternative social media (Bluesky, Mastodon, Threads) has opened new opportunities to see what role bots can play in meeting potential news audiences where they are. Recent Bluesky bots include one that tracks Elon Musk’s private jet flights (@elonjet.net from Jack Sweeney) and one that shows you newly published news articles with interactive elements (@interactives.bsky.social from Sam Morris).

A World Cup twist

At its core, my OurCupBot is an experimental story responding to the hype cycle around the World Cup. The scale of global attention pointed at the men’s tournament is hard to overestimate; during the tournament, every story is an opportunity to ride that wave by finding a connection to it. I’ve always experienced global soccer as a way to connect to neighbors across cultural divides. This feels especially important in the U.S. right now, where any given day could bring me an email from a grad student struggling to stay in the country, a Signal message on ICE operations near my kids’ schools, or a data story about the impact of new visa changes.

The World Cup creates an opportunity to more intentionally explore what solidarity means during times of growing violence and propaganda against immigration. I realized this motivation of mine actually fit well with the short, focused interactions a bot allowed for. Mention @ourcup.info in a post along with a US zip code and it will reply with recommendations for the upcoming World Cup games you should make sure to watch (based on who lives near you). It’ll also link you to music, local food, and other info about the countries immigrants near you are from. Sports, food, music — these are ways we interact with our immigrant neighbors every day. 

First steps: Building a static census dataset

Technically, the OurCup bot is a simple lookup on U.S. Census foreign-born population data: specifically, the ACS 2024 5-year estimates for table B05006. You send in a zip code and the bot looks up the largest foreign-born populations in your county, filtered to countries playing in the World Cup. This is a classic “see yourself in the map” data story invitation, allowing readers to find the data relevant to where they live within a larger dataset. This kind of story is a strong candidate for preprocessing all the data, since I know there are exactly 41,554 possible inputs! Computing all the data statically in advance also means no live server backend is needed, significantly reducing costs and deployment effort.
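Because the set of possible inputs is finite, the whole answer table can be computed once and shipped as a static file. Here is a minimal sketch of that idea in Python, assuming a ZIP-to-county crosswalk and a county-by-team count table already exist. The dictionaries, their counts, and the helper names `top_teams`, `precompute`, and `save_lookup` are hypothetical placeholders for illustration, not OurCupBot’s actual code or real Census figures.

```python
import json
from collections import Counter

# Hypothetical sample inputs. In the real project these would come from a
# ZIP-to-county crosswalk and the Census pivot table; these counts are made up.
ZIP_TO_COUNTY = {"02115": "25025", "33101": "12086"}
COUNTY_COUNTS = {
    # "ZZZ" is a placeholder for a country that is not in the tournament.
    "25025": {"HAI": 25000, "BRA": 18000, "COL": 9000, "CPV": 7000, "ZZZ": 5000},
    "12086": {"HAI": 80000, "COL": 95000, "BRA": 20000},
}
# Hypothetical set of qualified teams, keyed by FIFA code.
WORLD_CUP_TEAMS = {"HAI", "BRA", "COL", "CPV"}


def top_teams(counts, teams, n=3):
    """Return the n largest foreign-born groups among World Cup teams."""
    eligible = Counter({code: c for code, c in counts.items() if code in teams})
    return [code for code, _ in eligible.most_common(n)]


def precompute(zip_to_county, county_counts, teams, n=3):
    """Compute the answer for every known ZIP code ahead of time."""
    return {
        zip_code: top_teams(county_counts.get(fips, {}), teams, n)
        for zip_code, fips in zip_to_county.items()
    }


def save_lookup(lookup, path="lookup.json"):
    """Write the precomputed answers to a static JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(lookup, f)
```

With this approach, `precompute(ZIP_TO_COUNTY, COUNTY_COUNTS, WORLD_CUP_TEAMS)` runs once offline, and answering a mention becomes a dictionary lookup against the saved file.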

However, as with any geography project, the world of international standards reared its ugly head and made things more challenging. Here are a few things I had to wrestle with:

  • World Cup teams don’t always represent full countries. For example, England, Scotland, and Wales all field their own teams even though they are all part of the United Kingdom.
  • FIFA has its own three-letter country codes and they don’t always match the ISO standard. For example, FIFA lists South Africa as “RSA” while ISO lists it as “ZAF”.
  • The Census foreign-born population data columns don’t have categories for every country or subdivision. Consider the tiny island nation of Cabo Verde, which astounded everyone by qualifying for the World Cup this year. Rather than leaving out recommendations for their games, I substituted the larger “Other Western Africa” category (attribute B05006_120E). See all my substitutions.
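One way to keep these decisions in one place is a pair of explicit mapping tables. The sketch below is a hypothetical illustration: the FIFA-to-ISO entries are a partial list of codes that differ between the two standards, the only Census substitution taken from the text is Cabo Verde’s `B05006_120E`, and the function names are made up for this example.

```python
# Partial, illustrative list of FIFA codes that differ from ISO 3166-1 alpha-3.
FIFA_TO_ISO3 = {
    "RSA": "ZAF",  # South Africa
    "GER": "DEU",  # Germany
    "NED": "NLD",  # Netherlands
    "POR": "PRT",  # Portugal
    "SUI": "CHE",  # Switzerland
    "CRO": "HRV",  # Croatia
    "URU": "URY",  # Uruguay
    "PAR": "PRY",  # Paraguay
    "KSA": "SAU",  # Saudi Arabia
    "HAI": "HTI",  # Haiti
    # England, Scotland and Wales field separate teams but share one ISO code.
    "ENG": "GBR",
    "SCO": "GBR",
    "WAL": "GBR",
}

# Census column to use when B05006 has no row for a team's country.
# Only the Cabo Verde substitution comes from the article.
CENSUS_SUBSTITUTES = {
    "CPV": "B05006_120E",  # "Other Western Africa"
}


def to_iso3(fifa_code):
    """Map a FIFA code to ISO alpha-3, assuming unlisted codes already match."""
    return FIFA_TO_ISO3.get(fifa_code, fifa_code)


def census_column(fifa_code, default_columns):
    """Pick the B05006 column for a team, preferring an explicit substitute.

    default_columns is a hypothetical {fifa_code: column} dict for teams
    whose country has its own row in the table.
    """
    if fifa_code in CENSUS_SUBSTITUTES:
        return CENSUS_SUBSTITUTES[fifa_code]
    return default_columns.get(fifa_code)
```

Keeping substitutions in data rather than scattered through notebook cells also makes it easy to publish the full list alongside the story.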

Once I had decided how to handle these wrinkles, and others, I used a Jupyter notebook to preprocess the foreign-born population census data into static population counts by county (I decided that tracts and zip codes were too small). The data heart of this project is the pivot table of county (by FIPS code) vs. the foreign-born population from each country living there (by FIFA country code).
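A pivot like this can be sketched without pandas as a nested dictionary keyed by county FIPS code and then FIFA code. This assumes the notebook has already produced long-format rows of `(county_fips, fifa_code, count)`; the helper names are hypothetical. County FIPS codes are the two-digit state code followed by the three-digit county code, so the sketch zero-pads them to avoid losing leading zeros.

```python
from collections import defaultdict


def county_fips(state, county):
    """Combine state and county codes into a 5-digit county FIPS string."""
    return f"{int(state):02d}{int(county):03d}"


def pivot_by_county(rows):
    """Pivot (county_fips, fifa_code, count) rows into {fips: {fifa_code: total}}."""
    table = defaultdict(lambda: defaultdict(int))
    for fips, fifa_code, count in rows:
        # Summing lets several Census rows (e.g. a substitute region) feed one team.
        table[fips][fifa_code] += count
    # Convert to plain dicts so the result serializes cleanly to JSON.
    return {fips: dict(counts) for fips, counts in table.items()}
```

In a notebook, pandas’ `DataFrame.pivot_table` with `index`, `columns`, `values`, and `aggfunc="sum"` does the same job; the plain-dictionary version just shows what the resulting structure looks like.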

Vibe-coding a Bluesky bot

Among the alternative social media platforms that have sprung up in the wake of Twitter’s breakdown, Bluesky seems to be the one political and news influencers are migrating to. The company appears to welcome bots, even adding a checkbox that labels your account as one.

If you mark your account as a bot, Bluesky adds a little robot head icon after your handle.

Bluesky is built on the AT Protocol, an open set of standards for decentralized, authenticated publishing across the web. Its platform-specific documentation on bots is well written and includes demo code. Signing up for a bot-specific authentication key requires no jumping through hoops or requesting permissions. So I took my idea for a bot to Claude to see how complicated it would be to create. That’s my current gen AI tool of choice, mostly because Northeastern (where I work) has a site license.
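For a sense of what authentication involves, here is a minimal standard-library sketch that calls the AT Protocol’s `com.atproto.server.createSession` endpoint directly. It assumes the account lives on `bsky.social` and that the bot-specific key is a Bluesky app password; OurCupBot may well use an SDK instead, and the function names here are hypothetical.

```python
import json
import urllib.request

# Assumption: the bot account is hosted on the main Bluesky PDS.
PDS_URL = "https://bsky.social"


def build_session_request(handle, app_password, pds=PDS_URL):
    """Build (but do not send) a com.atproto.server.createSession request."""
    body = json.dumps({"identifier": handle, "password": app_password}).encode("utf-8")
    return urllib.request.Request(
        f"{pds}/xrpc/com.atproto.server.createSession",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(handle, app_password, pds=PDS_URL):
    """Log in and return the session JSON (accessJwt, refreshJwt, did, ...)."""
    request = build_session_request(handle, app_password, pds)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())
```

The returned JSON includes an `accessJwt` for authenticated calls and a `refreshJwt` for renewing the session, which is the kind of session key worth persisting between runs.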

It is important to mention that I’m not an AI booster here. I’m both skeptical of the hype and angry about the impacts. That said, I engage with the tools regularly to try and be an informed critic in my teaching, modeling how I want my students to form their own opinions by trying out small experiments.

I started the exploration by asking Claude how complicated it would be. The response? It said it’d be less than a page of code! Of course, I didn’t believe it because I consider Claude’s suggestions to be similar to those an overly optimistic first-year developer might make. Next I asked Claude to generate the basic code for me in Python and it left me an empty function I could fill in with the data to respond to. As a trained software developer I despise micro-managing agents to write and tweak code for me (it’s just no fun), so at that point I took it out of Claude and poked at it myself. 

The initial sketch from Claude of how the bot could work.

Claude gave me a solid sketch of what I needed to do to authenticate, a stub where the response based on my data could go, and some advice on how to host it. After I wired up my aggregated and filtered Census data as a lookup by zip code, I had Claude write a bunch of unit tests for me. I found this incredibly useful, since it took care of some of the tedious-yet-important tasks of building a software project.
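To show the kind of tests this makes easy, here is a hypothetical response stub and a few `unittest` cases for it. `extract_zip`, `build_reply_text`, and the reply wording are illustrations, not the bot’s real code; the stub only assumes a precomputed `{zip_code: [team codes]}` lookup.

```python
import re
import unittest
from typing import Optional

# Five digits, optionally followed by a ZIP+4 suffix, as a standalone token.
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")


def extract_zip(post_text: str) -> Optional[str]:
    """Return the first 5-digit ZIP code in a post, or None."""
    match = ZIP_RE.search(post_text)
    return match.group(1) if match else None


def build_reply_text(post_text, lookup):
    """Hypothetical response stub: turn a mention into reply text."""
    zip_code = extract_zip(post_text)
    if zip_code is None:
        return "Mention me with a US zip code to get World Cup picks!"
    teams = lookup.get(zip_code)
    if not teams:
        return f"Sorry, I don't have data for {zip_code}."
    return f"Games to watch near {zip_code}: " + ", ".join(teams)


class BuildReplyTests(unittest.TestCase):
    LOOKUP = {"02115": ["HAI", "BRA", "COL"]}

    def test_known_zip(self):
        reply = build_reply_text("@ourcup.info 02115 please!", self.LOOKUP)
        self.assertIn("HAI, BRA, COL", reply)

    def test_zip_plus_four(self):
        self.assertEqual(extract_zip("my zip is 02115-1234"), "02115")

    def test_missing_zip(self):
        self.assertIn("zip code", build_reply_text("@ourcup.info hi", self.LOOKUP))

    def test_unknown_zip(self):
        self.assertIn("Sorry", build_reply_text("99999", self.LOOKUP))
```

Assuming the file is saved as `bot.py`, running `python -m unittest bot` executes the test class.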

However, as I dug in, more and more issues popped up. The first thing I noticed it had missed was that I needed to convert the game times from GMT to the local time at the zip code; I had to prompt it to add that.
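The conversion itself is short with Python’s standard `zoneinfo` module. This sketch assumes the schedule stores kickoff times in UTC (GMT) and that each zip code has already been mapped to an IANA time zone name; `ZIP_TO_TZ` and `kickoff_local` are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical lookup from ZIP code to IANA time zone name.
ZIP_TO_TZ = {"02115": "America/New_York", "85001": "America/Phoenix"}


def kickoff_local(kickoff_utc: datetime, tz_name: str) -> datetime:
    """Convert a UTC (GMT) kickoff time to the viewer's local time zone."""
    if kickoff_utc.tzinfo is None:
        # Treat naive timestamps from the schedule as UTC.
        kickoff_utc = kickoff_utc.replace(tzinfo=timezone.utc)
    return kickoff_utc.astimezone(ZoneInfo(tz_name))
```

For a made-up kickoff at 19:00 UTC on June 20, 2026, this gives 3:00 p.m. in `America/New_York` and noon in `America/Phoenix`. Most of Arizona skips daylight saving time, which is one reason to lean on the time zone database instead of hard-coding offsets.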

Another issue involved the metadata I added with links to restaurant recommendations (from Yelp), Spotify playlists (from EveryNoise), and local news (from GlobalVoices) for the recommended countries: the links weren’t clickable in the bot’s reply posts. It turns out that adding links in AT Proto is a bit complicated; you have to add “facets” that mark up part of the content as a link based on byte offsets into your post text. This was made harder by the fact that each flag emoji I used is built from more than one Unicode code point (most are a pair, eight bytes in UTF-8), so computing the offsets for the facet was a little tricky. This is the kind of issue that a novice developer probably wouldn’t understand when vibe-coding a project. Once I prompted Claude with this information it was able to fix the problem, but it didn’t find it on its own.
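Here is a minimal sketch of the offset arithmetic. It follows the `app.bsky.richtext.facet` shape (an `index` with `byteStart`/`byteEnd` plus a `#link` feature) and computes offsets by UTF-8-encoding the text before each link; `link_facets` is a hypothetical helper, not part of any SDK.

```python
def link_facets(text: str, urls: list[str]) -> list[dict]:
    """Build link facets for URLs that appear verbatim in the post text.

    Facets index the post by UTF-8 byte offsets, not Python string indices,
    so multi-byte characters such as flag emoji shift every later link.
    """
    facets = []
    search_from = 0
    for url in urls:
        char_start = text.find(url, search_from)
        if char_start == -1:
            continue  # URL not present in the text; nothing to link
        char_end = char_start + len(url)
        search_from = char_end  # later URLs are searched after this one
        facets.append({
            "index": {
                "byteStart": len(text[:char_start].encode("utf-8")),
                "byteEnd": len(text[:char_end].encode("utf-8")),
            },
            "features": [
                {"$type": "app.bsky.richtext.facet#link", "uri": url},
            ],
        })
    return facets
```

For example, in the text `🇭🇹 Haiti https://example.com` the link starts at Python string index 9 but at byte 15, because the flag alone takes eight bytes. The resulting list goes in the post record’s `facets` field alongside `text`.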

Another complicated piece was keeping track of two bits of “state” (information that persists between runs): the last time we checked for posts and the session key. The first matters because it lets the code make sure it isn’t replying to posts twice; the API query should only return posts that appeared after that time. The second was critical to make sure we didn’t hit a rate limit by creating a new session every time the cron job ran. Claude helped me explore and compare a few options, and I ended up using simple text files in a secret Gist, accessed with a GitHub API key, as persistent storage. Again, it took a fair amount of experience with software to understand this need and decide on the right approach.
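A minimal sketch of the Gist-as-storage idea, using the GitHub REST API (`GET` and `PATCH` on `/gists/{gist_id}`) through the standard library. The gist ID, token, file names, and helper names are hypothetical, and the filter assumes mention notifications carry `reason` and `indexedAt` fields as in Bluesky’s `listNotifications` response. Timestamps are parsed rather than compared as strings, since `...00Z` and `...00.000Z` would otherwise sort inconsistently.

```python
import json
import urllib.request
from datetime import datetime

GITHUB_API = "https://api.github.com"


def _gist_request(gist_id, token, method="GET", payload=None):
    """Build a GitHub REST API request for one gist (not sent yet)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        f"{GITHUB_API}/gists/{gist_id}",
        data=data,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method=method,
    )


def read_state(gist_id, token, filename):
    """Fetch the gist and return the text content of one of its files."""
    with urllib.request.urlopen(_gist_request(gist_id, token)) as resp:
        return json.loads(resp.read())["files"][filename]["content"]


def build_state_update(gist_id, token, filename, content):
    """Build a PATCH request that overwrites one file in the gist."""
    payload = {"files": {filename: {"content": content}}}
    return _gist_request(gist_id, token, method="PATCH", payload=payload)


def write_state(gist_id, token, filename, content):
    """Send the PATCH request and return the HTTP status code."""
    request = build_state_update(gist_id, token, filename, content)
    with urllib.request.urlopen(request) as resp:
        return resp.status


def _parse_ts(ts):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def new_mentions(notifications, last_checked):
    """Keep mention notifications indexed strictly after last_checked."""
    cutoff = _parse_ts(last_checked)
    return [
        n for n in notifications
        if n.get("reason") == "mention" and _parse_ts(n["indexedAt"]) > cutoff
    ]
```

The session tokens could live in a second file in the same gist, read at startup and rewritten only when a new session has to be created.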

The last large stumbling block was deployment. This is the kind of “software engineer” task that non-developers often trip over. My inclination on static digital journalism projects is to rely on GitHub Pages and Actions for free hosting and computing power. However, when I asked Claude about this, it noted that frequently recurring scheduled actions, such as checking an API for new mentions every few minutes, are often unreliable. I had Claude write up a GitHub Action for me anyway that ran every five minutes, and found that it was right; the action just didn’t run regularly. Here I was able to leverage my knowledge as a software engineer to ask Claude to find the cheapest hosting option for running this kind of very small-scale cron job every minute. It led me to Render.com as a suitable host. A few clicks later I had a working setup that runs the bot script every 2 minutes, very cheaply ($5 per 1,000 minutes of runtime at the “Starter” tier).

The data architecture underneath the functioning of the bot.
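Putting the pieces together, each cron invocation can do one short pass and exit. This is a hypothetical sketch of that loop, not OurCupBot’s code: the four callables stand in for the Gist and Bluesky helpers, and injecting them keeps the loop testable without network access.

```python
from datetime import datetime
from typing import Callable, Iterable


def _parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def run_once(
    load_state: Callable[[], str],
    fetch_mentions: Callable[[str], Iterable[dict]],
    reply: Callable[[dict], None],
    save_state: Callable[[str], None],
) -> int:
    """One cron tick: load state, answer new mentions, save state, return.

    All four callables are hypothetical hooks (for example Gist read/write
    and Bluesky API calls). Returns the number of mentions answered.
    """
    last_checked = load_state()
    newest = last_checked
    answered = 0
    for mention in fetch_mentions(last_checked):
        reply(mention)
        answered += 1
        # Advance the bookmark to the most recent mention handled so far.
        if _parse_ts(mention["indexedAt"]) > _parse_ts(newest):
            newest = mention["indexedAt"]
    save_state(newest)
    return answered
```

A standard cron expression such as `*/2 * * * *` fires every two minutes. Saving the timestamp once at the end keeps Gist writes to one per run, at the cost of possibly re-answering a few mentions if a run crashes midway; saving after each reply trades more API calls for less duplication.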

Should you build a social news bot?

Go try it out! Mention @ourcup.info and a zip code in a Bluesky post and it will respond with recommendations after a minute or two. Read the accompanying scrolly story to try out the interactive version and explore the data in depth. Peruse my source code and let me know of any bugs I missed on this quick project.

This data bot harks back to an earlier era of journalism on social media, when experimentation focused more on the form of interaction than on the form of content. By no means am I dismissing the power of strong social video content or personality-driven reporting. Instead, I’d like us to keep the spark of experimentation alive across both interaction and content. Generative AI might help push some interaction ideas into production that would otherwise have been left on the meeting-room floor. We can now play with and build novel forms of interaction, like a story-specific social media bot, much more easily.

So drop me a mention, watch a game, and let me know if you connect to new food, music, news, or history of your neighbors. Maybe building an “old school” news bot can spark a new creative angle for a story you’re working on?

Rahul Bhargava

Get the latest from Storybench

Keep up with tutorials, behind-the-scenes interviews and more.