How to Analyze Bluesky Posts and Trends with R

November 4, 2025 | November 5, 2025 by Lucia Walinchus

If the only things you’re doing on Bluesky are scrolling, liking and posting, then you are still riding a bike with training wheels. Hear me out.

There are several simple, free tools out there that let you take advantage of Bluesky’s secret weapon: its open-source skeleton.

A how-to

A few firehose ideas:

Pull thousands of posts to look for trends
See how key phrases change over time
See articles others are recommending

First, you’re probably thinking: why spend time doing this? Many of us have an ingrained idea that truly great content will always rise to the top. For a detailed discussion of why it doesn’t, see these. But the short answer is: quality is part of the equation, not the whole story.

Wheel graphic showing content at the core but also user connections, functional connections, and product connections. — From Bharat Anand’s book “The Content Trap.”

You can’t download all two billion posts, but you can download a whole lot more than you could ever read by scrolling. And you can use that data to look for patterns. For example, what if we took a look at 100,000 posts from the last day?

```r
library(bskyr)

latest_posts <-
  bs_search_posts(
    "*",
    sort = "latest",
    # Or whatever start time you want. I hardcoded this because I couldn't
    # figure out a way to generate the time in this exact format.
    since = "2025-10-10T00:00:00.000Z",
    until = NULL,
    mentions = NULL,
    author = NULL,
    lang = NULL,
    domain = NULL,
    url = NULL,
    tag = NULL,
    cursor = NULL,
    limit = 100000, # You will probably want to lower this!
    user = "walinchus.bsky.social",
    pass = Sys.getenv("BSKY_PASS"), # password read from an environment variable
    auth = bs_auth("walinchus.bsky.social", Sys.getenv("BSKY_PASS")),
    clean = TRUE
  )
```
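One possible way around hardcoding the `since` value is to build the timestamp with base R. This is only a sketch: the helper name `iso_utc_hours_ago()` is made up for illustration, and it assumes the search accepts any UTC timestamp in the same shape as the hardcoded one above.

```r
# Hypothetical helper (not part of bskyr): returns the time `hours` ago
# as a UTC string shaped like "2025-10-10T00:00:00.000Z".
iso_utc_hours_ago <- function(hours = 24) {
  start <- Sys.time() - hours * 60 * 60
  # ".000Z" is literal text in the format string; tz = "UTC" converts the clock time
  format(start, "%Y-%m-%dT%H:%M:%S.000Z", tz = "UTC")
}

since_time <- iso_utc_hours_ago(24)
since_time
```

You could then pass `since = since_time` instead of the fixed date.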

Analysis idea number one: what comes up most often? Here, we take out the most common words (stop words such as prepositions) and look only at the frequency of what remains.

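```r
library(dplyr)
library(tidytext)

# Add Japanese and Portuguese stop words to tidytext's built-in English list
stop_words1 <- as_tibble(stopwords::stopwords("ja", source = "marimo")) %>% rename(word = value)
stop_words2 <- as_tibble(stopwords::stopwords("pt", source = "snowball")) %>% rename(word = value)
stop_words <- bind_rows(stop_words, stop_words1, stop_words2)

# `posts` is assumed to hold one row per post, with the post text in a `text` column
post_words <- posts %>%
  unnest_tokens(word, text) %>%        # one row per word
  anti_join(stop_words, by = "word")   # drop the stop words

# The 100 most frequent remaining words
post_words %>%
  count(word) %>%
  arrange(desc(n)) %>%
  slice_head(n = 100)
```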

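The first ten rows of the resulting table:

| word (chr) | n (int) |
|------------|---------|
| people     | 297     |
| trump      | 258     |
| time       | 215     |
| 2025       | 181     |
| prize      | 178     |
| 10         | 176     |
| day        | 164     |
| peace      | 163     |
| love       | 153     |
| nobel      | 148     |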

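Idea number two: looking at which words were important in each hour. To do this, we are going to employ a neat little trick called TF-IDF (term frequency-inverse document frequency), which scores a word highly when it shows up a lot in one chunk of text but rarely in the others. Check out Julia Silge’s work here for a full explanation, but essentially: what words were important in each chunk?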

It’s really fascinating to see the effects of time zones, and when certain news broke!
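Here is a minimal sketch of that hourly TF-IDF idea using tidytext, run on a tiny made-up data frame rather than real posts. The column names `created_at` and `text` are assumptions, so adjust them to whatever your downloaded data actually contains.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for downloaded posts; `created_at` and `text` are assumed names
toy_posts <- tibble(
  created_at = c("2025-10-10T08:05:00.000Z", "2025-10-10T08:40:00.000Z",
                 "2025-10-10T09:10:00.000Z", "2025-10-10T09:55:00.000Z"),
  text = c("nobel peace prize announced", "peace prize reaction today",
           "hockey game tonight", "hockey playoffs start today")
)

hourly_tf_idf <- toy_posts %>%
  mutate(
    # Parse the timestamp as UTC, then label each post with its hour
    created = as.POSIXct(created_at, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC"),
    hour = format(created, "%Y-%m-%d %H:00")
  ) %>%
  unnest_tokens(word, text) %>%    # one row per word
  count(hour, word) %>%            # word counts within each hour
  bind_tf_idf(word, hour, n) %>%   # each hour is treated as one "document"
  group_by(hour) %>%
  slice_max(tf_idf, n = 3, with_ties = FALSE) %>%  # top 3 words per hour
  ungroup()

hourly_tf_idf
```

Words that appear in every hour (like "today" in this toy data) get a TF-IDF of zero, which is why they drop out of the top of each hour's list.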

Chart showing the TF-IDF of words in certain hours

Another idea: let’s say you want to see not just what people like, but what others are recommending to their friends. For example, this post doesn’t have many likes, but already has a lot of reposts. And so neat! I did not know this about hippos.
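One rough way to surface posts like that is to compare reposts to likes. The sketch below uses made-up numbers, and the column names `like_count` and `repost_count` are assumptions about what the downloaded data looks like, so check yours before borrowing it.

```r
library(dplyr)

# Toy data; `like_count` and `repost_count` are assumed column names
toy_posts <- tibble(
  text = c("post A", "post B", "post C"),
  like_count = c(12, 250, 40),
  repost_count = c(30, 5, 10)
)

most_shared <- toy_posts %>%
  filter(repost_count >= 10) %>%   # skip posts with very little sharing
  mutate(repost_ratio = repost_count / (like_count + 1)) %>%  # +1 avoids dividing by zero
  arrange(desc(repost_ratio))      # most "recommended" relative to likes first

most_shared
```

The cutoff of 10 reposts is arbitrary; the point is just to avoid ranking posts with one repost and zero likes at the top.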


This is just scratching the surface. There’s a lot of really granular data here. You can have fun with this! HMU with more ideas!

Example of data from the Bluesky firehose

The code for this can be found here.

This mostly uses the bskyr #rstats package developed by Christopher Kenny. Check out his vignettes here.

This also looks cool: on his GitHub, David Hood shows how you can build a big selection of follower/following connections from an initial curated list of related people.

Disclaimers! I am not employed by Bluesky, nor do I know anyone who is. This is not an endorsement and I have not extensively tested how well this works. Please respect the rate limits, which keep the system from being overwhelmed. And big thanks to Mike Stucka (@mikestucka.bsky.social) and all the folks at IRE who encouraged me to test this out. I was very against signing up for yet another social media site until I saw what it could do!


Lucia Walinchus

Lucia Walinchus is an award-winning journalist, attorney and ice hockey player/coach. She is currently the Data Editor for NBC-owned stations. Walinchus has written more than 500 articles for various publications throughout her career and was named a 2016 Fulbright Berlin Capital Program Scholar. She has been featured as a guest speaker on CNN and was a contracted freelancer for the New York Times and the Washington Post. Her work has previously been recognized as the Best Investigative Reporting in Ohio, the Best Data Journalism in Ohio, and the Best Public Service Journalism. Walinchus has a degree in Journalism from American University and a Juris Doctorate from California Western School of Law.
