
Eight tools, datasets and resources from the Open Data Science Conference

May 3, 2019 by Aleszu Bajak

This week, thousands of data scientists, software developers, machine learning experts and other data-curious people gathered in Boston for the ODSC East conference. Between hands-on training sessions and dynamic keynote speeches from people like MIT professor Regina Barzilay and Wired writer Clive Thompson, the recommendations were flying. Storybench attended and jotted a few down.

Satellite imagery dataset

In his talk on computer vision and satellite imagery, Microsoft’s Xiaoyong Zhu highly recommended the xView dataset. He called it the “largest publicly-available dataset for satellite imagery,” noting its extensive coverage of urban scenes.

Getting tidy with classification and regression

RStudio’s Max Kuhn, who built R’s caret package, gave a helpful workshop on modeling in R that introduced useful packages such as parsnip, kknn and recipes. His slides are up on GitHub.

Insurance against disruption

“Chances are you’ll be blindsided by new technology,” said MIT’s Michael Stonebraker in his hit keynote talk. “If you are sitting on your laurels, chances are your market is going to be eaten by someone else… You have to be able to reinvent yourself.” What to do? “You should all read Clayton Christensen’s book, ‘The Innovator’s Dilemma.’ Read it once a year.”

Other book recommendations from the conference: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson and Andrew McAfee; and Superintelligence: Paths, Dangers, Strategies by Nick Bostrom.

Deep learning from scratch

Seth Weidman, a senior data scientist at Facebook, has posted the full documentation and slides for his deep learning from scratch talk on GitHub. It walks you through the math, diagrams and code behind deep learning. Cool.

Deep learning for Twitter sentiment analysis

MathWorks’ MATLAB product manager Heather Gorr open-sourced her deep learning model, a long short-term memory (LSTM) network that classifies tweets as positive or negative.


Two resources for deep learning basics

Two resources came up over and over again: fast.ai and deeplearning.ai.

Generating unicorns in the Andes?

Researchers working on automatic text generation at OpenAI are training a “large-scale unsupervised language model which generates coherent paragraphs of text.”

For example, they supply this prompt:

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

and get:

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.
Dr. Jorge Perez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Perez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow…


Aleszu Bajak

Aleszu Bajak was the founding editor of Storybench. He is currently the director of data visualization at the Urban Institute. Previously, he was a senior data reporter on USA TODAY's data team, part of the newspaper's national investigative unit. He is a former Knight Science Journalism Fellow at M.I.T., was a founding senior writer at Undark magazine and founding editor of Esquire Classic, a project resuscitating the magazine's archives. His work has appeared in The New York Times, The Washington Post, M.I.T. Technology Review and Nature.

One thought on “Eight tools, datasets and resources from the Open Data Science Conference”

Bianca Aguglia says:

May 6, 2019 at 6:14 pm

Thank you for this article and the links. I love the diagram about the modeling process.

The part about the unicorns was funny but that doesn’t make AI’s writing good. If a human wrote or spoke like that we’d scratch our heads trying to figure out what in the world he was trying to say. 🙂

Anyway, good article. I love the writing on your site. Clear and smart. No doubt about the humanity of its authors. 🙂

