
How to download song lyrics from Genius using Python

Coding and I are not on the best of terms. After a traumatic three-year stretch in middle school in which I barely scraped by in my computer science classes, I have largely kept my work confined to Excel, with occasional forays into R.

But recently, I came upon a problem that an Excel formula couldn’t solve. I’m currently attempting to analyze hip-hop history through a data lens and, as part of that, I intend to do a lyrical analysis. To do this, I obviously need to collect all the lyrics from all the songs I’m interested in.

I considered the copy-and-paste route, but with over 400 songs in my initial dataset, that wasn’t feasible. So, reluctantly, I turned to Python. With a lot of help from Steven Braun, a data analysis and visualization expert here at Northeastern, I used a script to take my spreadsheet, search the Genius API for matches, and save the lyrics as .txt and .json files. Below, I’ve also added a separate script that adds the lyrics into a new column of the spreadsheet. Thanks to Northeastern Media Innovation alum Felippe Rodrigues for help there!

Here is a step-by-step tutorial on how to do it. And trust me, if I managed it, then you can, too.

Register your API client with Genius

Genius’s API is awesome, but you’ll have to register to use it first. Luckily, they make this super easy. Go to this URL for their sign-up page and follow the prompts. I just used my Facebook login to make life easier.

Create a folder on your desktop to house everything

For the Python script to work correctly, it needs to be able to locate your files and have a place to dump the lyrics. To make this easy, make a new folder on your Desktop. Save your spreadsheet as a .csv in there.

The script will need a place to put all the lyrics it scrapes, so make a folder within the folder called “lyrics.” Within that folder, make two more folders called “txt” and “json.” The script saves lyrics in both formats, and this makes for easy organization.
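
If you would rather not click through Finder, the same folders can be created from Python. This is only a minimal sketch: the function name make_lyrics_folders is my own invention, and the example path assumes your project folder is called “hip-hop-lyrics” and sits on your Desktop.

```python
from pathlib import Path


def make_lyrics_folders(project_dir):
    """Create lyrics/txt and lyrics/json inside project_dir if they are missing."""
    created = []
    for sub in ("txt", "json"):
        folder = Path(project_dir) / "lyrics" / sub
        # parents=True also creates the "lyrics" folder itself;
        # exist_ok=True means running this twice does no harm
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Example call (assumed folder name and location):
# make_lyrics_folders(Path.home() / "Desktop" / "hip-hop-lyrics")
```

Running it a second time leaves existing folders and files untouched.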

Download necessary individual scripts

As you’ll see in the next step, the script for this isn’t actually very long. That’s because it calls on a few other separate scripts as well, so you will need to download those first for the main script to actually work. You can go ahead and follow this link to do so, but the short of it is “pip install lyricsgenius”. (Thanks, johnwmillr, for posting this on GitHub!)

Once downloaded, the folder should be titled “lyricsgenius.” Its contents should look like this:

Save the “lyricsgenius” folder within the folder you created in the last step, so that the script can easily find and access it.


This is also a good time to make sure you have Python 3 installed, as certain elements of the script won’t work without it. You can click here to download it if you don’t have it already.
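
If you are not sure which Python you have, a quick check like the one below can tell you. It is a rough sketch: the function name is made up for illustration, and the 3.6 minimum is my assumption, chosen because the f-string formatting used in the main script needs at least that version.

```python
import sys


def python_is_new_enough(version_info=None, minimum=(3, 6)):
    """Return True if the given (or running) Python version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor numbers, e.g. (3, 8) >= (3, 6)
    return tuple(version_info[:2]) >= minimum


# Example use:
# if not python_is_new_enough():
#     print("Please install a newer Python 3 before running the script.")
```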

Copy and paste the following code into the text editor of your choice

With everything now set up properly, we can move on to the actual code. I removed my own client ID from this script, so just copy your own from the Genius API and paste it into the line with all the x’s.

Disclaimer: I did not actually write this. It was graciously provided to me by Steven Braun, who agreed to have it shared here.

 
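import lyricsgenius as genius
import csv


# Insert your own Genius API credential in place of the x's
# (lyricsgenius expects the client access token from your API client page)
api = genius.Genius('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')

# Change the file name (and encoding, if needed) to match your own .csv
with open('Hip-Hop History Data.csv', encoding='mac_roman') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # Skip the header row
            line_count += 1
        else:
            # Columns are expected in this order: year, artist, song
            item_year = row[0]
            item_artist = row[1]
            item_song = row[2]
            # print(f'\t{row[0]}\t{row[1]}\t{row[2]}')
            song = api.search_song(item_song, item_artist)
            # Build a file name like "Song|Artist", replacing slashes
            filename = item_song.replace("/", "_") + "|" + item_artist.replace("/", "_")
            if song is not None:
                # Save one copy as plain text and one as JSON.
                # Note: depending on your lyricsgenius version, this argument
                # may be named extension instead of format_
                song.save_lyrics(filename='lyrics/txt/' + filename + '.txt', format_='txt')
                song.save_lyrics(filename='lyrics/json/' + filename + '.json', format_='json')

            line_count += 1

    print(f'Processed {line_count} lines.')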

Once you’ve copied and pasted it, save it as a .py file in your folder with an easy name. I just used “test.py”. 

Open up your command terminal and set the directory

I used my Mac’s command terminal to run the program, but if you prefer IDLE or another IDE, feel free to use it.

Once that’s booted up, you’ll need to make sure your computer is working within the right place. Using the cd command, set it to the new folder we created a couple steps back. For me, that looked like this:

cd ~/Desktop/hip-hop-lyrics/

Start the script!

Once you’ve set the directory, you are good to go. Start the script by entering the following:

python3 ./test.py

Once you execute that command, the terminal should start scraping lyrics from the API. The screen will display something like this:

If you go into your lyrics subfolder, you’ll notice the .txt and .json files being saved there immediately. With around 400 songs, the script took about 20 minutes to finish.

Check that all songs were scraped, and if necessary, run the script again

As great as the Genius API is, it’s a little finicky. The song titles and artist names in your .csv will need to be pretty much exact matches to the Genius spelling, or the script won’t catch it and it will move along to the next song without saving any lyrics. For example, I had “Jay Z” as my spelling while Genius used “JAY-Z”.  

The way I addressed this was by checking the scraped files against my spreadsheet. I copied and pasted all the file names into the Excel file, and then used a simple matching formula to see which songs were missing. This webpage guided me through how to do that.
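
The same check can also be done in Python instead of Excel. What follows is a rough sketch rather than part of the original workflow: the function names are hypothetical, and it assumes your .csv has the columns year, artist, song (in that order) and that the files were saved with the “song|artist” naming used in the main script.

```python
import csv
from pathlib import Path


def expected_filename(song, artist):
    """Mirror the naming used in the main script: "<song>|<artist>"."""
    return song.replace("/", "_") + "|" + artist.replace("/", "_")


def find_missing_songs(csv_path, txt_dir, encoding="mac_roman"):
    """Return (artist, song) pairs from the csv that have no saved .txt file."""
    # File names without the .txt extension
    saved = {path.stem for path in Path(txt_dir).glob("*.txt")}
    missing = []
    with open(csv_path, encoding=encoding, newline="") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 3:
                continue  # ignore blank or incomplete rows
            artist, song = row[1], row[2]
            if expected_filename(song, artist) not in saved:
                missing.append((artist, song))
    return missing


# Example call (file and folder names taken from the steps above):
# for artist, song in find_missing_songs("Hip-Hop History Data.csv", "lyrics/txt"):
#     print(artist, "-", song)
```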


If you’re unlucky like me, a fair number of songs will be missing. You will need to go back through the spreadsheet and clean it so the names match the spelling on Genius. You can do this from the beginning, too, but I had made my spreadsheet weeks ago, so that wasn’t an option.
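
If the same misspelling shows up on many rows, a small script can apply the corrections for you. This is only a sketch under a few assumptions: the corrections table and function name are invented for illustration (the one entry comes from the “Jay Z” example above), the artist name is assumed to be in the second column, and the cleaned data is written to a new file so your original spreadsheet stays untouched.

```python
import csv

# Hypothetical corrections table: spreadsheet spelling -> Genius spelling
ARTIST_FIXES = {"Jay Z": "JAY-Z"}


def fix_artist_names(in_path, out_path, fixes=ARTIST_FIXES, encoding="mac_roman"):
    """Copy the csv to out_path, replacing known artist misspellings.

    Returns the number of rows that were changed.
    """
    changed = 0
    with open(in_path, encoding=encoding, newline="") as src, \
            open(out_path, "w", encoding=encoding, newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for index, row in enumerate(reader):
            # index 0 is the header row, which is copied as-is
            if index > 0 and len(row) > 1 and row[1].strip() in fixes:
                row[1] = fixes[row[1].strip()]
                changed += 1
            writer.writerow(row)
    return changed


# Example call (the output file name is an assumption):
# fix_artist_names("Hip-Hop History Data.csv", "Hip-Hop History Data (clean).csv")
```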

Once you’ve cleaned the spreadsheet, run the script again. Make sure to delete all the lyric files from your last run, though, or the program will ask you if you want to overwrite them for every single song. That gets annoying pretty fast.
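
Deleting the old files by hand works, but it can also be scripted. The sketch below uses a made-up function name and assumes the “lyrics/txt” and “lyrics/json” folder layout from earlier; it only removes .txt and .json files in those two folders. Alternatively, the library’s author points out in the comments below that passing overwrite=True to save_lyrics avoids the prompt altogether.

```python
from pathlib import Path


def clear_lyric_files(lyrics_dir):
    """Delete saved .txt and .json lyric files; return how many were removed."""
    removed = 0
    for pattern in ("txt/*.txt", "json/*.json"):
        for path in Path(lyrics_dir).glob(pattern):
            path.unlink()  # deletes the file permanently
            removed += 1
    return removed


# Example call, run from inside your project folder:
# print(clear_lyric_files("lyrics"), "files removed")
```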

If your spreadsheet is all good, then you should be set! Go forth and use your new .txt and .json files to learn as much as you want about your new lyrical database.

Bonus: A Python script that adds lyrics to your original spreadsheet

 
import lyricsgenius as genius
import csv

# Insert your own Genius API credential in place of the x's
api = genius.Genius('xxxxxxxxxxxxxxxxxxxxxxxxx')

# opens file in read mode and creates a reader
with open('hiphophistory.csv', 'r', encoding='mac_roman') as csv_input:
    csv_reader = csv.reader(csv_input, delimiter=',')

    # opens file in write mode and creates a writer
    # (newline='' keeps the csv module from adding blank lines on some systems)
    with open('hiphophistory-output.csv', 'w', encoding='utf-8', newline='') as csv_output:
        csv_writer = csv.writer(csv_output, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

        # line counter
        line_count = 0

        # iterate over every line
        for row in csv_reader:
            # read item in row
            item_year = row[0]
            item_artist = row[1]
            item_song = row[2]

            # write header to csv
            if line_count == 0:
                print('Writing csv header')
                csv_writer.writerow([item_year, item_artist, item_song, "lyrics"])

                # add to counter
                line_count += 1

            # write other lines to csv
            else:
                # search api for the song
                song = api.search_song(item_song, item_artist)

                # check if the search returned a song
                if song is not None:
                    # Found, proceed to writing row with lyrics
                    # (song.lyrics holds the lyrics text itself)
                    print('Result found. Writing row {} to csv'.format(line_count))
                    csv_writer.writerow([item_year, item_artist, item_song, song.lyrics])
                else:
                    # Not found, proceed to writing row without lyrics
                    print('No result found. Writing row {} to csv'.format(line_count))
                    csv_writer.writerow([item_year, item_artist, item_song, "no results"])

                # add to counter
                line_count += 1

        print('Processed {} lines.'.format(line_count))
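
Once the bonus script has finished, you may want to know which songs came back empty. Here is a minimal sketch for that. The function name is hypothetical, and it assumes the output file has the four columns written by the script above, with “no results” in the lyrics column for songs that were not found.

```python
import csv


def rows_without_lyrics(output_path, marker="no results"):
    """Return (artist, song) pairs whose lyrics column holds the marker text."""
    with open(output_path, encoding="utf-8", newline="") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row
        return [
            (row[1], row[2])
            for row in reader
            if len(row) > 3 and row[3] == marker
        ]


# Example call (file name taken from the script above):
# for artist, song in rows_without_lyrics("hiphophistory-output.csv"):
#     print("Missing:", artist, "-", song)
```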

Alexander Frandsen

7 thoughts on “How to download song lyrics from Genius using Python”

  1. I’m glad you found my wrapper! I’ll take a look at the search specificity problems you mentioned. If you set the “overwrite” tag to True when saving song lyrics, it won’t ask about existing files: song.save_lyrics(overwrite=True). Good luck with your project!

  2. Hi, I’m just finding your work while trying to replicate it slightly. I’m wondering if there is a way to just put all of the lyrics of the songs from one artist into one txt file. Instead of having 1 file for a song per artist, I want to write the lyrics of all the songs into one txt file. How would we go about changing this code to allow that if possible?

  3. Thanks for the excellent tutorial on Genius.com. I was able to download the lyrics but it puts each song in a separate JSON file. Is there a way to download all songs for one artist to one JSON file?

  4. This doesnt make sense. Where do you specify what artist is being extracted? Also, this python isn’t the correct syntax,there is no ”printf”
    This isn’t very well explained. ALSO it doesnt even work, go figure. smh
