Music Lyrics with Python: My Latest Project

About a week ago, I was sitting at work and thinking about various things. My current job leaves me about an hour of thinking time while I complete my work-related duties, which helps a lot with developing new ideas for passion projects. I was listening to a song that is currently stuck in my head on repeat, Serotonin by Sora, and I decided I’d like to learn the lyrics. A lot of people, myself included, like singing along to music they enjoy, as it gives them a hit of dopamine. So I started searching the web for the lyrics…and found nothing. Nothing on YouTube, Genius, or anywhere else. Since this is a relatively underground song, that wasn’t much of a surprise. However, I still wanted the lyrics, and I didn’t want to pay some website $10 for them. Nor did I want to spend hours listening to the song in 5-second intervals, transcribing the lyrics (or what I could make out of them), and repeating the process for the whole 3 minutes. That would kill my enjoyment of the song and defeat the purpose.

I then thought some more…and realized, “I know Python and have a general idea of how to do this, so why don’t I just make an app that does it for me?” And so I did. This post will take you through the thought process behind the app: the beginning, where I had the initial idea; the middle, where I encountered topics and libraries that I never knew existed; and the end, where I cover what I’ve learned and how I might continue developing the app and possibly post it on this website.

The goal of this project was straightforward, and the work split into 3 parts: 1) create a script that can download YouTube videos (since YouTube is the easiest source to work with for an MVP), 2) convert them into audio files (.mp3), and 3) transcribe the audio into text. With only 3 parts, it was relatively simple, but as with any app, I inevitably faced challenges as development progressed, some of which were unbelievably stupid.

Downloading the videos

Since I had primarily worked with Discord bots, venturing into videos was new territory for me. To begin with, I searched for an API that could download YouTube videos. There were 3 options for me to work with: Google’s official API, and two third-party libraries called “pytube” and “youtube_dl”. Google’s API was clearly the most extensive and direct way to work with anything relating to Google, but it requires you to create and store an API key and deal with the nightmare that is their developer docs. For such a simple project, I figured it wouldn’t be worth the hassle, but that might change based on the project’s needs. Choosing between “youtube_dl” and “pytube” was a coin flip, since I knew nothing about either of them, so I decided on “youtube_dl”. Worst-case scenario, if it didn’t work out, I would try the other one.

After I decided on which library I would work with, I…started working with it. I learned about it, played with the different settings, and finally started adding it to the project. Ironically, the easiest part of working on this project was downloading the videos. That’s fair to assume, since this segment consists of about 5 lines of code. To get the highest quality available in the most versatile format, I prioritized formats in the following order: .mp4, .webm, .mkv, etc. If none were available, I would download any available format. Since the next step would be converting the videos to .mp3, I figured it would be easier to convert .mp4 to .mp3 rather than a format I hadn’t even heard of before (looking at you, .mkv).
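As a rough sketch of that format priority (not the project’s actual code), the preference order can be expressed as a youtube_dl format selector, where “/” means “otherwise try the next one”. The helper names `build_format_selector` and `download_video`, and the `downloads` folder, are made up for illustration; `format`, `outtmpl`, and `noplaylist` are regular youtube_dl options.

```python
def build_format_selector(preferred_exts):
    """Build a youtube_dl format string that tries each extension in order."""
    choices = [f"best[ext={ext}]" for ext in preferred_exts]
    choices.append("best")  # last resort: whatever format is available
    return "/".join(choices)


def download_video(url, out_dir="downloads"):
    """Download one video, preferring .mp4, then .webm, then .mkv.

    out_dir is a hypothetical folder name chosen for this sketch.
    """
    import youtube_dl  # third-party: pip install youtube_dl

    options = {
        "format": build_format_selector(["mp4", "webm", "mkv"]),
        "outtmpl": f"{out_dir}/%(title)s.%(ext)s",  # name the file after the video title
        "noplaylist": True,  # only the single video, even if the URL points into a playlist
    }
    with youtube_dl.YoutubeDL(options) as ydl:
        ydl.download([url])
```

Keeping the selector in its own small function makes the priority order easy to change without touching the download call.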

The only challenge I remember facing with downloading the videos was age restriction. For some reason, accessing YouTube’s age-restricted videos takes a few extra steps, which I never worked out because I didn’t care enough about the problem. I worked around it by looking for a copy of the song uploaded not by the artist, but by literally anyone else. More often than not, those uploads don’t get flagged by YouTube and therefore aren’t age-restricted. Pretty easy fix, I would say.

Converting the videos to .mp3

After downloading the videos, I had to convert them to .mp3, purely for my own well-being. I would be using OpenAI’s Whisper, which has a Python package, and I thought it would be easier for it to load and transcribe an audio file rather than a video file. Plus, having the audio makes testing easier. If I turn this into an actual application on the website, the audio and video files would be deleted after the lyrics are generated to save space, so there wouldn’t be any reason for the artists to file a DMCA notice against me.
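The post doesn’t say which tool did the conversion, so here is one minimal way it could be done, assuming the ffmpeg command-line tool is installed and on the PATH. The function names `build_ffmpeg_command` and `video_to_mp3` are hypothetical; the flags are standard ffmpeg options.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, audio_path):
    """Return the ffmpeg invocation that extracts the audio track as .mp3."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", str(video_path),     # input video
        "-vn",                     # drop the video stream, keep audio only
        "-codec:a", "libmp3lame",  # encode the audio as MP3
        "-q:a", "2",               # variable bitrate, high quality
        str(audio_path),
    ]


def video_to_mp3(video_path):
    """Convert a video file to an .mp3 next to it and return the new path."""
    video_path = Path(video_path)
    audio_path = video_path.with_suffix(".mp3")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
    return audio_path
```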

The conversion from video to audio was also only a few lines of code, but this is where I faced my first problem: being stupid.

I decided that it would be a genius idea to create a virtual environment (venv) for this project not at the beginning of the development process, but right in the middle. VS Code displayed a nifty little pop-up that asked, “Would you like to create a virtual environment for this workspace?”, and I, being always curious, clicked “yes”. This caused literally everything to break, and resulted in me learning about what virtual environments are, how they work, and why they’re a pain to work with as a beginner. I had to reinstall all of my packages, tinker in the console, and eventually resort to starting from the beginning. In this case, starting from the beginning meant creating a new folder, copying my files to the new folder, and reinstalling the packages from my requirements.txt. In other words, curiosity cost me a few hours of work, and ended with me not progressing any further (as well as being side-tracked for a bit).
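Part of why everything appeared to break is that a fresh virtual environment starts out empty by default: packages installed for the system interpreter are not visible inside it. A small check like the one below (the function names are just for illustration) shows which interpreter a script is actually running under, which would have saved some of the guesswork.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def describe_environment():
    """Return a one-line description of the active interpreter."""
    kind = "virtual environment" if in_virtualenv() else "system interpreter"
    return f"{kind}: {sys.executable}"


print(describe_environment())
```

If this reports a virtual environment and imports are failing, the packages most likely just need to be installed again inside it.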

Transcribing the lyrics

After the audio is extracted, Whisper takes over for the transcription. This goldmine of a tool allowed me to experiment with various models, namely “small”, “turbo”, and “medium”, as I don’t have enough processing power to use any other model. While “medium” is the more accurate of the models, “turbo” is, as its name suggests, faster at processing. Since this was an MVP and not an app for the public to use, I just wanted a working proof of concept, after which I might dedicate time and resources to significantly improving the quality of the project.
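For reference, a minimal transcription step with the openai-whisper package could look like the sketch below. `whisper.load_model` and `model.transcribe` are the package’s own calls; `segments_to_lines` and `transcribe_lyrics` are hypothetical helper names, and printing one line per segment is an assumption about how the lyrics get laid out.

```python
def segments_to_lines(segments):
    """Turn Whisper's timed segments into one lyric line per segment."""
    lines = []
    for segment in segments:
        text = segment["text"].strip()
        if text:  # skip segments that contain only whitespace
            lines.append(text)
    return lines


def transcribe_lyrics(audio_path, model_name="turbo"):
    """Transcribe an audio file and return the lyrics as a list of lines."""
    import whisper  # third-party: pip install openai-whisper

    model = whisper.load_model(model_name)  # e.g. "small", "medium", "turbo"
    result = model.transcribe(str(audio_path))
    # result["text"] holds the full transcript; result["segments"] holds
    # the same text split into timed chunks.
    return segments_to_lines(result["segments"])
```

Swapping `model_name` is all it takes to compare “small”, “medium”, and “turbo” on the same file.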

One challenge Whisper presented was mentioned earlier: CPU limitations. I encountered the warning:

“FP16 not supported on CPU; using FP32 instead.”

What did I do? Absolutely nothing. Whisper fell back to FP32 on its own, exactly as the warning says, and the quality of the lyrics diminished marginally at most. It’s my passion project, and unless there’s demand for it, I won’t be spending time fixing it. And that’s the beauty of making personal projects: if a problem fixes itself, I don’t need to know why it fixed itself.
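For anyone who would rather silence the warning than ignore it, one option is to tell Whisper up front not to use half precision when no GPU is available. This is a sketch rather than what the project does: `fp16` is a real option accepted by `model.transcribe`, while `cuda_available`, `transcribe_kwargs`, and `transcribe_quietly` are names invented here.

```python
def cuda_available():
    """Report whether PyTorch can see a CUDA GPU (False if torch is missing)."""
    try:
        import torch  # installed alongside openai-whisper
    except ImportError:
        return False
    return torch.cuda.is_available()


def transcribe_kwargs(use_gpu):
    """Pick decoding options: half precision only makes sense on a GPU."""
    return {"fp16": bool(use_gpu)}


def transcribe_quietly(audio_path, model_name="turbo"):
    """Transcribe without triggering the FP16-on-CPU warning."""
    import whisper  # third-party: pip install openai-whisper

    use_gpu = cuda_available()
    model = whisper.load_model(model_name, device="cuda" if use_gpu else "cpu")
    # With fp16=False on CPU, Whisper uses FP32 directly and has nothing to warn about.
    return model.transcribe(str(audio_path), **transcribe_kwargs(use_gpu))
```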

Closing thoughts

In the end, the MVP was created, and it outputs lyrics correctly about 90% of the time. Of course, the faster and more warped (e.g. EDM) the song is, the less accurate the lyrics are, but it works and that’s all that matters. I did, in the end, get the lyrics to Serotonin, and I’m pretty satisfied with how it turned out. Here are the lyrics in case you’d like them:

I'm trying to figure out what's wrong with my head

I pray that I won't hear my ears just watching me sing

Even deeper than I did before

I've got glasses all over the floor

It's like every time you hit me I love you more

If you ever see the sign before

It doesn't matter at all

It doesn't matter at all

So much serotonin in my body

She likes fancy clothes and her parents' smile

If I were to end it here

Would she really care at all?

Diggin' up the cup from under me

I'm watching you

So much serotonin in my body

She likes fancy clothes and her parents' smile

If I were to end it here

Would she really care at all?

Diggin' up the cup from under me

I'm watching you

Flipping through the pages of my pain

I know I mess up God and I make too soon

I tried to blame it on you

But it was me and not my fault

But we're not strangers anymore

Wondered why we're sitting in your black seat

Playing on a phone

I really thought I could keep it in

But no, I'm not enough

No, I'm not enough

I love you, I can't do it

Just toss it in the trash

Just toss it in the trash

Would you just see me alone?

Alone, alone I think

Alone, alone I think

I've got glasses all over the floor

It's like every time you hit me

I love you more

If you're ever seen the sign before

It doesn't matter at all

It doesn't matter at all

Oh, oh

So much serotonin in my body

She likes fancy clothes and her parents' smile

If I were to end it here

would she really care at all?

Diggin' up the cup from under me

watchin' me fall

She's my serotonin in my body

She likes fancy clothes and her parents' smile

If I were to end it here

would she really care at all?

Diggin' up the cup from under me

watchin' me fall

If I were to end it

would she really care at all?

Diggin' up the cup from under me

watchin' me fall

Note: I didn’t anticipate the lyrics being so dark, since it sounded like a fairly upbeat song when I heard it.

I’m also excited for the possibilities for further customization. So far, I have the following ideas that I’d like to implement, and will probably do so when I have free time:

  • Isolating the vocals (a cappella) in songs where the background is too loud.

  • Providing translations from non-English languages into English in real time.

  • Adding support for Spotify and SoundCloud.

  • Fixing the age-restricted songs issue.

  • Posting the app to pserikov.com where it can be played with in its entirety, as well as posting the lyrics to a separate section of the site.

  • Adding the ability to process playlists.

  • Being able to search a song by keyword instead of pasting a YouTube link.
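The keyword-search idea may turn out to be one of the cheaper ones to try, since youtube_dl accepts a “ytsearch” prefix in place of a URL. A possible sketch, with the hypothetical helper name `to_download_target`:

```python
def to_download_target(user_input):
    """Return something youtube_dl can download: a URL as-is, or a search."""
    text = user_input.strip()
    if text.startswith(("http://", "https://")):
        return text  # already a link, pass it through unchanged
    # "ytsearch1:" asks youtube_dl for the first YouTube search result
    return f"ytsearch1:{text}"
```

The returned string can be passed to the same download call that already handles pasted links.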

This project has been a rewarding blend of creativity, mental anguish, and short-but-sweet dopamine hits. I’m looking forward to developing this project further, and possibly adding it to the site, as mentioned previously.

Feel free to comment below on how else I could improve this project and whether I should add it to the site, as well as any additional ideas you might be interested in seeing me bring to life.

For those who are interested, here is the project in its entirety. It has a lot of issues, but it works.
