Prior to the contest announcement, I was pursuing a simple question: Can AI generate funky breakbeats? After failing to train a competent VAE on a public drum audio dataset, I suspended my nascent research and instead experimented with the publicly released OpenAI Jukebox model. My goal was just to trick Jukebox into generating drum loops. To do so, I asked Jukebox to continue drum loops from a sample CD that I purchased long ago. I provided the Jukebox model with genres and artists likely associated with breaks, along with minimalist lyrics like “make it funky now!” that precede the break in some well-known funk songs.
Jukebox produced great drum audio, but rarely for long, veering instead towards deep and funky grooves. A natural idea emerged at the same time the contest was announced: I would use this material as I would have used samples back in the 90s to assemble a song.
Jukebox samples are mono and gritty. Wishing to produce a higher-quality contest entry, I decided to use samples from Jukebox mostly as a scaffold for my song with the goal of removing as many as possible from the final mix. For example, I replaced a Jukebox-generated bass loop by transcribing it and re-rendering it with a higher-quality VST plugin.
I also used the DDSP tone transfer model to create a credible sax solo, and used Spleeter to separate vocals from loops. Source separation helped me add more character to the track and raise the level of the vocals in the mix.
Workflow: Artist names, genres, and lyrics
I was trying to coax Jukebox into creating breakbeats, so I picked funk, soul, and disco genres. I also created lyrics with lots of short, shout-like phrases like “hey!” or “make it funky now!”. I chose a variety of popular American and European funk, disco, and hip hop artists who recorded in the 70s.
Scarface Classic Beats and Breaks Volume 1 is the first sample CD I ever purchased, back in 1995. At the time, its 71 tracks felt like a goldmine of breakbeats to sample. Prior to this contest, I had chopped them into 1-bar segments and stretched each to exactly 262144 samples (512×512) for use in a VAE or GAN experiment. I plan to continue this research, encouraged by the recent experiment by Garkavy and Ishimbaev published on YouTube. My goal, however, was first and foremost to produce a song by the contest deadline, so I set aside further experiments with creating and training my own models.
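As an aside, the fixed-length preparation is simple to sketch. The following is a reconstruction in plain numpy, not my actual script; the naive linear-interpolation resample changes pitch along with length, whereas a proper slow-down would use a time-stretching algorithm that preserves pitch:

```python
import numpy as np

TARGET_LEN = 512 * 512  # 262144 samples, a square layout for VAE/GAN input

def stretch_to_target(loop: np.ndarray, target_len: int = TARGET_LEN) -> np.ndarray:
    """Naively resample a mono loop to exactly target_len samples via
    linear interpolation. This only fixes the length; a real pipeline
    would use a phase vocoder to keep the pitch intact."""
    src = np.linspace(0.0, 1.0, num=len(loop), endpoint=False)
    dst = np.linspace(0.0, 1.0, num=target_len, endpoint=False)
    return np.interp(dst, src, loop)

# Example: a 1-bar loop at 120 BPM and 44.1 kHz is 2 s long (88200 samples).
one_bar = np.sin(2 * np.pi * 440 * np.arange(88200) / 44100)
stretched = stretch_to_target(one_bar)
print(len(stretched))  # 262144
```

Every loop ends up the same length regardless of its original tempo, which is what a fixed-size model input requires.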
Workflow: Continuations of source audio
I generated continuations of the sample CD breakbeats using OpenAI’s Interacting With Jukebox Colab notebook. After the first experiment, I was amazed by Jukebox’s ability to generate drum fills and loops, and became fascinated with the idea of sampling such material for a song. It was around this time that I became aware of the 2021 AI Song Contest.
Jukebox often continues a provided drum beat for a few bars with slight variation before launching into a song. In a few cases, Jukebox generated mostly drum audio for the entire round of inference. Sometimes it veered massively off course, but interestingly so!
Here is a complete annotated playlist of all the audio I considered for sampling.
Jukebox often keeps reasonably good time, but, like a human drummer, its tempo is subject to subtle drift. To use the material in Ableton Live’s Session view, I cut the audio into precise 1-bar loops, using Live’s time warping to align major transients to downbeats. I placed the 1-bar loops from each OpenAI experiment on 21 separately named tracks in Live so I could remember the character of each experiment. I then determined the key of each clip and gave it a fun and informative name. At this point, my workspace looked like this (partial view).
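Live’s warping engine did the real alignment work, but the core idea can be illustrated with a toy transient detector in numpy. The frame size and energy-flux heuristic here are my own assumptions for the sketch, not Live’s algorithm:

```python
import numpy as np

def align_to_first_transient(loop: np.ndarray, frame: int = 512) -> np.ndarray:
    """Find the frame with the largest rise in short-time energy (a crude
    transient detector) and rotate the loop so that frame starts at
    sample 0, roughly emulating snapping the first big hit to the downbeat."""
    n_frames = len(loop) // frame
    energy = np.array([np.sum(loop[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    flux = np.diff(energy, prepend=energy[0])  # energy rise ~ transient onset
    onset_frame = int(np.argmax(flux))
    return np.roll(loop, -onset_frame * frame)

# Toy example: a click placed exactly on a frame boundary mid-loop.
loop = np.zeros(44100)
loop[43 * 512:44 * 512] = 1.0
aligned = align_to_first_transient(loop)
print(int(np.argmax(np.abs(aligned) > 0.5)))  # 0
```

A real onset detector (e.g. spectral flux, as in librosa) is more robust, but this captures why alignment makes clips loopable in Session view: the strongest hit lands on beat one.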
To some extent this laborious part of the process resembled Andrej Karpathy’s instruction to become one with the data when assembling a dataset for training a neural network. For weeks I heard these loops on repeat in my mind’s ear, as though they were sampled from original recordings from the era. My mind made no distinction between real and AI-generated music in its ability to put a loop on repeat.
Workflow: Initial Sketch
To sketch a song, I recorded myself triggering clips in Live. I then edited the result to create a reasonably good arc of repetition and variation. This process was relatively quick; I didn’t have to test many combinations, as no more than 2–3 clips gelled together at one time. Had I started this project with source separation earlier, it would have been interesting to lay out vocals, bass, drums, and other stems created with Spleeter in Session view, and then use a MIDI controller for tactile interaction with the clips.
You can listen to the original sketch composed solely of Jukebox material here. After creating it, I decided against limiting myself in the contest submission to only the audio sampled from Jukebox. Samples from Jukebox are mono and tend to be muddy and gritty. I decided to retain their essential character but improve their sonic qualities with reorchestration.
Reorchestration, source separation, and direct sampling
Certain Jukebox loops suggested the presence of a wah-wah guitar. To add one to the song, I used the Scarbee Funk Guitarist Kontakt instrument. This “intelligent” instrument lets a user trigger chords and phrases with two fingers on a piano keyboard. I am amazed by the painstaking work that went into recording and sequencing a human guitarist for this instrument, and I wonder whether generative models will obviate that task in the future. I also used the similar Session Guitarist Strummed Acoustic 2 Kontakt instrument earlier in the song.
Some Jukebox loops, particularly those with vocals, also suggested the presence of a horn section. I layered in additional horns from the Kontakt instrument Session Horns Pro to enhance the clarity of the horn phrases. I often find horn section samples by themselves to be too clean. I liked the texture created by mixing the instrument with the Jukebox audio.
Experiment 7–1 is the main influence for the first part of the song. The electric piano in various bars of my song started off as a transcription of loops from 7–1. I ultimately discarded the transcriptions. Instead, I used chords and licks I played early in the production process during a long episode of jamming with the Jukebox bassline and breakbeats.
The drums are a layered hybrid of the breakbeats from many Jukebox experiments and drum samples from Native Instruments’ Battery 4 plugin. I chose to augment the breakbeats with Battery because the kicks and snares in Jukebox breaks sometimes lack sharp transients. I also used Spleeter to isolate drums in some instances; with transient enhancement, gating, and other effects I could then, for example, isolate the conga drums from one loop and turn another loop into something resembling a shaker. I also sourced a few crash cymbals from Battery.
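The gating step applied to the Spleeter drum stems can be sketched as a bare-bones threshold gate. The threshold and frame size below are illustrative, not my actual settings:

```python
import numpy as np

def noise_gate(stem: np.ndarray, threshold: float = 0.1, frame: int = 256) -> np.ndarray:
    """Zero out frames whose peak level falls below the threshold.
    Applied to a separated drum stem, this can strip low-level bleed and
    keep only the loudest hits, turning a busy loop into a sparser part."""
    out = stem.copy()
    for i in range(0, len(out), frame):
        chunk = out[i:i + frame]
        if np.max(np.abs(chunk)) < threshold:
            chunk[:] = 0.0  # mute this frame
    return out

# Example: quiet bleed around one loud hit; the gate keeps only the hit.
stem = np.full(2048, 0.01)   # low-level bleed
stem[512:768] = 0.8          # one loud hit
gated = noise_gate(stem)
print(gated[0], gated[600])  # 0.0 0.8
```

A hardware-style gate would also apply attack and release ramps to avoid clicks at frame boundaries; this hard gate shows only the selection logic.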
The vocals are the most direct samples from Jukebox. Due to ethical concerns, I tried to use short vocal samples that aren’t too suggestive of any particular artist. The main “make it funky now” vocals come closest to evoking one of the artists from the conditioning input. To raise the volume of the vocals in the mix, I separated them with Spleeter and added them back into the mix as another track.
I also time-stretched the vocal in the early part of experiment 7–1 into vocal ambiences and various crescendo effects peppered throughout the song.
Any good funk song is incomplete without a saxophone! It was so easy to play a piano solo, upload the audio to the DDSP timbre transfer notebook, and have a sax solo I could use in about 15 minutes of work. I can’t wait for Magenta to release a timbre transfer VST plugin.
Effects, Mixing and Mastering
I didn’t use any AI in choosing which effects to use, how to mix the tracks together, or in the final mastering phase. I used a variety of Native Instruments effects plugins and Reaktor ensembles; Molekular and Raum are recent favorites. I considered using Landr for mastering, but since I’m not the best at mixing and mastering, I like to treat my productions as opportunities to improve my skills in this area.
This was the Easter egg in my contest submission. To stick with the theme of sampling, I recreated the layout of the James Brown’s Funky Divas LP. The models were my divas.
So what did it mean to make music with AI in my case?
Jukebox primarily served as a muse. Through the audio it generated, it suggested different potential compositions that I then synthesized into a cohesive whole. I might have created a song similar to my submission without Jukebox, but it might not have come together as quickly or had the same harmonic structure.
Jukebox also served as a looped vocalist and, thanks to the stacking of breakbeat loops, as several drummers.
The DDSP timbre transfer model served as a translator. I could express my intention using a piano keyboard, and DDSP helped me render it with an alternative timbre.
Spleeter was a prism. It facilitated transcription, enabled upmixing, and seeded sound design. To bring it into my production flow, I tried Azuki’s Max4Live device, but ultimately found the command line more reliable.
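For reference, the command-line invocation looks roughly like this (assuming Spleeter 2.x installed via `pip install spleeter`; the file and directory names are placeholders):

```shell
# Two-stem separation (vocals + accompaniment), as used for the vocal upmixing.
spleeter separate -p spleeter:2stems -o output/ loop.wav

# Four-stem separation to get drums, bass, vocals, and other for sound design.
spleeter separate -p spleeter:4stems -o output/ loop.wav
```

Each input file gets its own subdirectory under the output folder, one WAV per stem, which slots neatly into a sampling workflow.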
The process presented above is one of many paths to choose from when making music with, or by, AI. Exploring other paths is a topic I’ll perhaps leave for other blog posts. In the meantime, here is an amazing 96-page overview paper published in 2020 that should keep you busy!