
GANterpretations is an idea I published in this paper, which was accepted to the 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020. The code is available here.

At a high level, it uses the spectrogram of a piece of audio (from a video, for example) to “draw” a path in the latent space of a BigGAN.
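To make the idea concrete, here is a minimal sketch of one way a spectrogram could steer a walk between two latent points. It is an illustration under stated assumptions, not the method from the paper: the driving signal (cumulative frame-to-frame spectral change), the 128-dimensional latent vectors, and the helper names `magnitude_spectrogram`, `path_positions` and `latent_path` are all hypothetical choices made for this example.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_size=1024, hop=512):
    """Short-time Fourier magnitudes, shape (n_frames, frame_size // 2 + 1)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_size] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def path_positions(spec):
    """Map every spectrogram frame to a position in [0, 1] along the path."""
    # Assumption: the path advances faster when the spectrum changes more.
    change = np.linalg.norm(np.diff(spec, axis=0), axis=1)
    progress = np.concatenate([[0.0], np.cumsum(change)])
    if progress[-1] == 0.0:
        # Perfectly static audio: fall back to a constant-speed walk.
        return np.linspace(0.0, 1.0, len(progress))
    return progress / progress[-1]

def latent_path(spec, z_start, z_end):
    """One latent vector per frame, interpolated between two endpoints."""
    t = path_positions(spec)[:, None]
    return (1.0 - t) * z_start + t * z_end

# Synthetic one-second "audio": a tone that jumps from 220 Hz to 880 Hz.
sample_rate = 16000
time = np.arange(sample_rate) / sample_rate
audio = np.where(time < 0.5,
                 np.sin(2 * np.pi * 220 * time),
                 np.sin(2 * np.pi * 880 * time))

rng = np.random.default_rng(0)
z_start, z_end = rng.standard_normal((2, 128))  # assumed latent size

spec = magnitude_spectrogram(audio)
path = latent_path(spec, z_start, z_end)
print(spec.shape, path.shape)
```

Each row of `path` would then be passed to the generator to render one video frame. With this driving signal, most of the movement happens around the jump in pitch, so the image would change most where the audio does.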

The following video walks through the process:


GANs are generative models trained to produce images that resemble those in a given dataset. They do this by learning a latent space $ Z\subseteq\mathbb{R}^d $ in which each point $ z\in Z $ generates an image $ G(z) $, where $ G $ is the generator of the GAN. When trained properly, these latent spaces are structured so that nearby points generate similar images, which means a continuous path through the latent space renders as a smoothly changing sequence of images.
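The "nearby points generate similar images" property can be illustrated with a toy stand-in for $ G $. The sketch below is only an assumption-laden illustration: `toy_generator` is a hypothetical fixed random linear map followed by `tanh`, not a trained GAN, and it is smooth by construction, whereas a real generator only has this property to the extent that training succeeded.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128          # latent dimensionality (assumed for illustration)
n_pixels = 64    # size of the toy "image"

# Hypothetical stand-in for G: a fixed random linear map followed by tanh.
W = rng.standard_normal((n_pixels, d)) / np.sqrt(d)

def toy_generator(z):
    """Map a latent vector to a toy image with values in (-1, 1)."""
    return np.tanh(W @ z)

z = rng.standard_normal(d)
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)  # unit-length step direction

base = toy_generator(z)
steps = (0.01, 0.1, 1.0)
distances = []
for step in steps:
    moved = toy_generator(z + step * direction)
    distances.append(float(np.linalg.norm(moved - base)))

# Small moves in latent space give small changes in the output.
for step, dist in zip(steps, distances):
    print(f"latent step {step:>5}: image distance {dist:.4f}")
```

The printed image distances grow with the size of the latent step, which is the behaviour that lets a path in latent space play back as a coherent animation.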

ML-Jam: Performing Structured Improvisations with Pre-trained Models

This paper, published at the International Conference on Computational Creativity in 2019, explores using pre-trained generative models of music for improvisation in a collaborative setting.

You can read more details about it in this blog post.

You can also play with it in this web app!

If you want to play with the code, it is here.


Demo video playing with the web app:

ML-Jam demo

Demo video jamming over Herbie Hancock’s Chameleon:

Chameleon demo

Demo video over free improvisation: