Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
This paper was accepted as a spotlight at ICLR'21. We propose a new metric and a contrastive loss, both backed by theoretical and empirical results.
Policy Similarity Metric
We introduce the policy similarity metric (PSM), which is based on bisimulation metrics. In contrast to bisimulation metrics (which are built on reward differences), PSMs are built on differences in optimal policies. If we were to use this metric for policy transfer (as Doina Precup and I explored previously), we could upper-bound the difference between the optimal and the transferred policy:
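In a deterministic MDP the PSM is the fixed point of d(x, y) = DIST(π*(·|x), π*(·|y)) + γ d(x′, y′), where x′ and y′ are the successors of x and y under the optimal policy (the Wasserstein term between two point masses reduces to d of the next states). The sketch below computes that fixed point by plain iteration on a small tabular MDP. It assumes the optimal policy and its transitions are already known and uses total variation as DIST; the function name and arguments are illustrative, not the paper's code.

```python
import numpy as np

def policy_similarity_metric(next_state, policy, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Fixed-point iteration for the PSM on a small deterministic MDP.

    next_state[x]: successor of state x when following the optimal policy.
    policy[x]:     probability vector pi*(.|x) over actions.
    """
    n = len(next_state)
    # DIST between action distributions: total variation (an assumed choice).
    dist = 0.5 * np.abs(policy[:, None, :] - policy[None, :, :]).sum(-1)
    d = np.zeros((n, n))
    for _ in range(max_iters):
        # Deterministic transitions: W1 between point masses is d(x', y').
        new_d = dist + gamma * d[np.ix_(next_state, next_state)]
        if np.max(np.abs(new_d - d)) < tol:
            return new_d
        d = new_d
    return d

# Toy MDP: states 2 and 3 are a copy of states 0 and 1 with the same optimal behaviour.
policy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
next_state = np.array([1, 1, 3, 3])
d = policy_similarity_metric(next_state, policy)
```

In this toy example the copies (states 0 and 2) end up at PSM distance 0, while states whose optimal actions differ (states 0 and 1) are at distance 1; rewards never enter the computation.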
2020 RL highlights
As part of TWiML’s AI Rewind series, I was asked to provide a list of reinforcement learning papers that were highlights for me in 2020. It’s been a difficult year for pretty much everyone, but it’s heartening to see that, despite all the difficulties, interesting research still came out. Given the size and breadth of reinforcement learning research, as well as the fact that I was asked to do this at the end of NeurIPS and right before my vacation, I decided to apply the following rules in the selection:
Autonomous navigation of stratospheric balloons using reinforcement learning
In this work we, quite literally, take reinforcement learning to new heights! Specifically, we use deep reinforcement learning to help control the navigation of stratospheric balloons, whose purpose is to deliver internet to areas with low connectivity. This project is an ongoing collaboration with Loon. It’s been incredibly rewarding to see reinforcement learning deployed successfully in a real setting. It’s also been terrific to work alongside such fantastic co-authors: Marc G.
Agence: a dynamic film exploring multi-agent systems and human agency
Agence is a dynamic and interactive film authored by three parties: 1) the director, who establishes the narrative structure and environment, 2) intelligent agents, using reinforcement learning or scripted (hierarchical state machines) AI, and 3) the viewer, who can interact with the system to affect the simulation. We trained RL agents in a multi-agent fashion to control some (or all, based on user choice) of the agents in the film. You can download the game at the Agence website.
GANterpretations is an idea I published in this paper, which was accepted to the 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020. The code is available here. At a high level, it uses the spectrogram of a piece of audio (from a video, for example) to “draw” a path in the latent space of a BigGAN. The following video walks through the process:
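To make the spectrogram-to-path idea concrete, here is a toy sketch. It is not the mapping GANterpretations uses; it is a hypothetical one in which louder audio frames push the latent point further along a straight line between two latent vectors. The frame size, hop length, and 128-dimensional latents are assumptions, and no BigGAN is loaded: each row of the returned path would be one latent input to the generator.

```python
import numpy as np

def spectrogram(audio, frame_size=1024, hop=512):
    """Magnitude spectrogram from a Hann-windowed short-time FFT (assumes len(audio) >= frame_size)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(audio) - frame_size) // hop
    frames = np.stack([audio[i * hop:i * hop + frame_size] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_size // 2 + 1)

def latent_path(spec, z_start, z_end):
    """Hypothetical mapping: each frame's spectral energy sets how far the latent point moves."""
    energy = spec.sum(axis=1)
    progress = np.cumsum(energy)
    if progress[-1] > 0:
        progress = progress / progress[-1]  # nondecreasing, ends at 1.0
    else:
        progress = np.linspace(0.0, 1.0, len(energy))  # silent audio: move at constant speed
    return z_start + progress[:, None] * (z_end - z_start)

# Usage: one second of a 440 Hz tone at 16 kHz, and two random 128-dimensional latents.
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
z_start, z_end = rng.standard_normal((2, 128))
path = latent_path(spectrogram(audio), z_start, z_end)  # one latent vector per audio frame
```

A straight line keeps the sketch short; any curve through latent space could be parameterized by the same cumulative-energy "clock".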
Introduction to reinforcement learning
This post is based on this colab. You can also watch a video where I go through the basics here. There is also a video (in Spanish) where I present the material here.
Introduction
Reinforcement learning methods are used for sequential decision making in uncertain environments. It is typically framed as an agent (the learner) interacting with an environment that provides the agent with reinforcement (positive or negative) based on the agent’s decisions.
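The agent-environment interaction can be sketched as a loop: the agent picks an action, and the environment answers with the next state and a reward. The corridor environment and the reset/step method names below are hypothetical, chosen for illustration rather than taken from any particular library.

```python
import random

class CorridorEnv:
    """Hypothetical toy environment: walk left/right on a line; reaching the right end gives +1."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clip the position.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done

def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward the agent collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # the agent decides
        state, reward, done = env.step(action)   # the environment responds with reinforcement
        total_reward += reward
        if done:
            break
    return total_reward

always_right = lambda state: 1
random_policy = lambda state: random.choice([0, 1])
```

`run_episode(CorridorEnv(), always_right)` reaches the goal; a learning agent would replace the fixed policy with one that improves from the rewards it receives.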
Rigging the Lottery: Making All Tickets Winners
Rigging the Lottery: Making All Tickets Winners is a paper published at ICML 2020 with Utku Evci, Trevor Gale, Jacob Menick, and Erich Elsen, where we introduce an algorithm for training sparse neural networks that uses a fixed parameter count and computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. You can read more about it in the paper and in our blog post.
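The core of RigL is a periodic drop/grow update that keeps the number of active weights fixed: drop the active weights with the smallest magnitude, then grow the same number of inactive connections with the largest (dense) gradient magnitude, initializing the new connections to zero. The NumPy sketch below shows one such update on a single layer. It omits the schedule (how often updates happen and how the drop fraction decays over training) and only grows among connections that were inactive before the step, which is a simplification; the function name is illustrative.

```python
import numpy as np

def rigl_update(weights, mask, grads, drop_fraction):
    """One drop/grow step in the spirit of RigL for a single layer.

    weights, grads: float arrays of the same shape (grads computed as if the layer were dense).
    mask: boolean array, True where a connection is active.
    """
    w = weights.ravel().copy()
    m = mask.ravel().copy()
    g = grads.ravel()
    active_idx = np.flatnonzero(m)
    inactive_idx = np.flatnonzero(~m)
    # Never drop more than can be regrown, so the active count stays fixed.
    n_update = min(int(drop_fraction * active_idx.size), inactive_idx.size)
    if n_update == 0:
        return weights.copy(), mask.copy()
    # Drop: active weights with the smallest magnitude.
    drop_idx = active_idx[np.argsort(np.abs(w[active_idx]))[:n_update]]
    # Grow: inactive connections with the largest gradient magnitude.
    grow_idx = inactive_idx[np.argsort(-np.abs(g[inactive_idx]))[:n_update]]
    m[drop_idx] = False
    w[drop_idx] = 0.0
    m[grow_idx] = True
    w[grow_idx] = 0.0  # newly grown connections start at zero
    return w.reshape(weights.shape), m.reshape(mask.shape)
```

Because the update only swaps which entries are active, the parameter count (and, with sparse kernels, the compute) stays constant throughout training.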
Artificial General Relativity
We (well, I) introduce a New Field In Science which we (I mean I) call Artificial General Relativity. We (here I really mean “we”) have all heard of General Relativity and how it revolutionized our understanding of the world around us. Einstein’s work, although pivotal, failed in one crucial aspect: although it allowed us to describe gravity and spacetime, it did not allow us to control them. In this paper I (switching to “I” to avoid sounding pretentious with “we”) introduce Artificial General Relativity (AGR) which, when achieved, will allow us to control gravity and spacetime.
GridWorld playground!
I made a website where you can:
- Draw your own GridWorlds
- Play around with hyperparameters while the agent is training
- Transfer values between agents
- “Teleport” the agent to help it during learning
Hope you find it useful and fun!
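The learning algorithm behind the playground isn't described here. As an illustrative assumption, the sketch below trains a tabular Q-learning agent on a small hypothetical grid and exposes the kind of hyperparameters one might tweak during training: the learning rate `alpha`, the discount `gamma`, and the exploration rate `epsilon`.

```python
import random

def q_learning_gridworld(rows=3, cols=4, goal=(0, 3), episodes=500,
                         alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical open grid whose only reward is +1 for reaching `goal`."""
    rng = random.Random(seed)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    q = {((r, c), a): 0.0 for r in range(rows) for c in range(cols) for a in range(4)}

    def step(state, a):
        # Moves that would leave the grid keep the agent in place.
        r = min(max(state[0] + moves[a][0], 0), rows - 1)
        c = min(max(state[1] + moves[a][1], 0), cols - 1)
        nxt = (r, c)
        return nxt, (1.0 if nxt == goal else 0.0), nxt == goal

    for _ in range(episodes):
        state = (rows - 1, 0)  # start in the bottom-left corner
        for _ in range(100):
            if rng.random() < epsilon:
                a = rng.randrange(4)  # explore
            else:
                best = max(q[(state, b)] for b in range(4))
                # Exploit, breaking ties at random.
                a = rng.choice([b for b in range(4) if q[(state, b)] == best])
            nxt, reward, done = step(state, a)
            # Terminal transitions do not bootstrap from the next state.
            target = reward if done else reward + gamma * max(q[(nxt, b)] for b in range(4))
            q[(state, a)] += alpha * (target - q[(state, a)])
            state = nxt
            if done:
                break
    return q

q = q_learning_gridworld()
start_value = max(q[((2, 0), a)] for a in range(4))  # learned value of the start cell
```

Raising `epsilon` makes the agent wander more; lowering `alpha` makes value estimates change more slowly.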
Tips for Interviewing at Google
Disclaimer: This post reflects my personal views and not those of my employer. People often ask me: How do I get a job at Google? An essential requirement is passing the interviews; unsurprisingly, this is another common question: How do I pass the Google interviews? While there is no hard and fast rule for passing the Google interviews, I do have some tips and guidelines that have helped others in the past (including myself).
Tips for preparing your resume
Disclaimer: This post reflects my personal views and not those of my employer. In my previous post providing tips for interviewing at Google, I included the sentence “If you don’t know anyone at Google, you’ve already applied and haven’t heard back in a while, feel free to send me a note with your CV and I’ll see if there’s something I can do.” I received a number of requests from people who had applied but never heard back.
Scalable methods for computing state similarity in deterministic MDPs
This post describes my paper Scalable methods for computing state similarity in deterministic MDPs, published at AAAI 2020. The code is available here.
Motivation
We consider distance metrics between states in an MDP. Take the following MDP, where the goal is to reach the green cells:
Physical distance between states?
Physical distance often fails to capture the similarity properties we’d like:
State abstractions
Now imagine we add an exact copy of these states to the MDP (think of it as an additional “floor”):
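For deterministic MDPs, the bisimulation metric satisfies d(x, y) = max_a [ |R(x, a) - R(y, a)| + γ d(T(x, a), T(y, a)) ], because the Kantorovich distance between two point masses is just d between the two next states. The sketch below computes this fixed point by exact iteration on a tiny tabular MDP. It is the straightforward baseline, not the sampling-based scalable method from the paper, and the two-floor MDP in the usage lines is a hypothetical stand-in for the example described in the post.

```python
import numpy as np

def deterministic_bisim_metric(rewards, transitions, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Fixed-point iteration for the bisimulation metric of a deterministic MDP.

    rewards[x, a]:     reward for taking action a in state x.
    transitions[x, a]: the unique next state after taking action a in state x.
    """
    n, num_actions = rewards.shape
    d = np.zeros((n, n))
    for _ in range(max_iters):
        new_d = np.zeros((n, n))
        for a in range(num_actions):
            reward_diff = np.abs(rewards[:, a][:, None] - rewards[:, a][None, :])
            nxt = transitions[:, a]
            # Take the worst case over actions.
            new_d = np.maximum(new_d, reward_diff + gamma * d[np.ix_(nxt, nxt)])
        if np.max(np.abs(new_d - d)) < tol:
            return new_d
        d = new_d
    return d

# Two "floors": states 2 and 3 are exact copies of states 0 and 1 (state 1 is rewarding).
rewards = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]])
transitions = np.array([[0, 1], [1, 1], [2, 3], [3, 3]])
d = deterministic_bisim_metric(rewards, transitions)
```

Each state and its copy on the other floor end up at distance 0, which is exactly the property physical distance fails to capture, while states with different long-run rewards (0 and 1) are at distance 1 / (1 - γ) = 10.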