2020 RL highlights

As part of TWiML’s AI Rewind series, I was asked to provide a list of reinforcement learning papers that were highlights for me in 2020. It’s been a difficult year for pretty much everyone, but it’s heartening to see that despite all the difficulties, interesting research still came out.

Given the size and breadth of reinforcement learning research, as well as the fact that I was asked to do this at the end of NeurIPS and right before my vacation, I decided to apply the following rules in the selection:

Select only papers published in AAAI, ICLR, ICML, or NeurIPS. Like any good rule, there are a few exceptions :).
Select papers only from areas where I’m most actively doing research. The last section is an exception to this rule.

Due to time constraints, my selection process was most likely not the best; if you feel there are papers I’m omitting here, send them my way and I may add them. They are presented here in no particular order, and for most I provide only a brief synopsis taken from the papers themselves. Unless written in auburn colour, all text below is taken from the source papers.

You can get more details in the TWiML post, or listen to it directly here:

With all those disclaimers laid out, I hope this list proves useful!

Metrics / Representations

One of my most active areas of research is investigating how to build good representations for learning. While there is no clear, globally accepted definition of what it means to have a good representation, I take it to mean a representation that:

Has lower dimensionality than the original state space
Can be learned concurrently while policies are being improved upon
Can generalize well to unseen states
Has well-developed theoretical properties

I begin with some theoretical papers related to representation learning, and then continue with metrics. I believe that state (or state-action) metrics can help us with the last point and, if done carefully, can yield the other points as well. The metrics papers start with state similarity measures and then transition into more “traditional” representation learning papers, which mostly make use of contrastive losses.
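To make the idea of a state metric a bit more concrete, here is a minimal sketch of a bisimulation-style distance on a tiny deterministic MDP. It is only an illustration, not the method of any specific paper in this list. The function name `deterministic_bisim_metric`, the toy MDP, and the discount value are all assumptions made for the example. The distance between two states is taken to be the largest value, over actions, of the immediate reward difference plus the discounted distance between the successor states.

```python
def deterministic_bisim_metric(rewards, next_state, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Fixed-point iteration for a bisimulation-style state distance.

    Assumes a deterministic MDP given as two tables (hypothetical layout):
      rewards[s][a]    -> immediate reward for action a in state s
      next_state[s][a] -> index of the successor state
    Update rule:
      d(s, t) <- max_a |R(s, a) - R(t, a)| + gamma * d(N(s, a), N(t, a))
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    # Start from the all-zeros distance and iterate until it stops changing.
    d = [[0.0] * n_states for _ in range(n_states)]
    for _ in range(max_iters):
        new_d = [
            [
                max(
                    abs(rewards[s][a] - rewards[t][a])
                    + gamma * d[next_state[s][a]][next_state[t][a]]
                    for a in range(n_actions)
                )
                for t in range(n_states)
            ]
            for s in range(n_states)
        ]
        delta = max(
            abs(new_d[s][t] - d[s][t])
            for s in range(n_states)
            for t in range(n_states)
        )
        d = new_d
        if delta < tol:
            break
    return d


# Toy MDP (an assumption for illustration): states 0 and 1 behave identically,
# state 2 is absorbing with zero reward.
toy_rewards = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
toy_next_state = [[2, 2], [2, 2], [2, 2]]
toy_metric = deterministic_bisim_metric(toy_rewards, toy_next_state)
```

In this toy example, states 0 and 1 end up at distance zero, so a representation built on such a metric could map them to the same point, while state 2 stays separated from both.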

Representations for Stable Off-Policy Reinforcement Learning

Dibya Ghosh, Marc G. Bellemare

Pablo Samuel Castro

2020 RL highlights

Metrics / Representations

Representations for Stable Off-Policy Reinforcement Learning

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

Scalable methods for computing state similarity in deterministic MDPs

Learning Invariant Representations for Reinforcement Learning without Reconstruction

Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery

State Alignment-based Imitation Learning

Fast Task Inference with Variational Intrinsic Successor Features

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Contrastive Learning of Structured World Models

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

Planning to Explore via Self-Supervised World Models

Dream to Control: Learning Behaviors by Latent Imagination

Model Based Reinforcement Learning for Atari

Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning

Latent World Models For Intrinsically Motivated Exploration

Understanding / evaluating deep RL

Revisiting Rainbow

Measuring the Reliability of Reinforcement Learning Algorithms

Revisiting Fundamentals of Experience Replay

Behaviour Suite for Reinforcement Learning

Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution

Implementation Matters in Deep RL: A Case Study on PPO and TRPO

RL in the real world

Autonomous navigation of stratospheric balloons using reinforcement learning

Estimating Policy Functions in Payment Systems using Reinforcement Learning

Agence: a dynamic film exploring multi-agent systems and human agency

Other

Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

Munchausen Reinforcement Learning

An operator view of policy gradient methods

What Can Learned Intrinsic Rewards Capture?

Table of Contents