MICo: Learning improved representations via sampling-based state similarity for Markov decision processes
We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents.
Pablo Samuel Castro*, Tyler Kastner*, Prakash Panangaden, and Mark Rowland
This blog post is a summary of our NeurIPS 2021 paper. The code is available here.
The following figure summarizes the empirical gains our new loss provides: it yields improvements across all of the Dopamine agents (left), as well as over Soft Actor-Critic and the DBC algorithm of Zhang et al.
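As background for the results above, the behavioural distance at the heart of the paper (the MICo distance) can be characterized as the fixed point of a Bellman-style recursion over pairs of states; a sketch of the defining equation, with $r^\pi_x$ denoting the expected reward and $P^\pi_x$ the next-state distribution under policy $\pi$ from state $x$:

```latex
% MICo distance: fixed point of a sampled recursion over state pairs.
% Because the expectation is over independently sampled next states
% (rather than an optimal coupling, as in bisimulation metrics),
% it admits a simple sample-based estimate usable as a loss.
U^\pi(x, y) \;=\; \bigl| r^\pi_x - r^\pi_y \bigr|
\;+\; \gamma \, \mathbb{E}_{x' \sim P^\pi_x,\; y' \sim P^\pi_y}
\bigl[\, U^\pi(x', y') \,\bigr]
```

The key practical point, reflected in the figure, is that this sampling-based form avoids the expensive optimal-transport computation required by bisimulation metrics, making it cheap enough to use as an auxiliary representation-shaping loss on top of standard deep RL agents.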