The Dormant Neuron Phenomenon in Deep Reinforcement Learning
We identify the dormant neuron phenomenon in deep reinforcement learning, where an agent’s network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro*, Utku Evci* This blogpost is a summary of our ICML 2023 paper. The code is available here. Many more results and analyses are available in the paper, so I encourage you to check it out if interested!
I’ve been in bands since I was 12. In a parallel universe I’m a full-time musician :D. This page collects the albums I’ve released so far, in reverse chronological order. Enjoy! gregor samsa - amorfo (2023) With my good friend Esteban Nichols, we recorded this jazz fusion album during 2022. It was challenging because Esteban (and Matías) recorded in Quito, Ecuador, while I recorded in Ottawa! I’m quite proud that this album was made with 100% Ecuadorians.
I learned on the radio that last November 29th marked the 50th anniversary of the classic arcade game Pong. This game is particularly meaningful for those of us who do RL research, as it is one of the games in the Arcade Learning Environment, one of the most popular benchmarks. Pong is probably the easiest game of the whole suite, so we often use it as a test to make sure our agents are learning.
Introduction to Transformers
As part of RIIAA in Quito, I gave an introduction to Transformers, the architecture behind advances such as GPT-3, Music Transformer, Parti, and many others. Recording: you can watch the recording here. Materials: here you can access the various materials I mentioned during the course: the slides I used in the course; Write with Transformers from Hugging Face (GPT-2); Eleuther GPT-J-6B, which is a much better model than Hugging Face's GPT-2; the simple colab on bigrams; the Flax colab on LSTMs; and Jay Alammar's excellent The Illustrated Transformer, on which I based my description of Transformers.
The State of Sparse Training in Deep Reinforcement Learning
We perform a systematic investigation into applying a number of existing sparse training techniques on a variety of deep RL agents and environments, and conclude by suggesting promising avenues for improving the effectiveness of sparse training methods, as well as for advancing their use in DRL. Laura Graesser*, Utku Evci*, Erich Elsen, Pablo Samuel Castro This blogpost is a summary of our ICML 2022 paper. The code is available here. Many more results and analyses are available in the paper, so I encourage you to check it out if interested!
Crosswords: A General Intelligence Challenge?
I have become obsessed with crossword puzzles, specifically the NYT crosswords, since my friend Ralph Crewe gently forced me to start doing them. Although I’m still not at his level, I’ve been working on them daily and getting noticeably better. In doing so I’ve come to realize they are a fantastic mechanism for testing generally capable problem-solving, and in this post I would like to explain the various types of challenges they present.
What is a palindrome? A palindrome is a phrase that reads the same way from left to right, and right to left. The rules are that all characters must be used in both directions, but punctuation, capitalization, and spaces can be ignored. The same rules apply in Spanish! Some well-known palindromes: A man, a plan, a canal, Panama! Do geese see god? Yo, banana boy! Some palindromes in Spanish: Dábale arroz a la zorra el abad.
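The rules above are easy to turn into a quick check. Here is a minimal sketch in Python, not anything from the original post: the function names `normalize` and `is_palindrome` are made up for illustration. It ignores punctuation, capitalization, and spaces as described, and it also strips accents, which is an extra assumption needed for the Spanish example to pass (note that this also treats "ñ" as "n").

```python
import unicodedata


def normalize(phrase: str) -> str:
    """Keep only letters and digits, with accents removed and case folded."""
    # NFD splits an accented letter such as "á" into "a" plus a combining mark.
    # The combining mark is not alphanumeric, so the filter below drops it.
    # Stripping accents is an assumption of this sketch, not a stated rule.
    decomposed = unicodedata.normalize("NFD", phrase)
    return "".join(ch for ch in decomposed if ch.isalnum()).casefold()


def is_palindrome(phrase: str) -> bool:
    """Return True if the normalized phrase reads the same in both directions."""
    cleaned = normalize(phrase)
    # An empty string is not counted as a palindrome (a choice made here).
    return bool(cleaned) and cleaned == cleaned[::-1]


if __name__ == "__main__":
    examples = [
        "A man, a plan, a canal, Panama!",
        "Do geese see god?",
        "Yo, banana boy!",
        "Dábale arroz a la zorra el abad.",
    ]
    for phrase in examples:
        print(f"{phrase!r}: {is_palindrome(phrase)}")
```

Comparing the cleaned string against its reverse enforces the "all characters must be used in both directions" rule, since every remaining letter has to line up with its mirror position.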
CME is A-OK
The thread I wrote at the start of perf season at Google seemed to resonate with lots of people, so I decided to put a slightly extended version of it in blog-post form. What is perf? In brief, “perf” season at Google is when we evaluate our performance over the last few months, in the form of a self-assessment, and our peers provide their assessments on how they perceive our performance.
Portrait of Hallelagine
Happy holidays from the MUSICODE “team”! “Portrait of Hallelagine” is a mashup of Jaco Pastorius’ “Portrait of Tracy”, Leonard Cohen’s “Hallelujah”, and John Lennon’s “Imagine”. 100% of the video editing was done with Runway!
Deep Reinforcement Learning at the Edge of the Statistical Precipice
We argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. We advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results.
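To make the idea of an interquartile mean with an interval estimate concrete, here is a rough sketch in plain Python. It is not the paper's implementation: the function names, the example scores, and the use of a simple percentile bootstrap are all assumptions made for illustration, and the 25% trim is truncated to a whole number of scores when the count is not divisible by four.

```python
import random
from statistics import fmean


def interquartile_mean(scores):
    """Mean of the middle 50% of scores: drop the lowest and highest 25%."""
    ordered = sorted(scores)
    cut = len(ordered) // 4  # number of scores trimmed from each end
    return fmean(ordered[cut:len(ordered) - cut])


def bootstrap_interval(scores, statistic, num_resamples=2000,
                       confidence=0.95, seed=0):
    """Percentile bootstrap interval for `statistic` (an assumed, simple scheme)."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample the scores with replacement and recompute the statistic each time.
    estimates = sorted(
        statistic([rng.choice(scores) for _ in range(n)])
        for _ in range(num_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    low = estimates[int(tail * num_resamples)]
    high = estimates[min(num_resamples - 1, int((1.0 - tail) * num_resamples))]
    return low, high


if __name__ == "__main__":
    # Hypothetical normalized scores from a handful of runs, with one outlier.
    scores = [0.21, 0.35, 0.30, 0.28, 0.40, 0.33, 0.25, 2.10]
    low, high = bootstrap_interval(scores, interquartile_mean)
    print(f"mean: {fmean(scores):.3f}")
    print(f"IQM:  {interquartile_mean(scores):.3f}  interval: [{low:.3f}, {high:.3f}]")
```

With only a few runs, a single outlier can shift the plain mean considerably, while the interquartile mean discards it; reporting the interval alongside the point estimate shows how much the aggregate could move under resampling.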
The Difficulty of Passive Learning in Deep Reinforcement Learning
We propose the “tandem learning” experimental design, where two RL agents are learning from identical data streams, but only one interacts with the environment to collect the data. We use this experimental design to study the empirical challenges of offline reinforcement learning. Georg Ostrovski, Pablo Samuel Castro, Will Dabney This blogpost is a summary of our NeurIPS 2021 paper. We provide two Tandem RL implementations: this one based on the DQN Zoo, and this one based on the Dopamine library.
MICo: Learning improved representations via sampling-based state similarity for Markov decision processes
We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. Pablo Samuel Castro*, Tyler Kastner*, Prakash Panangaden, and Mark Rowland This blogpost is a summary of our NeurIPS 2021 paper. The code is available here. The following figure gives a nice summary of the empirical gains our new loss provides, yielding an improvement on all of the Dopamine agents (left), as well as over Soft Actor-Critic and the DBC algorithm of Zhang et al.