I was born and raised in Quito, Ecuador, and moved to Montreal after high school to study at McGill. I stayed in Montreal for the next 10 years, during which I finished my bachelor's, worked at a flight simulator company, and eventually obtained my master's and PhD at McGill, focusing on Reinforcement Learning under the supervision of Doina Precup and Prakash Panangaden. After my PhD I did a 10-month postdoc in Paris before moving to Pittsburgh to join Google. I have worked at Google for close to 9 years and am currently a Staff Research Software Developer at Google Brain in Montreal, focusing on fundamental Reinforcement Learning research and on Machine Learning and Creativity. I am also a regular advocate for increasing LatinX representation in the research community. Aside from my interest in coding, AI, and math, I am an active musician, love running (6 marathons so far, including Boston!), and enjoy discussing politics and activism.
Happy holidays from the MUSICODE “team”! “Portrait of Hallelagine” is a mashup of Jaco Pastorius’ “Portrait of Tracy”, Leonard Cohen’s “Hallelujah”, and John Lennon’s “Imagine”. 100% of the video editing was done with Runway!
We argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point with a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone and those drawn from a more thorough statistical analysis. We advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results. We also present more robust and efficient aggregate metrics, such as interquartile mean scores, that achieve small uncertainty in results.
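To make these recommendations concrete, here is a minimal pure-Python sketch, assuming `scores` holds normalized scores as a runs × tasks list of lists. It computes the interquartile mean (IQM, the mean of the middle 50% of all pooled run/task scores), a percentile confidence interval from a stratified bootstrap that resamples runs independently within each task, and a performance profile (the fraction of scores above each threshold τ). The function names and the example numbers are hypothetical and do not come from the paper or any released library.

```python
"""Sketch: IQM, stratified-bootstrap CIs, and performance profiles.

`scores` is assumed to be a list of runs, each a list of normalized scores
(one per task). All names here are hypothetical; pure standard library.
"""
import random
import statistics


def iqm(values):
    """Interquartile mean: mean of the middle 50% of the sorted values."""
    ordered = sorted(values)
    k = int(0.25 * len(ordered))  # number of values trimmed from each end
    return statistics.fmean(ordered[k:len(ordered) - k])


def aggregate_iqm(scores):
    """IQM over all (run, task) scores pooled together."""
    return iqm([s for run in scores for s in run])


def stratified_bootstrap_ci(scores, metric, reps=2000, alpha=0.05, seed=0):
    """Percentile CI for `metric`, resampling runs with replacement per task."""
    rng = random.Random(seed)
    num_runs, num_tasks = len(scores), len(scores[0])
    stats = []
    for _ in range(reps):
        # Stratified by task: each task draws its own run indices.
        resampled = [[0.0] * num_tasks for _ in range(num_runs)]
        for t in range(num_tasks):
            for r in range(num_runs):
                resampled[r][t] = scores[rng.randrange(num_runs)][t]
        stats.append(metric(resampled))
    stats.sort()
    lo = stats[int(alpha / 2 * (reps - 1))]
    hi = stats[int((1 - alpha / 2) * (reps - 1))]
    return lo, hi


def performance_profile(scores, taus):
    """Fraction of (run, task) scores strictly above each threshold tau."""
    flat = [s for run in scores for s in run]
    return [sum(s > tau for s in flat) / len(flat) for tau in taus]


if __name__ == "__main__":
    # Made-up scores for 5 runs x 4 tasks, for illustration only.
    scores = [
        [0.1, 0.5, 1.2, 3.0],
        [0.2, 0.4, 1.0, 0.3],
        [0.0, 0.6, 1.1, 2.5],
        [0.3, 0.5, 0.9, 0.2],
        [0.1, 0.7, 1.3, 4.0],
    ]
    lo, hi = stratified_bootstrap_ci(scores, aggregate_iqm)
    print(f"IQM = {aggregate_iqm(scores):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
    print(performance_profile(scores, taus=[0.0, 0.5, 1.0]))  # [0.95, 0.5, 0.3]
```

Because the IQM discards the bottom and top 25% of pooled scores, a single outlier such as the 4.0 above barely moves it, whereas it would pull the mean up noticeably. The interval, rather than the point estimate alone, is what should be compared across algorithms.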
We propose the “tandem learning” experimental design, in which two RL agents learn from identical data streams, but only one of them interacts with the environment to collect the data. We use this experimental design to study the empirical challenges of offline reinforcement learning.
Georg Ostrovski, Pablo Samuel Castro, Will Dabney
This blog post is a summary of our NeurIPS 2021 paper. We provide two Tandem RL implementations: one based on the DQN Zoo and one based on the Dopamine library.
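The tandem design is simple to express in code. Below is a toy sketch, not the paper's DQN-based setup: a hypothetical 5-state chain MDP with tabular Q-learning, where only the active agent selects actions, and the passive agent applies the exact same update rule to the exact same transitions, in the same order. In this sketch the two agents differ only in their Q-table initialization (zeros vs. uniform random), an assumption chosen for illustration. All names (`QAgent`, `run_tandem`, `greedy_return`) are hypothetical.

```python
"""Toy tandem-learning sketch with tabular Q-learning (hypothetical setup)."""
import random

N_STATES = 5          # hypothetical chain: states 0..4
GOAL = N_STATES - 1   # reaching state 4 gives reward 1 and ends the episode
ACTIONS = (0, 1)      # 0 = left, 1 = right


def step(state, action):
    """Deterministic chain transition; state 0 has a wall on the left."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


class QAgent:
    """Tabular Q-learner; both tandem agents use this same class."""

    def __init__(self, init_fn, lr=0.5, gamma=0.9):
        self.q = {(s, a): init_fn() for s in range(N_STATES) for a in ACTIONS}
        self.init_q = dict(self.q)  # kept only to inspect untouched entries
        self.lr, self.gamma = lr, gamma
        self.updates = 0

    def greedy(self, state, rng=None):
        """Argmax over actions; ties broken randomly if an rng is given."""
        best = max(self.q[(state, a)] for a in ACTIONS)
        ties = [a for a in ACTIONS if self.q[(state, a)] == best]
        return rng.choice(ties) if rng is not None else ties[0]

    def update(self, s, a, r, s2, done):
        """One Q-learning update from a single transition."""
        bootstrap = 0.0 if done else self.gamma * max(self.q[(s2, b)] for b in ACTIONS)
        self.q[(s, a)] += self.lr * (r + bootstrap - self.q[(s, a)])
        self.updates += 1


def run_tandem(episodes=200, epsilon=0.2, max_steps=50, seed=0):
    rng = random.Random(seed)           # drives the active agent's behavior
    init_rng = random.Random(seed + 1)  # used only for the passive agent's init
    active = QAgent(init_fn=lambda: 0.0)
    passive = QAgent(init_fn=init_rng.random)  # same learner, different init
    visited = set()
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Only the active agent picks actions and interacts with the env.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = active.greedy(s, rng)
            s2, r, done = step(s, a)
            visited.add((s, a))
            # Both agents learn from the identical transition, in the same order.
            active.update(s, a, r, s2, done)
            passive.update(s, a, r, s2, done)
            if done:
                break
            s = s2
    return active, passive, visited


def greedy_return(agent, max_steps=50):
    """Undiscounted return of the greedy policy from state 0 (0.0 or 1.0)."""
    s = 0
    for _ in range(max_steps):
        s, r, done = step(s, agent.greedy(s))
        if done:
            return r
    return 0.0


if __name__ == "__main__":
    active, passive, visited = run_tandem()
    print("updates (active, passive):", active.updates, passive.updates)
    print("greedy return (active, passive):",
          greedy_return(active), greedy_return(passive))
```

Because the passive agent never chooses actions, any state-action pair the active agent never tries keeps its arbitrary initial value in the passive table. Comparing `greedy_return(active)` with `greedy_return(passive)` is the kind of active-vs-passive comparison the design is built around; with function approximation, as in the paper, such untried actions are where the two agents can diverge most.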