Improving Zero-Shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

Authors: Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, Jonathan J. Tompson

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of GSF and other baseline methods on both benchmarks, and show that GSF outperforms both previous state-of-the-art offline RL and representation-learning baselines on the entire distribution of levels.
Researcher Affiliation | Collaboration | Bogdan Mazoure (McGill University, Quebec AI Institute); Ilya Kostrikov (UC Berkeley); Ofir Nachum (Google Brain); Jonathan Tompson (Google Brain)
Pseudocode | Yes | Algorithm 1: LearnGVF(c, D_{µ,i}, θ^{(0)}, J, η, γ): offline estimation of the GVF Ĝ^µ_i (a hedged sketch of this procedure appears after the table).
Open Source Code | Yes | Code can be found at https://github.com/bmazoure/gsf_public.
Open Datasets | Yes | We devised two new benchmarks, offline Procgen (discrete actions) and offline Distracting Control Suite (continuous actions): two offline RL datasets that directly test for generalization of RL agents across observation functions (a data-logging sketch follows the table).
Dataset Splits | No | An important distinction from online RL is that, in the offline RL setting, we assume access to a historical dataset D_µ (instead of a simulator), collected by logging the experience of the policy µ in the form {o_{i,t}, a_{i,t}, r_{i,t}} for i = 1, ..., N and t = 1, ..., T, where, for practical purposes, each episode is truncated at T timesteps. Furthermore, we assume that the agent can only be trained on a limited collection of POMDPs M_train = {M_i}_{i=1}^m, and its performance is evaluated on the set of test POMDPs M_test (a split sketch follows the table).
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies | No | The paper does not provide specific version numbers for key software components or libraries used in the experiments.
Experiment Setup | No | The paper mentions "1 million gradient steps" for training on offline Procgen and "1M frames" for the Distracting Control Suite, but does not specify concrete hyperparameter values such as learning rates, batch sizes, or optimizer settings.
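
For the Pseudocode row: a minimal sketch of what Algorithm 1's LearnGVF could look like, assuming a linear function approximator fit by semi-gradient TD(0) on the logged dataset. The function and argument names, the linear parameterization, and the dataset layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def learn_gvf(cumulant, dataset, theta0, num_iters, step_size, gamma):
    """Hypothetical sketch of LearnGVF: estimate a generalized value function
    G^mu(o) ~ E[sum_t gamma^t c(o_t)] from logged data by TD(0) regression.

    dataset: list of episodes; each episode is a list of (phi, action, reward)
             tuples, where phi is a feature vector for observation o_t.
    cumulant: function c(phi) -> scalar signal to accumulate (assumption).
    """
    theta = theta0.copy()
    for _ in range(num_iters):                     # J passes over the data
        for episode in dataset:
            for t in range(len(episode) - 1):
                phi, _, _ = episode[t]             # features of o_t
                phi_next, _, _ = episode[t + 1]    # features of o_{t+1}
                # Semi-gradient TD(0): target = c(o_t) + gamma * G_hat(o_{t+1})
                target = cumulant(phi) + gamma * phi_next @ theta
                theta += step_size * (target - phi @ theta) * phi
    return theta
```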
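
For the Open Datasets row: a hedged sketch of how an offline dataset of logged tuples {o_{i,t}, a_{i,t}, r_{i,t}} could be collected on Procgen, assuming the public `procgen` package and the classic Gym step API. The random behavior policy, episode count, and file name are placeholders; the paper logs trajectories from a trained policy µ.

```python
import pickle
import gym  # classic Gym step API (obs, reward, done, info); assumption

# Train-time levels only; zero-shot evaluation uses unseen levels.
env = gym.make("procgen:procgen-coinrun-v0", num_levels=200, start_level=0)

episodes = []
for _ in range(10):  # N episodes; a stand-in for the paper's dataset size
    obs, done, traj = env.reset(), False, []
    while not done:
        action = env.action_space.sample()  # placeholder for mu(a | o)
        next_obs, reward, done, _ = env.step(action)
        traj.append((obs, action, reward))
        obs = next_obs
    episodes.append(traj)

with open("offline_procgen_coinrun.pkl", "wb") as f:  # hypothetical file name
    pickle.dump(episodes, f)
```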
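
For the Dataset Splits row: a minimal sketch of the train/test POMDP split the excerpt describes, i.e. training on a limited collection M_train of m levels and evaluating zero-shot on held-out M_test. The pool size and m are illustrative values; the paper's exact split is precisely what this row flags as unspecified.

```python
import numpy as np

rng = np.random.default_rng(0)
all_levels = rng.permutation(500)   # pool of POMDP (level) IDs; assumed size

m = 200                             # |M_train|; illustrative value
train_levels = all_levels[:m]       # M_train = {M_1, ..., M_m}
test_levels = all_levels[m:]        # M_test: never seen during training

# D_mu contains only transitions logged on train_levels; zero-shot
# generalization is measured by rolling out the learned policy on test_levels.
```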