Offline Reinforcement Learning as One Big Sequence Modeling Problem

Authors: Michael Janner, Qiyang Li, Sergey Levine

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental evaluation focuses on (1) the accuracy of the Trajectory Transformer as a long-horizon predictor compared to standard dynamics model parameterizations and (2) the utility of sequence modeling tools, namely beam search, as a control algorithm in the context of offline reinforcement learning, imitation learning, and goal-reaching. [...] Results for the locomotion environments are shown in Table 1. [...] Ant Maze results are provided in Table 2.
Researcher Affiliation | Academia | Michael Janner, Qiyang Li, Sergey Levine, University of California at Berkeley; {janner, qcli}@berkeley.edu, svlevine@eecs.berkeley.edu
Pseudocode | Yes | Algorithm 1: Beam search (a sketch follows the table)
Open Source Code | Yes | Code is available at trajectory-transformer.github.io
Open Datasets | Yes | We evaluate the Trajectory Transformer on a number of environments from the D4RL offline benchmark suite (Fu et al., 2020), including the locomotion and Ant Maze domains.
Dataset Splits | No | The paper mentions that 'training is performed', discusses the 'training set' in the context of discretization, and uses standard D4RL benchmarks, but it does not explicitly state the training/validation/test dataset splits (e.g., percentages or counts).
Hardware Specification | No | The paper mentions 'computational resource donations from Microsoft' but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper lists 'NumPy (Harris et al., 2020), PyTorch (Paszke et al., 2019), and minGPT (Karpathy, 2020)' but does not provide explicit version numbers for these libraries (e.g., PyTorch 1.9).
Experiment Setup | Yes | Our model is a Transformer decoder mirroring the GPT architecture (Radford et al., 2018). We use a smaller architecture than those typically used in large-scale language modeling, consisting of four layers and four self-attention heads. [...] We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 2.5 × 10⁻⁴ to train parameters θ.
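
The architecture and optimizer details quoted in the Experiment Setup row translate directly into code. Below is a minimal PyTorch sketch of a GPT-style decoder with the quoted sizes (four layers, four self-attention heads) trained with Adam at 2.5e-4; the embedding width, vocabulary size, and context length are placeholders rather than values reported in the paper, and the built-in `nn.TransformerEncoder` with a causal mask stands in for the minGPT decoder the authors use.

```python
import torch
from torch import nn

# Sizes quoted in the Experiment Setup row: four layers, four attention heads.
N_LAYER, N_HEAD = 4, 4
N_EMBD = 128       # placeholder embedding width (not reported in this excerpt)
VOCAB_SIZE = 100   # placeholder: number of discretization bins
BLOCK_SIZE = 249   # placeholder context length in tokens

class TinyGPT(nn.Module):
    """GPT-style decoder stand-in built from PyTorch primitives."""

    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, N_EMBD)
        self.pos_emb = nn.Parameter(torch.zeros(1, BLOCK_SIZE, N_EMBD))
        layer = nn.TransformerEncoderLayer(
            d_model=N_EMBD, nhead=N_HEAD, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=N_LAYER)
        self.head = nn.Linear(N_EMBD, VOCAB_SIZE)

    def forward(self, tokens):
        x = self.tok_emb(tokens) + self.pos_emb[:, : tokens.size(1)]
        # Causal mask so each position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        x = self.blocks(x, mask=mask)
        return self.head(x).log_softmax(dim=-1)

model = TinyGPT()
# Adam with the learning rate quoted in the paper (2.5e-4).
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)
```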
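
The Pseudocode row refers to the paper's Algorithm 1, beam search repurposed as a planner. The sketch below shows vanilla beam search over a discretized token sequence, assuming a hypothetical `model(tokens)` that returns per-token log-probabilities (the `TinyGPT` above satisfies this interface); the paper's variant has the same structure but ranks candidates by predicted cumulative reward rather than log-probability alone.

```python
import torch

@torch.no_grad()
def beam_search(model, prefix, horizon, beam_width, vocab_size):
    """Vanilla beam search over discrete tokens.

    Assumes `model(tokens)` returns log-probabilities of shape
    (n_beams, seq_len, vocab_size) and `prefix` is a 1-D LongTensor of
    conditioning tokens, with beam_width <= vocab_size.
    """
    beams = prefix.unsqueeze(0)                  # (1, prefix_len)
    scores = torch.zeros(1)                      # cumulative log-prob per beam

    for _ in range(horizon):
        logp = model(beams)[:, -1, :]            # next-token log-probs
        cand = scores.unsqueeze(1) + logp        # (n_beams, vocab_size)
        top = cand.flatten().topk(beam_width)    # best (beam, token) pairs
        beam_idx = torch.div(top.indices, vocab_size, rounding_mode="floor")
        token_idx = (top.indices % vocab_size).unsqueeze(1)
        beams = torch.cat([beams[beam_idx], token_idx], dim=1)
        scores = top.values

    return beams[scores.argmax()]                # highest-scoring sequence
```

In the paper, the decoded tokens cover discretized states, actions, and rewards, and the beam width and planning horizon are hyperparameters of the resulting controller.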