Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning
Authors: Adam R Villaflor, Zhe Huang, Swapnil Pande, John M Dolan, Jeff Schneider
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation. For all experiments, we compare our SPLT Transformer method to Trajectory Transformer (TT), Decision Transformer (DT), and Behavioral Cloning (BC) with a Transformer model. |
| Researcher Affiliation | Academia | Adam Villaflor¹, Zhe Huang¹, Swapnil Pande¹, John Dolan¹, Jeff Schneider¹; ¹Carnegie Mellon University. Correspondence to: Adam Villaflor <avillaflor@cmu.edu>, Jeff Schneider <jeff.schneider@cs.cmu.edu>. |
| Pseudocode | No | The paper describes procedures in text but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/avillaflor/SPLTtransformer |
| Open Datasets | Yes | Most prior works in offline RL have focused on the mainly deterministic D4RL (Fu et al., 2020) benchmarks and a variety of weakly stochastic Atari (Machado et al., 2018) benchmarks. We evaluate our method on the CARLA (Dosovitskiy et al., 2017) No Crash (Codevilla et al., 2019) benchmark. |
| Dataset Splits | No | The paper mentions training on Town01 data and evaluating on unseen Town02 routes, but does not provide specific details on how the datasets were split into training, validation, and test sets, nor percentages or sample counts for these splits. |
| Hardware Specification | No | The paper mentions 'modern GPU hardware' in the context of computational efficiency but does not provide specific details such as GPU models, CPU types, or memory specifications used for experiments. |
| Software Dependencies | Yes | For these experiments, we run the 0.9.11 version of CARLA at 5fps. For these experiments, we run the 0.9.10.1 version of CARLA at 10fps. (The two quoted settings correspond to the two different CARLA driving benchmarks evaluated in the paper.) |
| Experiment Setup | Yes | For all Transformer-based methods across all experiments, we kept the general Transformer hyperparameters consistent. We used 4 layers of self-attention blocks with 8 heads and an embedding size of 128. ... For our SPLT method, the only additional important hyperparameters are c, nw, and nπ for the latent variables, β for the VAE, and h and k for the planning. We generally did a hyperparameter search over nw ∈ [2, 4], nπ ∈ [2, 4], β ∈ {1e-4, 1e-3, 1e-2}, h ∈ {5, 10}, and k ∈ {2, 5}. For the toy illustrative problem, we used c = 2, nw = 2, nπ = 3, β = 1e-3, h = 5, and k = 5. For No Crash, we used c = 2, nw = 3, nπ = 2, β = 0.01, h = 5, and k = 2. For Leaderboard, we used c = 2, nw = 3, nπ = 2, β = 0.01, h = 5, and k = 2. |
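The Software Dependencies row quotes two CARLA frame rates (5 fps and 10 fps). The snippet below is only a minimal sketch of how such frame rates map onto the standard CARLA 0.9.x Python client settings; the host, port, timeout, and use of synchronous mode are illustrative assumptions, not details taken from the paper or its repository.

```python
import carla  # CARLA 0.9.x Python API

# Connect to a locally running CARLA server; host and port are illustrative defaults.
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Set the simulation step to match the quoted frame rates:
#   5 fps  -> fixed_delta_seconds = 1 / 5  = 0.2
#   10 fps -> fixed_delta_seconds = 1 / 10 = 0.1
settings = world.get_settings()
settings.synchronous_mode = True    # assumption: step the simulator explicitly during evaluation
settings.fixed_delta_seconds = 0.2  # 5 fps; use 0.1 for the 10 fps configuration
world.apply_settings(settings)
```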
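For reference, the SPLT hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The dictionaries below only restate the values reported above; the key names are hypothetical and need not match those used in the authors' code at https://github.com/avillaflor/SPLTtransformer.

```python
# Hypothetical configuration sketch restating the hyperparameters quoted above.
# Per the quote: c, n_w, n_pi are the latent-variable hyperparameters,
# beta is the VAE weight, and h and k are the planning hyperparameters.

TRANSFORMER = {
    "n_layers": 4,         # self-attention blocks
    "n_heads": 8,
    "embedding_size": 128,
}

SPLT = {
    "toy_problem": {"c": 2, "n_w": 2, "n_pi": 3, "beta": 1e-3, "h": 5, "k": 5},
    "no_crash":    {"c": 2, "n_w": 3, "n_pi": 2, "beta": 0.01, "h": 5, "k": 2},
    "leaderboard": {"c": 2, "n_w": 3, "n_pi": 2, "beta": 0.01, "h": 5, "k": 2},
}

# Reported hyperparameter search ranges.
SEARCH_SPACE = {
    "n_w": [2, 4],
    "n_pi": [2, 4],
    "beta": [1e-4, 1e-3, 1e-2],
    "h": [5, 10],
    "k": [2, 5],
}
```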