Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

Authors: Adam R Villaflor, Zhe Huang, Swapnil Pande, John M Dolan, Jeff Schneider

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation. For all experiments, we compare our SPLT Transformer method to Trajectory Transformer (TT), Decision Transformer (DT), and Behavioral Cloning (BC) with a Transformer model.
Researcher Affiliation | Academia | Adam Villaflor 1, Zhe Huang 1, Swapnil Pande 1, John Dolan 1, Jeff Schneider 1 (1 Carnegie Mellon University). Correspondence to: Adam Villaflor <avillaflor@cmu.edu>, Jeff Schneider <jeff.schneider@cs.cmu.edu>.
Pseudocode | No | The paper describes procedures in text but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/avillaflor/SPLTtransformer
Open Datasets | Yes | Most prior works in offline RL have focused on the mainly deterministic D4RL (Fu et al., 2020) benchmarks and a variety of weakly stochastic Atari (Machado et al., 2018) benchmarks. We evaluate our method on the CARLA (Dosovitskiy et al., 2017) No Crash (Codevilla et al., 2019) benchmark.
Dataset Splits | No | The paper mentions training on Town01 data and evaluating on unseen Town02 routes, but does not provide specific details on how the datasets were split into training, validation, and test sets, nor percentages or sample counts for these splits.
Hardware Specification | No | The paper mentions 'modern GPU hardware' in the context of computational efficiency but does not provide specific details such as GPU models, CPU types, or memory specifications used for experiments.
Software Dependencies | Yes | For these experiments, we run the 0.9.11 version of CARLA at 5fps. ... For these experiments, we run the 0.9.10.1 version of CARLA at 10fps.
Experiment Setup | Yes | For all Transformer-based methods across all experiments, we kept the general Transformer hyperparameters consistent. We used 4 layers of self-attention blocks with 8 heads and an embedding size of 128. ... For our SPLT method, the only additional important hyperparameters are c, n_w, and n_pi for the latent variables, beta for the VAE, and h and k for the planning. We generally did a hyperparameter search over n_w in [2, 4], n_pi in [2, 4], beta in {1e-4, 1e-3, 1e-2}, h in {5, 10}, and k in {2, 5}. For the toy illustrative problem we used c = 2, n_w = 2, n_pi = 3, beta = 1e-3, h = 5, and k = 5. For No Crash, we used c = 2, n_w = 3, n_pi = 2, beta = 0.01, h = 5, and k = 2. For Leaderboard, we used c = 2, n_w = 3, n_pi = 2, beta = 0.01, h = 5, and k = 2.
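For reference, the reported settings can be gathered into a small configuration sketch. This is a minimal, hypothetical layout based only on the hyperparameters quoted in the Experiment Setup row; the key names (e.g. n_layers, embed_dim, n_w, n_pi) are illustrative assumptions and do not necessarily match the released SPLT Transformer codebase.

```python
# Hyperparameters as quoted in the paper's experiment setup.
# Key names are illustrative; they may differ from the official repository.

TRANSFORMER_CONFIG = {
    "n_layers": 4,      # self-attention blocks
    "n_heads": 8,       # attention heads per block
    "embed_dim": 128,   # token embedding size
}

# SPLT-specific settings: latent sizes (c, n_w, n_pi), VAE weight (beta),
# and planning horizon / branching factor (h, k).
SPLT_CONFIGS = {
    "toy_problem": {"c": 2, "n_w": 2, "n_pi": 3, "beta": 1e-3, "h": 5, "k": 5},
    "no_crash":    {"c": 2, "n_w": 3, "n_pi": 2, "beta": 1e-2, "h": 5, "k": 2},
    "leaderboard": {"c": 2, "n_w": 3, "n_pi": 2, "beta": 1e-2, "h": 5, "k": 2},
}

# Reported hyperparameter search space.
SEARCH_SPACE = {
    "n_w":  [2, 4],
    "n_pi": [2, 4],
    "beta": [1e-4, 1e-3, 1e-2],
    "h":    [5, 10],
    "k":    [2, 5],
}
```

This grouping simply mirrors the three evaluation settings named in the quote (toy problem, No Crash, Leaderboard); it is not taken from the repository itself.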