Revisiting Bellman Errors for Offline Model Selection

Authors: Joshua P Zitovsky, Daniel De Marchi, Rishabh Agarwal, Michael Rene Kosorok

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our estimator obtains impressive OMS performance on diverse discrete control tasks, including Atari games. (Section 5, Empirical Results)
Researcher Affiliation | Collaboration | 1 Department of Biostatistics, UNC Chapel Hill, North Carolina, USA; 2 Google DeepMind; 3 Mila.
Pseudocode | Yes | Our algorithm is summarized in Algorithm 1. Algorithm A.1: SBV with Tuned Regression Algorithm. Algorithm A.2: Applying Early Stopping to DQN with SBV. (See the illustrative early-stopping sketch below the table.)
Open Source Code | Yes | Finally, we open-source our code at https://github.com/jzitovsky/SBV.
Open Datasets | Yes | SBV achieves strong performance on diverse tasks ranging from healthcare problems (Klasnja et al., 2015) to Atari games (Bellemare et al., 2013). For the Bicycle control problem, we generated 10 offline datasets...following Ernst et al. (2005). For the mHealth control problem, we generated 10 offline datasets...following Luckett et al. (2020). Finally, we evaluated SBV (Algorithm 1) on 12 offline DQN Replay datasets (Agarwal et al., 2020)...
Dataset Splits | Yes | Randomly partition trajectories in D to training set D_T and validation set D_V (Algorithm 1). While P^µ is unknown, we can still estimate the expectation in Equation 4 by randomly partitioning 80% of the trajectories present in D into a training set D_T and reserving the remaining 20% of trajectories as a validation set D_V. (See the trajectory-split sketch below the table.)
Hardware Specification | Yes | Atari experiments were conducted using a mix of A100 and V100 GPUs from both our university’s computing cluster and GCP virtual machines. With four A100s and four V100s (or with six A100s)... Non-Atari experiments were conducted using 2.50 GHz Intel CPU cores from our university’s computing cluster.
Software Dependencies | Yes | Unless otherwise specified, all layers use the default parameters specified by TensorFlow v2.5.0 (Abadi et al., 2015)... The scripts we wrote to run DQN with these configurations made heavy use of the Dopamine library (Castro et al., 2018).
Experiment Setup | Yes | We tweaked the learning rate and target update frequency...to 2.5e-5 and 32,000, respectively... (D.1). Optimizer: Adam(learning_rate=2.5e-5, loss=Huber, batch_size=128, target_update_freq=32,000) (Figure D.1). Optimizer: Nadam(learning_rate=5e-4, loss=MSE, batch_size=512, max_epochs=40, mixed_precision=True) (Figure D.2). (See the configuration sketch below the table.)
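
The Pseudocode row references Algorithm A.2, which applies early stopping to DQN using SBV as the validation criterion. The snippet below is a minimal sketch of that early-stopping pattern only, not the paper's Algorithm A.2; `sbv_score`, `agent.train_one_iteration`, `agent.q_network`, and `agent.snapshot` are hypothetical placeholders rather than the API of the released SBV code.

```python
def train_with_early_stopping(agent, sbv_score, d_train, d_val, n_iterations):
    """Illustrative early-stopping loop in the spirit of Algorithm A.2.

    `agent` and `sbv_score` are hypothetical stand-ins: the agent exposes a
    per-iteration training step and a checkpointing method, and `sbv_score`
    returns a validation-based error estimate (assumed lower is better).
    """
    best_score, best_checkpoint = float("inf"), None
    for _ in range(n_iterations):
        agent.train_one_iteration(d_train)                   # hypothetical API
        score = sbv_score(agent.q_network, d_train, d_val)   # hypothetical API
        if score < best_score:
            best_score, best_checkpoint = score, agent.snapshot()
    return best_checkpoint, best_score
```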
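
The Dataset Splits row quotes an 80/20 random partition of whole trajectories into a training set D_T and a validation set D_V. Below is a minimal sketch of such a trajectory-level split; the container format and the helper name are assumptions, not the authors' code.

```python
import numpy as np

def split_trajectories(trajectories, val_frac=0.2, seed=0):
    """Randomly partition whole trajectories into D_T (train) and D_V (validation).

    `trajectories` is assumed to be a sequence of per-episode transition
    collections; splitting at the trajectory level ensures no episode is
    shared between the two sets.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(trajectories))
    n_val = int(round(val_frac * len(trajectories)))
    d_val = [trajectories[i] for i in idx[:n_val]]
    d_train = [trajectories[i] for i in idx[n_val:]]
    return d_train, d_val
```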
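
The Experiment Setup row lists two optimizer configurations (Figures D.1 and D.2). The sketch below restates them as Keras objects under TensorFlow 2.x, the framework named in the Software Dependencies row; the variable names and the mixed-precision policy call are illustrative assumptions rather than the authors' scripts.

```python
import tensorflow as tf

# DQN training configuration quoted from Figure D.1 (assumed Keras equivalents).
dqn_optimizer = tf.keras.optimizers.Adam(learning_rate=2.5e-5)
dqn_loss = tf.keras.losses.Huber()
dqn_batch_size = 128
dqn_target_update_freq = 32_000   # gradient steps between target-network updates

# Regression-model configuration quoted from Figure D.2 (assumed Keras equivalents).
tf.keras.mixed_precision.set_global_policy("mixed_float16")  # mixed_precision=True
reg_optimizer = tf.keras.optimizers.Nadam(learning_rate=5e-4)
reg_loss = tf.keras.losses.MeanSquaredError()
reg_batch_size = 512
reg_max_epochs = 40
```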