reproducibilityindex.ai

Reinforcement Learning with a Terminator

Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We deploy our method on high-dimensional driving and MinAtar benchmarks. Additionally, we test our approach on human data in a driving setting. Our results demonstrate fast convergence and significant improvement over various baseline approaches.
Researcher Affiliation	Collaboration	Technion, Israel Institute of Technology; Nvidia Research, Israel; Bar-Ilan University, Israel
Pseudocode	Yes	Algorithm 1 TermCRL: Termination Confidence Reinforcement Learning; Algorithm 2 TermPG
Open Source Code	Yes	Code for Backseat Driver and our method, TermPG, can be found at https://github.com/guytenn/Terminator.
Open Datasets	Yes	We further compared our method to the PG, recurrent PG, and reward shaping methods on MinAtar [Young and Tian, 2019].
Dataset Splits	No	The paper does not explicitly provide specific training/validation/test dataset splits with percentages or counts.
Hardware Specification	Yes	All experiments ran on our machine with 4 NVIDIA GeForce RTX 3090 GPUs and Intel Core i9-10900X CPU.
Software Dependencies	No	The paper mentions using 'MLAgents [Juliani et al., 2018]' and 'RLlib [Liang et al., 2018]' but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup	Yes	MinAtar experiments ran for 5 million time steps with a learning rate of 5e-4 and a batch size of 2048. Backseat Driver experiments ran for 15 million time steps with a learning rate of 1e-4 and a batch size of 2048.
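To make the reported training budgets easier to compare, the sketch below records them as a small config and derives an approximate number of training iterations per run. It assumes that "batch size" means environment steps collected per training iteration (as, for example, RLlib's train_batch_size setting does) and that one iteration consumes one full batch. The ExperimentSetup class and num_iterations helper are hypothetical illustrations, not part of the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the hyperparameters reported per benchmark."""
    name: str
    total_timesteps: int
    learning_rate: float
    batch_size: int  # assumption: environment steps collected per training iteration

    def num_iterations(self) -> int:
        # Assumes one training iteration per full batch; a trailing
        # partial batch is dropped. Algorithms such as PPO may still take
        # several minibatch gradient steps inside each iteration.
        return self.total_timesteps // self.batch_size


SETUPS = [
    ExperimentSetup("MinAtar", 5_000_000, 5e-4, 2048),
    ExperimentSetup("Backseat Driver", 15_000_000, 1e-4, 2048),
]

if __name__ == "__main__":
    for s in SETUPS:
        print(f"{s.name}: lr={s.learning_rate}, ~{s.num_iterations()} training iterations")
```

Under these assumptions, MinAtar runs amount to about 2441 iterations and Backseat Driver runs to about 7324, so the driving experiments receive roughly three times as many updates at a fivefold smaller learning rate.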