Reinforcement Learning with a Terminator

Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We deploy our method on high-dimensional driving and MinAtar benchmarks. Additionally, we test our approach on human data in a driving setting. Our results demonstrate fast convergence and significant improvement over various baseline approaches. |
| Researcher Affiliation | Collaboration | Technion, Israel Institute of Technology; Nvidia Research, Israel; Bar-Ilan University, Israel |
| Pseudocode | Yes | Algorithm 1: TermCRL (Termination Confidence Reinforcement Learning); Algorithm 2: TermPG |
| Open Source Code | Yes | Code for Backseat Driver and our method, TermPG, can be found at https://github.com/guytenn/Terminator. |
| Open Datasets | Yes | We further compared our method to the PG, recurrent PG, and reward shaping methods on MinAtar [Young and Tian, 2019]. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits (percentages or counts). |
| Hardware Specification | Yes | All experiments ran on our machine with 4 NVIDIA GeForce RTX 3090 GPUs and an Intel Core i9-10900X CPU. |
| Software Dependencies | No | The paper mentions MLAgents [Juliani et al., 2018] and RLlib [Liang et al., 2018] but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | MinAtar experiments ran for 5 million time steps with a learning rate of 5e-4 and a batch size of 2048. Backseat Driver experiments ran for 15 million time steps with a learning rate of 1e-4 and a batch size of 2048. |