Reinforcement Learning with a Terminator
Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We deploy our method on high-dimensional driving and MinAtar benchmarks. Additionally, we test our approach on human data in a driving setting. Our results demonstrate fast convergence and significant improvement over various baseline approaches. |
| Researcher Affiliation | Collaboration | Technion, Israel Institute of Technology; Nvidia Research, Israel; Bar-Ilan University, Israel |
| Pseudocode | Yes | Algorithm 1: TermCRL (Termination Confidence Reinforcement Learning); Algorithm 2: TermPG (an illustrative sketch appears below the table) |
| Open Source Code | Yes | Code for Backseat Driver and our method, TermPG, can be found at https://github.com/guytenn/Terminator. |
| Open Datasets | Yes | We further compared our method to the PG, recurrent PG, and reward shaping methods, on MinAtar [Young and Tian, 2019]. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits with percentages or counts. |
| Hardware Specification | Yes | All experiments ran on our machine with 4 NVIDIA GeForce RTX 3090 GPUs and Intel Core i9-10900X CPU. |
| Software Dependencies | No | The paper mentions using 'MLAgents [Juliani et al., 2018]' and 'RLlib [Liang et al., 2018]' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | MinAtar experiments ran for 5 million time steps with a learning rate of 5e-4 and a batch size of 2048. Backseat Driver experiments ran for 15 million time steps with a learning rate of 1e-4 and a batch size of 2048 (gathered into a config sketch below the table). |
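
For convenience, the quoted hyperparameters are gathered below into a small Python sketch. The dictionary layout and key names are our own illustrative choice, not the authors' configuration schema; only the numeric values (time steps, learning rates, batch sizes) come from the quoted setup.

```python
# Hyperparameters quoted in the "Experiment Setup" row above, collected into
# plain dictionaries. Key names are illustrative, not the authors' schema.
MINATAR_CONFIG = {
    "total_timesteps": 5_000_000,   # 5M environment steps
    "learning_rate": 5e-4,
    "batch_size": 2048,
}

BACKSEAT_DRIVER_CONFIG = {
    "total_timesteps": 15_000_000,  # 15M environment steps
    "learning_rate": 1e-4,
    "batch_size": 2048,
}
```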
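The paper's pseudocode (TermCRL and TermPG) is not reproduced here, so the following is a minimal, heavily hedged sketch of the general idea of termination-confidence reinforcement learning: a learned estimate of the external observer's termination probability is used to penalize per-step rewards inside an otherwise standard policy-gradient update. The architecture, the penalty form `beta * p_term`, and all names (`TermAwarePG`, `update`) are our own assumptions, not the authors' implementation.

```python
# Illustrative only: NOT the authors' TermPG. A learned termination-probability
# head penalizes rewards inside a plain REINFORCE-style update.
import torch
import torch.nn as nn

class TermAwarePG(nn.Module):
    """Policy network plus a termination-probability head (hypothetical names)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, n_actions))
        # Estimates the probability that the external observer terminates here.
        self.term_head = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

def update(model, opt, obs, actions, rewards, terminated,
           beta: float = 1.0, gamma: float = 0.99) -> float:
    """One policy-gradient update on a single collected trajectory.

    obs:        (T, obs_dim) float tensor of observations
    actions:    (T,) long tensor of actions taken
    rewards:    (T,) float tensor of environment rewards
    terminated: (T,) float tensor, 1.0 where the observer terminated
    """
    logits = model.policy(obs)
    logp = torch.distributions.Categorical(logits=logits).log_prob(actions)

    # Supervised termination estimator, fit to the observed termination signals.
    p_term = torch.sigmoid(model.term_head(obs)).squeeze(-1)
    term_loss = nn.functional.binary_cross_entropy(p_term, terminated)

    # Penalize states the estimator believes the observer is likely to terminate.
    shaped = rewards - beta * p_term.detach()

    # Discounted reward-to-go of the shaped rewards.
    returns = torch.zeros_like(shaped)
    running = torch.zeros(())
    for t in reversed(range(shaped.shape[0])):
        running = shaped[t] + gamma * running
        returns[t] = running

    pg_loss = -(logp * returns).mean()
    loss = pg_loss + term_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)

if __name__ == "__main__":
    # Toy invocation with random data, just to show the expected shapes.
    T, obs_dim, n_actions = 32, 8, 4
    model = TermAwarePG(obs_dim, n_actions)
    opt = torch.optim.Adam(model.parameters(), lr=5e-4)
    loss = update(model, opt,
                  obs=torch.randn(T, obs_dim),
                  actions=torch.randint(n_actions, (T,)),
                  rewards=torch.randn(T),
                  terminated=torch.randint(2, (T,)).float())
    print(f"loss: {loss:.4f}")
```

The split into a policy-gradient loss and a supervised termination loss mirrors the two-part structure suggested by the algorithm names (TermCRL for estimating termination confidence, TermPG for the resulting policy-gradient method); the exact coupling used in the paper may differ.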