Prediction and Control in Continual Reinforcement Learning
Authors: Nishanth Anand, Doina Precup
Venue: NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, this approach improves performance significantly on both prediction and control problems. |
| Researcher Affiliation | Collaboration | Nishanth Anand, School of Computer Science, McGill University and Mila (nishanth.anand@mail.mcgill.ca); Doina Precup, School of Computer Science, McGill University, Mila, and DeepMind (dprecup@cs.mcgill.ca) |
| Pseudocode | Yes | Algorithm 1 PT-TD learning (Prediction) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology. |
| Open Datasets | Yes | Empirical case studies of the proposed approaches in simple gridworlds, Minigrid [11], Jelly Bean World (JBW) [31], and MinAtar environments [51]. |
| Dataset Splits | No | The paper describes episodic training and task changes (e.g., "We run 750 episodes and change rewards every 75 episodes.") but does not specify explicit train/validation/test splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | The discount factor is 0.9 in all cases. We run 750 episodes and change rewards every 75 episodes. We use an ε-greedy policy with ε = 0.1 for exploration. The experience replay buffer's capacity is capped at 100k. (See the configuration sketch below.) |
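
The quoted experiment setup reduces to a handful of hyperparameters. Below is a minimal, hedged sketch of how such a setup could be expressed in Python. The names `ExperimentConfig`, `epsilon_greedy`, `ReplayBuffer`, and `task_index` are illustrative assumptions and do not come from the paper or its code; only the numeric values (γ = 0.9, 750 episodes, reward changes every 75 episodes, ε = 0.1, 100k replay capacity) are taken from the quoted text. This is not the paper's PT-TD algorithm, just the surrounding experimental scaffolding.

```python
import random
from collections import deque
from dataclasses import dataclass


# Hypothetical container; only the numeric defaults are taken from the paper's quoted setup.
@dataclass
class ExperimentConfig:
    discount: float = 0.9           # "The discount factor is 0.9 in all cases."
    num_episodes: int = 750         # "We run 750 episodes..."
    reward_change_every: int = 75   # "...and change rewards every 75 episodes."
    epsilon: float = 0.1            # ε-greedy exploration with ε = 0.1
    replay_capacity: int = 100_000  # replay buffer capacity capped at 100k


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


class ReplayBuffer:
    """FIFO experience buffer with a fixed capacity (oldest transitions are evicted)."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.storage), min(batch_size, len(self.storage)))


def task_index(episode, cfg):
    """Index of the reward configuration active during a given episode."""
    return episode // cfg.reward_change_every


cfg = ExperimentConfig()
buffer = ReplayBuffer(cfg.replay_capacity)
```

In a continual-learning loop, a function like the hypothetical `task_index` above would decide when the reward function switches, mirroring the every-75-episodes schedule reported in the quoted setup.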