A State Representation for Diminishing Rewards

Authors: Ted Moskovitz, Samo Hromadka, Ahmed Touati, Diana Borsa, Maneesh Sahani

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In the following sections, we experimentally validate the formal properties of the λR and explore its usefulness for solving RL problems with DMU. The majority of our experiments center on navigation tasks..." (Section 5, page 5) and "The λF-based experiments shown were run on a single NVIDIA GeForce GTX 1080 GPU." (Appendix L, page 24).
Researcher Affiliation | Collaboration | Ted Moskovitz (Gatsby Unit, UCL; ted@gatsby.ucl.ac.uk); Samo Hromadka (Gatsby Unit, UCL; samo.hromadka.21@ucl.ac.uk); Ahmed Touati (Meta; atouati@meta.com); Diana Borsa (Google DeepMind; borsa@google.com); Maneesh Sahani (Gatsby Unit, UCL; maneesh@gatsby.ucl.ac.uk).
Pseudocode | Yes | Algorithm 1: Online Tabular Qλ-Learning Update (Appendix E.2, page 17); Algorithm 2: Fitted Qλ-Iteration (Appendix E.4, page 19); Algorithm 3: λO FB Learning (Appendix E.6, page 20).
Open Source Code | No | "GIFs of navigation agents can be found at lambdarepresentation.github.io and in the supplementary material." (Appendix A, page 15). The website states "Code coming soon!".
Open Datasets | Yes | "We applied them to the Two Rooms domain from the Neuro-Nav benchmark task set [22]" (Section 5.2, page 6), the classic Four Rooms domain [24] (Section 5.3, page 6), and MuJoCo continuous control tasks within OpenAI Gym [43] (Appendix I, page 23).
Dataset Splits | No | The paper describes training and evaluation episodes and data collection, but does not explicitly specify training, validation, and test splits with percentages, counts, or references to predefined splits for reproduction.
Hardware Specification | Yes | "The λF-based experiments shown were run on a single NVIDIA GeForce GTX 1080 GPU." (Appendix L, page 24) and "All experiments in Section 6 and Appendix H were run on a single RTX5000 GPU" (Appendix L, page 24).
Software Dependencies | No | The paper mentions software such as the Adam optimizer, the bsuite library [36], the Soft Actor-Critic algorithm [SAC; 42], and OpenAI Gym [43], but does not provide version numbers for these or any other dependencies.
Experiment Setup | Yes | "The discount factor γ was set to 0.9 for all experiments" (Appendix E.1, page 17); "Experiments used a constant step size α = 0.1." (Appendix E.2, page 18); "Adam with a learning rate of 3e-4." (Appendix E.1, page 17). Table 2 (Appendix I, page 24) lists hyperparameter values for the SAC MuJoCo experiments, including Collection Steps 1000, Learning Rate 3 × 10⁻⁴, and Batch Size 128.
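To make the reported tabular settings concrete, the sketch below wires the stated hyperparameters (γ = 0.9, α = 0.1) into a standard tabular Q-learning backup. This is only the generic Q-learning backbone, not the paper's Qλ-Learning (Algorithm 1), which additionally tracks diminishing rewards via the λ-representation; the function name and toy environment sizes are illustrative assumptions.

```python
import numpy as np

# Hyperparameters reported in the paper's appendices.
GAMMA = 0.9   # discount factor, all experiments (Appendix E.1)
ALPHA = 0.1   # constant step size, tabular experiments (Appendix E.2)

def q_learning_update(Q, s, a, r, s_next):
    """One standard tabular Q-learning backup (generic sketch only).

    The paper's Online Tabular Qlambda-Learning Update additionally
    maintains a lambda-representation for diminishing rewards, which
    is not reproduced here.
    """
    td_target = r + GAMMA * np.max(Q[s_next])   # bootstrap from greedy value
    Q[s, a] += ALPHA * (td_target - Q[s, a])    # move Q toward the TD target
    return Q

# Tiny usage example: 2 states, 2 actions, one transition with reward 1.
Q = np.zeros((2, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
# Q[0, 1] moves from 0 toward the target 1.0 by step size 0.1
```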