Reinforcement Learning with Non-Exponential Discounting
Authors: Matthias Schultheis, Constantin A. Rothkopf, Heinz Koeppl
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the applicability of our proposed approach on two simulated problems. |
| Researcher Affiliation | Academia | Matthias Schultheis, Constantin A. Rothkopf, and Heinz Koeppl — all at the Centre for Cognitive Science, Technische Universität Darmstadt (matthias.schultheis@tu-darmstadt.de, constantin.rothkopf@tu-darmstadt.de, heinz.koeppl@tu-darmstadt.de) |
| Pseudocode | Yes | Algorithm 1: Computation of the optimal value function and policy for non-exp. discounting (...) Algorithm 2: Computation of the gradient of F w.r.t. θ for inferring the discount function |
| Open Source Code | Yes | All proposed methods were implemented in Python using the PyTorch framework [71] and are available online1. (Footnote 1: https://git.rwth-aachen.de/bcs/nonexp-rl) |
| Open Datasets | No | The paper generates its own simulated data for the experiments and does not reference or provide access to any publicly available dataset. For example: 'For generating data, we randomly sampled starting states and determined the time points at which subjects would switch their action. Afterward, the determined time points were distorted by Gaussian noise.' |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It describes generating simulated data but not how it was partitioned for different phases of model development or evaluation. |
| Hardware Specification | No | The paper mentions 'high-performance computer Lichtenberg at the NHR Centers NHR4CES at TU Darmstadt' in the acknowledgments, but does not provide specific hardware details such as CPU/GPU models, memory, or processor types. |
| Software Dependencies | No | The paper states 'All proposed methods were implemented in Python using the PyTorch framework [71]', but it does not specify the version numbers for Python or PyTorch, which are required for a reproducible description. |
| Experiment Setup | Yes | The used hyperparameters are listed in Appendix F. (Appendix F lists specific values for Number of hidden layers, Number of units per layer, Learning rate, Optimizer, Activation function, Number of samples for collocation points, Batch size, and Number of epochs for both simulated problems). |
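For context on the paper's topic, the sketch below contrasts exponential discounting with a hyperbolic (non-exponential) alternative when valuing a fixed reward stream. This is a hypothetical illustration only, not the paper's Algorithm 1; the function names, the hyperbolic form `1 / (1 + k*t)`, and the parameter values are assumptions chosen for clarity.

```python
def exponential_discount(t, gamma=0.9):
    """Standard exponential discount factor gamma**t."""
    return gamma ** t

def hyperbolic_discount(t, k=0.5):
    """Hyperbolic discount factor 1 / (1 + k*t), a common
    non-exponential choice in behavioral models."""
    return 1.0 / (1.0 + k * t)

def discounted_return(rewards, discount):
    """Sum a reward stream weighted by the given discount function."""
    return sum(discount(t) * r for t, r in enumerate(rewards))

# Constant reward of 1 per step over a 10-step horizon.
rewards = [1.0] * 10
exp_val = discounted_return(rewards, exponential_discount)
hyp_val = discounted_return(rewards, hyperbolic_discount)
```

Under non-exponential discounting such as the hyperbolic form, optimal behavior is generally time-inconsistent, which is why the paper requires a dedicated algorithm rather than standard value iteration.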