Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Local Temporal Difference Code for Distributional Reinforcement Learning
Authors: Pablo Tano, Peter Dayan, Alexandre Pouget
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Fig. 5d we compare flexibility to a horizon change between the Laplace code (black) and the Expectile code (red), whose estimates need to re-converge to the new value distribution at s under the new horizon T. We trained an ensemble of Laplace units to encode the same reward distribution and decoded the smoothed reward distribution assuming that the units actually code for expectiles. |
| Researcher Affiliation | Academia | Pablo Tano, Basic Neurosciences, University of Geneva; Peter Dayan, MPI for Biological Cybernetics, University of Tübingen; Alexandre Pouget, Basic Neurosciences, University of Geneva |
| Pseudocode | No | The paper describes algorithms using mathematical equations (e.g., Eq. 10, 16) but does not include structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code for the methodology, nor does it include links to a code repository. |
| Open Datasets | No | The paper uses abstract Markov Processes (MPs) for illustration and simulation (e.g., 'Consider the MP shown in Fig. 5a') but does not specify or provide access to any publicly available or open datasets for training or evaluation. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., exact CPU/GPU models, memory, or processor types) used for running its simulations or analyses. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names and versions) that would be needed to replicate the experimental setup. |
| Experiment Setup | No | The paper describes the mathematical rules for the Laplace code and how units converge, but it does not specify concrete hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or detailed training configurations for its simulations. |
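For context on the Expectile code that the quoted passage compares against, the sketch below shows the standard asymmetric TD update from the distributional RL literature, which an ensemble of expectile units would use to re-converge after a horizon change. This is an illustrative sketch, not code from the paper; the function name, step sizes, and the Bernoulli reward process are all assumptions chosen for the demo.

```python
import numpy as np

def expectile_td_step(values, taus, r, v_next, gamma, alpha=0.1):
    """One asymmetric TD update per expectile unit.

    Each unit i tracks the tau_i-expectile of the return: positive
    prediction errors are scaled by tau_i, negative ones by (1 - tau_i),
    so units with tau > 0.5 settle on optimistic estimates and
    units with tau < 0.5 on pessimistic ones.
    """
    deltas = r + gamma * v_next - values            # TD errors, one per unit
    weights = np.where(deltas > 0, taus, 1 - taus)  # asymmetric scaling
    return values + alpha * weights * deltas

# Ensemble of units spanning pessimistic to optimistic expectiles.
taus = np.linspace(0.1, 0.9, 9)
values = np.zeros_like(taus)

# Repeated one-step episodes with a stochastic terminal reward (0 or 1),
# so each unit converges to a tau-expectile of Bernoulli(0.5).
rng = np.random.default_rng(0)
for _ in range(5000):
    r = float(rng.random() < 0.5)
    values = expectile_td_step(values, taus, r, v_next=0.0, gamma=0.0)

# Units fan out around the mean reward of 0.5: low-tau units sit below
# it, high-tau units above, encoding the spread of the distribution.
print(values.round(2))
```

For a Bernoulli(0.5) reward, the tau-expectile equals tau, so the converged estimates approximately reproduce the tau values themselves; the fixed step size alpha leaves some residual fluctuation around those targets.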