Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Discount Factor as a Regularizer in Reinforcement Learning
Authors: Ron Amit, Ron Meir, Kamil Ciosek
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically study this technique compared to standard L2 regularization by extensive experiments in discrete and continuous domains, using tabular and functional representations. Our experiments suggest the regularization effectiveness is strongly related to properties of the available data, such as size, distribution, and mixing rate. |
| Researcher Affiliation | Collaboration | 1The Viterbi Faculty of Electrical Engineering, Technion Israel Institute of Technology, Haifa, Israel 2Microsoft Research, Cambridge, UK. Correspondence to: Ron Amit <EMAIL>, Ron Meir <EMAIL>, Kamil Ciosek <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Generic Regularized Batch TD(0) |
| Open Source Code | Yes | Code for all the experiments is available at: https://github.com/ron-amit/Discount_as_Regularizer. |
| Open Datasets | Yes | Our experiments use the MuJoCo environment (Todorov et al., 2012). To test the ability to generalize from finite data, we limited the number of time-steps from the environment to 200,000 or less. |
| Dataset Splits | No | The paper mentions 'limited amount of training data' and 'finite data setting' and uses phrases like 'evaluation episodes' but does not specify explicit numerical training, validation, or test dataset splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. It only mentions general settings like 'deep learning settings'. |
| Software Dependencies | No | The paper mentions algorithms used (e.g., TD3, DDPG, PPO, DQN, Adam) and environments (e.g., Mujoco), but it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | All hyper-parameters are identical to those suggested by Fujimoto et al. (2018) except for the following changes. We tested several amounts of total time-steps to simulate a limited-data setting. As in Fujimoto et al. (2018), the first 10^4 time steps are used only for exploration. Another change, to improve learning stability, is increasing the batch size from 100 to 256. See Appendix A.7 for the complete implementation details. |
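The Pseudocode row above refers to the paper's Algorithm 1, "Generic Regularized Batch TD(0)". As a rough illustration of the idea being classified (not the authors' exact algorithm), a minimal tabular batch TD(0) sketch with an explicit L2 penalty might look like the following; all function and parameter names here are illustrative assumptions, and a lower `gamma` plays the implicit-regularization role studied in the paper:

```python
def regularized_batch_td0(transitions, n_states, gamma=0.9, l2_coef=0.0,
                          lr=0.1, n_sweeps=300):
    """Tabular batch TD(0) policy evaluation with an optional L2 penalty.

    transitions: list of (state, reward, next_state, done) tuples collected
    under a fixed policy. `l2_coef` adds explicit L2 regularization toward
    zero, while a reduced `gamma` acts as an implicit regularizer.
    Illustrative sketch only, not the paper's Algorithm 1 verbatim.
    """
    V = [0.0] * n_states
    for _ in range(n_sweeps):
        for s, r, s_next, done in transitions:
            # Bootstrapped TD(0) target; terminal transitions bootstrap to 0.
            target = r + (0.0 if done else gamma * V[s_next])
            td_error = target - V[s]
            # Semi-gradient step with an L2 shrinkage term on V[s].
            V[s] += lr * (td_error - l2_coef * V[s])
    return V
```

For example, on a two-state chain where state 0 transitions to state 1 with reward 0 and state 1 terminates with reward 1, the unregularized fixed point is V[1] = 1 and V[0] = gamma * V[1]; setting `l2_coef > 0` shrinks both values toward zero, mimicking the L2 baseline the paper compares against.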