Discount Factor as a Regularizer in Reinforcement Learning
Authors: Ron Amit, Ron Meir, Kamil Ciosek
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically study this technique compared to standard L2 regularization by extensive experiments in discrete and continuous domains, using tabular and functional representations. Our experiments suggest the regularization effectiveness is strongly related to properties of the available data, such as size, distribution, and mixing rate. |
| Researcher Affiliation | Collaboration | (1) The Viterbi Faculty of Electrical Engineering, Technion Israel Institute of Technology, Haifa, Israel; (2) Microsoft Research, Cambridge, UK. Correspondence to: Ron Amit <ronamit@campus.technion.ac.il>, Ron Meir <rmeir@ee.technion.ac.il>, Kamil Ciosek <Kamil.Ciosek@microsoft.com>. |
| Pseudocode | Yes | Algorithm 1: Generic Regularized Batch TD(0) (an illustrative sketch follows the table). |
| Open Source Code | Yes | Code for all the experiments is available at: https://github.com/ron-amit/Discount_as_Regularizer. |
| Open Datasets | Yes | Our experiments use the Mujoco environment (Todorov et al., 2012). To test the ability to generalize from finite data, we limited the number of time-steps from the environment to 200,000 or less. |
| Dataset Splits | No | The paper mentions 'limited amount of training data' and 'finite data setting' and uses phrases like 'evaluation episodes' but does not specify explicit numerical training, validation, or test dataset splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. It only mentions general settings like 'deep learning settings'. |
| Software Dependencies | No | The paper mentions algorithms used (e.g., TD3, DDPG, PPO, DQN, Adam) and environments (e.g., Mujoco), but it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | All hyper-parameters are identical to those suggested by (Fujimoto et al., 2018) except the following changes. We tested with several amounts of total time-steps to simulate a limited data setting. As in Fujimoto et al. (2018), the first 10^4 time steps are used only for exploration. Another change to improve learning stability is increasing the batch size from 100 to 256. See Appendix A.7 for the complete implementation details. |
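
The "Pseudocode" row refers to the paper's Algorithm 1 (Generic Regularized Batch TD(0)). The following is a minimal, hypothetical Python sketch of the underlying idea rather than the authors' exact algorithm or released code: batch TD(0) value estimation in which a "guidance" discount factor smaller than the evaluation discount acts as a regularizer, with an optional L2 penalty on the value estimates for comparison. Names such as `regularized_batch_td0`, `gamma_guidance`, and `l2_coef` are illustrative assumptions.

```python
import numpy as np

def regularized_batch_td0(transitions, n_states, gamma_guidance=0.95,
                          l2_coef=0.0, lr=0.1, n_iters=200):
    """Tabular batch TD(0) with two forms of regularization:
    (i) a guidance discount factor smaller than the evaluation discount,
    (ii) an optional L2 shrinkage term on the value estimates.

    transitions: list of (state, reward, next_state, done) tuples
                 collected from a fixed batch of experience.
    """
    v = np.zeros(n_states)
    for _ in range(n_iters):
        td_update = np.zeros(n_states)
        for (s, r, s_next, done) in transitions:
            # The TD(0) target uses the smaller guidance discount,
            # which regularizes relative to the evaluation discount.
            target = r + (0.0 if done else gamma_guidance * v[s_next])
            td_update[s] += target - v[s]
        # Semi-gradient update averaged over the batch, plus L2 shrinkage.
        v += lr * (td_update / len(transitions) - l2_coef * v)
    return v

# Example usage on a toy two-state chain (illustrative data only):
batch = [(0, 1.0, 1, False), (1, 0.0, 0, True)] * 50
values = regularized_batch_td0(batch, n_states=2, gamma_guidance=0.9)
print(values)
```

In this sketch, lowering `gamma_guidance` below the discount used for evaluation plays the same role as increasing `l2_coef`: both shrink the value estimates and trade bias for variance when the batch of transitions is small.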