Discount Factor as a Regularizer in Reinforcement Learning

Authors: Ron Amit, Ron Meir, Kamil Ciosek

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically study this technique compared to standard L2 regularization by extensive experiments in discrete and continuous domains, using tabular and functional representations. Our experiments suggest the regularization effectiveness is strongly related to properties of the available data, such as size, distribution, and mixing rate.
Researcher Affiliation | Collaboration | 1) The Viterbi Faculty of Electrical Engineering, Technion Israel Institute of Technology, Haifa, Israel; 2) Microsoft Research, Cambridge, UK. Correspondence to: Ron Amit <ronamit@campus.technion.ac.il>, Ron Meir <rmeir@ee.technion.ac.il>, Kamil Ciosek <Kamil.Ciosek@microsoft.com>.
Pseudocode | Yes | Algorithm 1 Generic Regularized Batch TD(0) (a hedged sketch of such a procedure is given after the table)
Open Source Code | Yes | Code for all the experiments is available at: https://github.com/ron-amit/Discount_as_Regularizer
Open Datasets | Yes | Our experiments use the Mujoco environment (Todorov et al., 2012). To test the ability to generalize from finite data, we limited the number of time-steps from the environment to 200,000 or less.
Dataset Splits | No | The paper mentions 'limited amount of training data' and 'finite data setting' and uses phrases like 'evaluation episodes' but does not specify explicit numerical training, validation, or test dataset splits (e.g., percentages or counts).
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. It only mentions general settings like 'deep learning settings'.
Software Dependencies | No | The paper mentions algorithms used (e.g., TD3, DDPG, PPO, DQN, Adam) and environments (e.g., Mujoco), but it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | All hyper-parameters are identical to those suggested by (Fujimoto et al., 2018) except the following changes. We tested with several amounts of total time-steps to simulate a limited data setting. As in Fujimoto et al. (2018), the first 10^4 time steps are used only for exploration. Another change to improve learning stability is increasing the batch size from 100 to 256. See Appendix A.7 for the complete implementation details. (These overrides are sketched below the table.)
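
The paper's Algorithm 1 is titled "Generic Regularized Batch TD(0)". The snippet below is a minimal tabular sketch of such a procedure, not the authors' implementation: the function name, the transition format, and the per-sweep semi-gradient update are assumptions, while the two knobs it exposes, a reduced discount factor and an explicit L2 penalty, are the regularizers the paper compares.

    import numpy as np

    def regularized_batch_td0(transitions, n_states, gamma, l2_coef=0.0,
                              lr=0.1, n_sweeps=200):
        """Tabular batch TD(0) value estimation with two optional regularizers:
        a reduced discount factor (gamma) and an explicit L2 penalty (l2_coef).

        transitions: iterable of (s, r, s_next, done) tuples collected under
                     the policy being evaluated (integer states).
        """
        v = np.zeros(n_states)
        for _ in range(n_sweeps):
            for s, r, s_next, done in transitions:
                # TD(0) target; evaluating with a gamma smaller than the task's
                # true discount is the implicit regularizer the paper studies.
                target = r + (0.0 if done else gamma * v[s_next])
                td_error = target - v[s]
                # Semi-gradient step on the squared TD error plus an L2 penalty
                # on the current estimate (the baseline regularizer).
                v[s] += lr * (td_error - l2_coef * v[s])
        return v

For instance, regularized_batch_td0(data, n_states=10, gamma=0.9) evaluates with a discount lower than a task discount of, say, 0.99, whereas regularized_batch_td0(data, n_states=10, gamma=0.99, l2_coef=1e-2) keeps the task discount and applies the explicit L2 baseline instead.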
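
The Experiment Setup row lists only the deviations from the TD3 defaults of Fujimoto et al. (2018). A minimal sketch of those overrides as a configuration dictionary, with hypothetical key names (the authors' exact configuration is in their Appendix A.7 and the linked repository):

    # Illustrative key names; values are the deviations quoted above.
    td3_overrides = {
        "max_total_timesteps": 200_000,   # capped at 200,000 or fewer to simulate limited data
        "exploration_timesteps": 10_000,  # first 10^4 steps are exploration-only
        "batch_size": 256,                # raised from the default 100 for learning stability
    }
    # All other hyper-parameters follow Fujimoto et al. (2018).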