Learning to Multi-Task by Active Sampling
Authors: Sahil Sharma*, Ashutosh Kumar Jha*, Parikshit S Hegde, Balaraman Ravindran
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate results in the Atari 2600 domain on seven multi-tasking instances: three 6-task instances, one 8-task instance, two 12-task instances and one 21-task instance." (Section 5, EXPERIMENTAL SETUP AND RESULTS) |
| Researcher Affiliation | Academia | Sahil Sharma* Department of Computer Science and Engineering Indian Institute of Technology, Madras Ashutosh Kumar Jha* Department of Mechanical Engineering Indian Institute of Technology, Madras Parikshit S Hegde Department of Electrical Engineering Indian Institute of Technology, Madras Balaraman Ravindran Department of Computer Science and Engineering and Robert Bosch Centre for Data Science and AI (RBC-DSAI) Indian Institute of Technology, Madras |
| Pseudocode | Yes | APPENDIX C: TRAINING ALGORITHMS FOR OUR PROPOSED METHODS Algorithm 2 Baseline Multi-Task Learning Algorithm 3 A5C Algorithm 4 UA4C Algorithm 5 EA4C Algorithm 6 FA4C Algorithm 7 DUA4C |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | "We demonstrate results in the Atari 2600 domain... We evaluate the MTAs in our work on p_am, q_am, q_gm, q_hm. Table 1 reports the evaluation on q_am. Evaluations on the other metrics have been reported in Appendix E." The games are from the Arcade Learning Environment (Bellemare et al., 2013), and "All the target scores in this work were taken from Table 4 of (Sharma et al., 2017)." |
| Dataset Splits | Yes | Hyper-parameters for all multi-tasking algorithms in this work were tuned on only one MTI: MT1. |
| Hardware Specification | No | The paper thanks the "Amazon Web Services (AWS) Educate program for providing us with the computational resources for the experiment" and mentions using "16 parallel threads" or "20 parallel threads", but it does not specify exact GPU or CPU models, memory details, or specific cloud instance types. |
| Software Dependencies | No | The paper mentions using 'LSTM version of the A3C algorithm' and 'async-rms-prop algorithm', but does not provide specific version numbers for any software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | The initial learning rate was set to 10^-3 (found after hyper-parameter tuning over the set {7 * 10^-4, 10^-3}) and it was decayed linearly over the entire training period to a value of 10^-4. The value of n in the n-step returns used by A3C was set to 20. The discount factor γ for the discounted returns was set to γ = 0.99. The hyper-parameter that trades off optimizing for the entropy against the policy improvement is β... β = 0.02... |
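The quoted Experiment Setup row specifies a learning rate decayed linearly from 10^-3 to 10^-4 over the training period. A minimal sketch of such a schedule is below; the function and parameter names are illustrative, not taken from the paper's code (none was released).

```python
def linear_lr(step, total_steps, lr_init=1e-3, lr_final=1e-4):
    """Linearly decay the learning rate from lr_init to lr_final
    over the whole training period, matching the setup quoted above.

    Hypothetical helper: names and signature are our own.
    """
    frac = min(step / total_steps, 1.0)  # fraction of training elapsed
    return lr_init + frac * (lr_final - lr_init)
```

For example, `linear_lr(0, 100)` returns the initial rate 10^-3 and `linear_lr(100, 100)` returns the final rate 10^-4.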
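The Open Datasets row references the evaluation metrics p_am, q_am, q_gm, q_hm, whose subscripts suggest arithmetic, geometric, and harmonic means of per-task performance normalized against target scores. The sketch below shows the three aggregations under that assumption; the exact normalization (e.g. whether ratios are clipped) follows the paper's own definitions, and all names here are illustrative.

```python
import math

def multitask_aggregates(scores, targets):
    """Aggregate per-task scores normalized by target scores.

    Hypothetical sketch of q_am / q_gm / q_hm-style metrics:
    arithmetic, geometric, and harmonic means of score/target
    ratios. Assumes all ratios are strictly positive so the
    geometric and harmonic means are well-defined.
    """
    ratios = [s / t for s, t in zip(scores, targets)]
    n = len(ratios)
    q_am = sum(ratios) / n                       # arithmetic mean
    q_gm = math.prod(ratios) ** (1.0 / n)        # geometric mean
    q_hm = n / sum(1.0 / r for r in ratios)      # harmonic mean
    return q_am, q_gm, q_hm
```

The geometric and harmonic means penalize uneven performance across tasks more heavily than the arithmetic mean, which is presumably why the paper reports all three.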