Learning to Multi-Task by Active Sampling
Authors: Sahil Sharma*, Ashutosh Kumar Jha*, Parikshit S Hegde, Balaraman Ravindran
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate results in the Atari 2600 domain on seven multi-tasking instances: three 6-task instances, one 8-task instance, two 12-task instances and one 21-task instance." (Section 5, EXPERIMENTAL SETUP AND RESULTS) |
| Researcher Affiliation | Academia | Sahil Sharma* Department of Computer Science and Engineering Indian Institute of Technology, Madras Ashutosh Kumar Jha* Department of Mechanical Engineering Indian Institute of Technology, Madras Parikshit S Hegde Department of Electrical Engineering Indian Institute of Technology, Madras Balaraman Ravindran Department of Computer Science and Engineering and Robert Bosch Centre for Data Science and AI (RBC-DSAI) Indian Institute of Technology, Madras |
| Pseudocode | Yes | APPENDIX C: TRAINING ALGORITHMS FOR OUR PROPOSED METHODS Algorithm 2 Baseline Multi-Task Learning Algorithm 3 A5C Algorithm 4 UA4C Algorithm 5 EA4C Algorithm 6 FA4C Algorithm 7 DUA4C |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | "We demonstrate results in the Atari 2600 domain... We evaluate the MTAs in our work on p_am, q_am, q_gm, q_hm. Table 1 reports the evaluation on q_am. Evaluations on the other metrics have been reported in Appendix E." The games are from the Arcade Learning Environment (Bellemare et al., 2013), and "All the target scores in this work were taken from Table 4 of (Sharma et al., 2017)." |
| Dataset Splits | Yes | Hyper-parameters for all multi-tasking algorithms in this work were tuned on only one MTI: MT1. |
| Hardware Specification | No | The paper thanks the "Amazon Web Services (AWS) Educate program for providing us with the computational resources for the experiment" and mentions using "16 parallel threads" or "20 parallel threads", but it does not specify exact GPU or CPU models, memory details, or specific cloud instance types. |
| Software Dependencies | No | The paper mentions using 'LSTM version of the A3C algorithm' and 'async-rms-prop algorithm', but does not provide specific version numbers for any software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | The initial learning rate was set to 10^-3 (found after hyper-parameter tuning over the set {7 * 10^-4, 10^-3}) and it was decayed linearly over the entire training period to a value of 10^-4. The value of n in the n-step returns used by A3C was set to 20. The discount factor γ for the discounted returns was set to γ = 0.99. The hyper-parameter that trades off optimizing for the entropy against the policy improvement is β... β = 0.02... |
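The quoted Experiment Setup row specifies a learning rate decayed linearly from 10^-3 to 10^-4 over the training period. A minimal sketch of such a schedule is below; the function and parameter names are illustrative, not taken from the paper's code (none was released).

```python
def linear_lr(step, total_steps, lr_init=1e-3, lr_final=1e-4):
    """Linearly decay the learning rate from lr_init to lr_final
    over the whole training period, matching the setup quoted above.

    Hypothetical helper: names and signature are our own.
    """
    frac = min(step / total_steps, 1.0)  # fraction of training elapsed
    return lr_init + frac * (lr_final - lr_init)
```

For example, `linear_lr(0, 100)` returns the initial rate 10^-3 and `linear_lr(100, 100)` returns the final rate 10^-4.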
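The Open Datasets row references the evaluation metrics p_am, q_am, q_gm, q_hm, whose subscripts suggest arithmetic, geometric, and harmonic means of per-task performance normalized against target scores. The sketch below shows the three aggregations under that assumption; the exact normalization (e.g. whether ratios are clipped) follows the paper's own definitions, and all names here are illustrative.

```python
import math

def multitask_aggregates(scores, targets):
    """Aggregate per-task scores normalized by target scores.

    Hypothetical sketch of q_am / q_gm / q_hm-style metrics:
    arithmetic, geometric, and harmonic means of score/target
    ratios. Assumes all ratios are strictly positive so the
    geometric and harmonic means are well-defined.
    """
    ratios = [s / t for s, t in zip(scores, targets)]
    n = len(ratios)
    q_am = sum(ratios) / n                       # arithmetic mean
    q_gm = math.prod(ratios) ** (1.0 / n)        # geometric mean
    q_hm = n / sum(1.0 / r for r in ratios)      # harmonic mean
    return q_am, q_gm, q_hm
```

The geometric and harmonic means penalize uneven performance across tasks more heavily than the arithmetic mean, which is presumably why the paper reports all three.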