Sharing Experience in Multitask Reinforcement Learning
Authors: Tung-Long Vuong, Do-Van Nguyen, Tai-Long Nguyen, Cong-Minh Bui, Hai-Dang Kieu, Viet-Cuong Ta, Quoc-Long Tran, Thanh-Ha Le
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments highlight that our framework improves the performance and the stability of learning task-policies, and can help task-policies avoid local optima. |
| Researcher Affiliation | Academia | HMI Lab, UET, Vietnam National University, Hanoi, Vietnam |
| Pseudocode | Yes | Algorithm 1 Sharing-experience framework with Z agent |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper uses custom-designed 'multi-task gridworld environments' (Section 4) described as 'Multiple Goals Well-Gated Grid-World' and 'Multiple Goals Grid-World without Wall'. No links, DOIs, repositories, or formal citations to publicly available datasets are provided. |
| Dataset Splits | No | The paper describes experiments in custom gridworld environments where agents interact directly. It lists 'Number of rollouts each Inter.' and 'Rollout length' in Table 1, which are parameters for data generation in reinforcement learning, but it does not specify traditional training/validation/test dataset splits with percentages or sample counts for reproduction. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions various algorithms and architectures like 'Q-learning', 'SARSA', 'CNN', 'LSTM', 'DNN', 'DRL', 'PPO algorithm', and 'basic advantage-actor-critic'. However, it does not provide specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9', 'CUDA 11.1') needed for replication. |
| Experiment Setup | Yes | The paper provides a detailed experimental setup in 'Table 1: Hyper-parameters Setting' which lists parameters such as 'Discounted factor 0.99', 'Number of Iterations 1000', 'Learning rate actor-critic 0.005', 'Optimizer RMSProp', and 'Table 2: Network structures' which specifies the architecture for Policy, Value, and Z networks including layer sizes. |
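
A minimal sketch of an actor-critic setup using the hyper-parameters quoted above. Only the values from the paper's Table 1 (discount factor 0.99, actor-critic learning rate 0.005, RMSProp, 1000 iterations) are taken from the source; the layer sizes, state/action dimensions, and class names below are illustrative assumptions, not the paper's exact Table 2 architecture.

```python
# Hedged sketch (PyTorch assumed): actor-critic networks wired to the
# hyper-parameters listed in Table 1. Layer widths and dimensions are
# placeholders, not the paper's reported network structures.
import torch
import torch.nn as nn

STATE_DIM = 16         # assumption: gridworld observation size
NUM_ACTIONS = 4        # assumption: up/down/left/right
GAMMA = 0.99           # "Discounted factor 0.99" (Table 1)
LR = 0.005             # "Learning rate actor-critic 0.005" (Table 1)
NUM_ITERATIONS = 1000  # "Number of Iterations 1000" (Table 1)

class PolicyNet(nn.Module):
    """Actor: maps a state to a probability distribution over actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, NUM_ACTIONS), nn.Softmax(dim=-1),
        )

    def forward(self, state):
        return self.net(state)

class ValueNet(nn.Module):
    """Critic: maps a state to a scalar value estimate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state):
        return self.net(state)

policy, value = PolicyNet(), ValueNet()
# RMSProp optimizer as stated in Table 1, shared over both networks here
# purely for brevity of the sketch.
optimizer = torch.optim.RMSprop(
    list(policy.parameters()) + list(value.parameters()), lr=LR)
```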