Sharing Experience in Multitask Reinforcement Learning

Authors: Tung-Long Vuong, Do-Van Nguyen, Tai-Long Nguyen, Cong-Minh Bui, Hai-Dang Kieu, Viet-Cuong Ta, Quoc-Long Tran, Thanh-Ha Le

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments highlight that our framework improves the performance and the stability of learning task-policies, and can help task-policies avoid local optima.
Researcher Affiliation | Academia | HMI Lab, UET, Vietnam National University, Hanoi, Vietnam
Pseudocode | Yes | Algorithm 1: Sharing-experience framework with Z agent
Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | No | The paper uses custom-designed multi-task gridworld environments (Section 4), described as "Multiple Goals Well-Gated Grid-World" and "Multiple Goals Grid-World without Wall". No links, DOIs, repositories, or formal citations to publicly available datasets are provided.
Dataset Splits | No | The paper describes experiments in custom gridworld environments where agents interact directly with the environment. Table 1 lists "Number of rollouts each Iter." and "Rollout length", which are data-generation parameters for reinforcement learning, but no traditional training/validation/test splits with percentages or sample counts are specified for reproduction.
Hardware Specification | No | The paper does not provide any details about the hardware used to run the experiments (e.g., GPU models, CPU types, memory, or cloud instance specifications).
Software Dependencies | No | The paper mentions algorithms and architectures such as Q-learning, SARSA, CNN, LSTM, DNN, DRL, the PPO algorithm, and a basic advantage actor-critic, but it does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) needed for replication.
Experiment Setup | Yes | The paper details its experimental setup in Table 1 ("Hyper-parameters Setting"), which lists parameters such as a discount factor of 0.99, 1000 iterations, an actor-critic learning rate of 0.005, and the RMSProp optimizer, and in Table 2 ("Network structures"), which specifies the architectures of the Policy, Value, and Z networks, including layer sizes.
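For concreteness, the sketch below shows how the hyper-parameters quoted from Table 1 might be assembled into a training configuration. The choice of PyTorch, the observation/action dimensions, and the hidden-layer widths are assumptions made purely for illustration; the paper's Table 2 layer sizes are not reproduced here and no official code is available.

```python
# Hypothetical configuration sketch based on the hyper-parameters quoted from Table 1.
# PyTorch and all network sizes are assumptions for illustration only; the paper
# does not state its framework, and the actual Table 2 architectures are not copied here.
import torch
import torch.nn as nn

config = {
    "discount_factor": 0.99,   # "Discounted factor 0.99" (Table 1)
    "num_iterations": 1000,    # "Number of Iterations 1000" (Table 1)
    "lr_actor_critic": 0.005,  # "Learning rate actor-critic 0.005" (Table 1)
    "optimizer": "RMSProp",    # "Optimizer RMSProp" (Table 1)
}

# Placeholder actor-critic networks; input/output and hidden sizes are NOT from the paper.
policy_net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4), nn.Softmax(dim=-1))
value_net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

optimizer = torch.optim.RMSprop(
    list(policy_net.parameters()) + list(value_net.parameters()),
    lr=config["lr_actor_critic"],
)
```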