Structure-Aware Transformer Policy for Inhomogeneous Multi-Task Reinforcement Learning

Authors: Sunghoon Hong, Deunsol Yoon, Kee-Eung Kim

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We run experiments on modular MTRL benchmarks (Huang et al., 2020; Wang et al., 2018), which are created based on Gym MuJoCo locomotion tasks."
Researcher Affiliation | Collaboration | Sunghoon Hong (1,3), Deunsol Yoon (1,3), Kee-Eung Kim (1,2); affiliations: (1) Kim Jaechul Graduate School of AI, KAIST, Daejeon, Republic of Korea; (2) School of Computing, KAIST, Daejeon, Republic of Korea; (3) LG AI Research, Seoul, Republic of Korea
Pseudocode | No | The paper describes its methods through mathematical formulations and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include a statement about open-sourcing the code or a link to a code repository.
Open Datasets | Yes | "We run experiments on modular MTRL benchmarks (Huang et al., 2020; Wang et al., 2018), which are created based on Gym MuJoCo locomotion tasks." (A sketch of instantiating the underlying locomotion tasks appears after the table.)
Dataset Splits | No | The paper refers to a 'train set' and 'test set' of environments in Table 2 and Appendix A.4, but does not give explicit counts or percentages for train/validation/test splits within these environments.
Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or cloud computing resources used for the experiments.
Software Dependencies | No | "We implement SWAT based on AMORPHEUS, which is built on the Transformer encoder from PyTorch, sharing the codebase of SMP. Additionally, we simply modify the Transformer encoder to incorporate PE and RE, enabling the relational embedding to be added per head." The paper names the libraries it builds on but reports no version numbers. (A sketch of the per-head modification appears after the table.)
Experiment Setup | Yes | The paper reports its hyperparameters in Table 1, reproduced below. (A config-dict transcription of these values follows at the end of this section.)

Hyperparameter | Value
Learning rate | 0.0001
Gradient clipping | 0.1
Normalization | LayerNorm
Attention layers | 3
Attention heads | 2
Attention hidden size | 256
Encoder output size | 128
Mini-batch size | 100
Replay buffer size | 500K
Embedding size | 128

Table 1: Hyperparameter setting in SWAT
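The benchmarks cited under Open Datasets build on standard Gym MuJoCo locomotion tasks. As a minimal sketch of instantiating those base tasks, assuming the classic gym plus mujoco-py stack (the env IDs below are the standard Gym ones; the benchmark repositories register their own morphology variants):

```python
import gym

# Standard Gym MuJoCo locomotion tasks; the modular MTRL benchmarks of
# Huang et al. (2020) and Wang et al. (2018) derive morphology variants
# from these. Assumes gym with mujoco-py installed.
for env_id in ["Hopper-v2", "Walker2d-v2", "HalfCheetah-v2", "Humanoid-v2"]:
    env = gym.make(env_id)
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```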
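The Software Dependencies row quotes the paper's description of modifying PyTorch's Transformer encoder so that a relational embedding (RE) is added per attention head. The sketch below is not the authors' code; it is a minimal illustration of that idea, and all names (RelationalSelfAttention, rel_ids, num_relations) and shapes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalSelfAttention(nn.Module):
    """Self-attention with a per-head relational bias added to the logits."""

    def __init__(self, d_model=128, num_heads=2, num_relations=16):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One learned scalar per (relation type, head), added to the
        # attention logits, so each head can weight relations differently.
        self.rel_bias = nn.Embedding(num_relations, num_heads)

    def forward(self, x, rel_ids):
        # x: (batch, nodes, d_model) node features (e.g., per-limb states)
        # rel_ids: (nodes, nodes) integer relation indices, e.g., the graph
        # distance between limbs in the robot's morphology tree.
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, N, self.num_heads, self.d_head).transpose(1, 2)
        k = k.view(B, N, self.num_heads, self.d_head).transpose(1, 2)
        v = v.view(B, N, self.num_heads, self.d_head).transpose(1, 2)
        logits = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5  # (B,H,N,N)
        rel = self.rel_bias(rel_ids).permute(2, 0, 1)            # (H,N,N)
        attn = F.softmax(logits + rel.unsqueeze(0), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        return self.out(out)

# Usage sketch: 12 limbs, relation index = clipped graph distance.
x = torch.randn(4, 12, 128)
rel_ids = torch.randint(0, 16, (12, 12))
layer = RelationalSelfAttention()
print(layer(x, rel_ids).shape)  # torch.Size([4, 12, 128])
```

Adding the bias per head, rather than once for all heads, matches the quoted description: each head can attend along different structural relations of the morphology graph.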
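For convenience, here are the Table 1 values transcribed into a Python dictionary; the key names are illustrative, not the authors' configuration schema:

```python
# Table 1 values transcribed directly from the paper.
SWAT_HYPERPARAMS = {
    "learning_rate": 1e-4,
    "gradient_clipping": 0.1,
    "normalization": "LayerNorm",
    "attention_layers": 3,
    "attention_heads": 2,
    "attention_hidden_size": 256,
    "encoder_output_size": 128,
    "mini_batch_size": 100,
    "replay_buffer_size": 500_000,  # 500K
    "embedding_size": 128,
}
```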