Structure-Aware Transformer Policy for Inhomogeneous Multi-Task Reinforcement Learning

Authors: Sunghoon Hong, Deunsol Yoon, Kee-Eung Kim

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We run experiments on modular MTRL benchmarks (Huang et al., 2020; Wang et al., 2018), which are created based on Gym MuJoCo locomotion tasks."
Researcher Affiliation | Collaboration | Sunghoon Hong (1,3), Deunsol Yoon (1,3), Kee-Eung Kim (1,2); affiliations: (1) Kim Jaechul Graduate School of AI, KAIST, Daejeon, Republic of Korea; (2) School of Computing, KAIST, Daejeon, Republic of Korea; (3) LG AI Research, Seoul, Republic of Korea
Pseudocode | No | The paper describes its methods through mathematical formulations and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include a statement about open-sourcing the code or a link to a code repository.
Open Datasets | Yes | "We run experiments on modular MTRL benchmarks (Huang et al., 2020; Wang et al., 2018), which are created based on Gym MuJoCo locomotion tasks." (A sketch of instantiating the underlying locomotion tasks appears after the table.)
Dataset Splits | No | The paper refers to a 'train set' and 'test set' of environments in Table 2 and Appendix A.4, but does not give explicit counts or percentages for train/validation/test splits within these environments.
Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or cloud computing resources used for the experiments.
Software Dependencies | No | "We implement SWAT based on AMORPHEUS, which is built on the Transformer encoder from PyTorch, sharing the codebase of SMP. Additionally, we simply modify the Transformer encoder to incorporate PE and RE, enabling the relational embedding to be added per head." The paper names the libraries it builds on but reports no version numbers. (A sketch of the per-head modification appears after the table.)
Experiment Setup | Yes | The paper reports its hyperparameters in Table 1, reproduced below. (A config-dict transcription of these values follows at the end of this section.)

Hyperparameter | Value
Learning rate | 0.0001
Gradient clipping | 0.1
Normalization | LayerNorm
Attention layers | 3
Attention heads | 2
Attention hidden size | 256
Encoder output size | 128
Mini-batch size | 100
Replay buffer size | 500K
Embedding size | 128

Table 1: Hyperparameter setting in SWAT
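The benchmarks cited under Open Datasets build on standard Gym MuJoCo locomotion tasks. As a minimal sketch of instantiating those base tasks, assuming the classic gym plus mujoco-py stack (the env IDs below are the standard Gym ones; the benchmark repositories register their own morphology variants):

```python
import gym

# Standard Gym MuJoCo locomotion tasks; the modular MTRL benchmarks of
# Huang et al. (2020) and Wang et al. (2018) derive morphology variants
# from these. Assumes gym with mujoco-py installed.
for env_id in ["Hopper-v2", "Walker2d-v2", "HalfCheetah-v2", "Humanoid-v2"]:
    env = gym.make(env_id)
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```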
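The Software Dependencies row quotes the paper's description of modifying PyTorch's Transformer encoder so that a relational embedding (RE) is added per attention head. The sketch below is not the authors' code; it is a minimal illustration of that idea, and all names (RelationalSelfAttention, rel_ids, num_relations) and shapes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalSelfAttention(nn.Module):
    """Self-attention with a per-head relational bias added to the logits."""

    def __init__(self, d_model=128, num_heads=2, num_relations=16):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One learned scalar per (relation type, head), added to the
        # attention logits, so each head can weight relations differently.
        self.rel_bias = nn.Embedding(num_relations, num_heads)

    def forward(self, x, rel_ids):
        # x: (batch, nodes, d_model) node features (e.g., per-limb states)
        # rel_ids: (nodes, nodes) integer relation indices, e.g., the graph
        # distance between limbs in the robot's morphology tree.
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, N, self.num_heads, self.d_head).transpose(1, 2)
        k = k.view(B, N, self.num_heads, self.d_head).transpose(1, 2)
        v = v.view(B, N, self.num_heads, self.d_head).transpose(1, 2)
        logits = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5  # (B,H,N,N)
        rel = self.rel_bias(rel_ids).permute(2, 0, 1)            # (H,N,N)
        attn = F.softmax(logits + rel.unsqueeze(0), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        return self.out(out)

# Usage sketch: 12 limbs, relation index = clipped graph distance.
x = torch.randn(4, 12, 128)
rel_ids = torch.randint(0, 16, (12, 12))
layer = RelationalSelfAttention()
print(layer(x, rel_ids).shape)  # torch.Size([4, 12, 128])
```

Adding the bias per head, rather than once for all heads, matches the quoted description: each head can attend along different structural relations of the morphology graph.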
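For convenience, here are the Table 1 values transcribed into a Python dictionary; the key names are illustrative, not the authors' configuration schema:

```python
# Table 1 values transcribed directly from the paper.
SWAT_HYPERPARAMS = {
    "learning_rate": 1e-4,
    "gradient_clipping": 0.1,
    "normalization": "LayerNorm",
    "attention_layers": 3,
    "attention_heads": 2,
    "attention_hidden_size": 256,
    "encoder_output_size": 128,
    "mini_batch_size": 100,
    "replay_buffer_size": 500_000,  # 500K
    "embedding_size": 128,
}
```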