Structure-Aware Transformer Policy for Inhomogeneous Multi-Task Reinforcement Learning
Authors: Sunghoon Hong, Deunsol Yoon, Kee-Eung Kim
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run experiments on modular MTRL benchmarks (Huang et al., 2020; Wang et al., 2018), which are created based on Gym MuJoCo locomotion tasks. |
| Researcher Affiliation | Collaboration | Sunghoon Hong1,3, Deunsol Yoon1,3, Kee-Eung Kim1,2 1Kim Jaechul Graduate School of AI, KAIST, Daejeon, Republic of Korea 2School of Computing, KAIST, Daejeon, Republic of Korea 3LG AI Research, Seoul, Republic of Korea |
| Pseudocode | No | The paper describes methods through mathematical formulations and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include a statement about open-sourcing the code or a link to a code repository. |
| Open Datasets | Yes | We run experiments on modular MTRL benchmarks (Huang et al., 2020; Wang et al., 2018), which are created based on Gym MuJoCo locomotion tasks. |
| Dataset Splits | No | The paper mentions 'train set' and 'test set' for environments in Table 2 and Appendix A.4, but does not provide explicit details on the percentages or counts for train/validation/test splits within these environments. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or cloud computing resources used for the experiments. |
| Software Dependencies | No | We implement SWAT based on AMORPHEUS, which is built on the Transformer Encoder from PyTorch, sharing the codebase of SMP. Additionally, we modify the Transformer Encoder to incorporate PE and RE, enabling the relational embedding to be added per head. |
| Experiment Setup | Yes | Hyperparameter settings in SWAT (Table 1): Learning rate: 0.0001; Gradient clipping: 0.1; Normalization: LayerNorm; Attention layers: 3; Attention heads: 2; Attention hidden size: 256; Encoder output size: 128; Mini-batch size: 100; Replay buffer size: 500K; Embedding size: 128. |
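
For reference, the Table 1 values quoted in the Experiment Setup row can be collected into a single configuration object. This is a minimal sketch only; the dictionary and key names are illustrative and not taken from the authors' code.

```python
# Hypothetical config gathering the SWAT hyperparameters quoted from Table 1.
# Key names are illustrative; the authors' actual configuration schema is not shown in the paper.
SWAT_HYPERPARAMS = {
    "learning_rate": 1e-4,
    "gradient_clipping": 0.1,
    "normalization": "LayerNorm",
    "attention_layers": 3,
    "attention_heads": 2,
    "attention_hidden_size": 256,
    "encoder_output_size": 128,
    "mini_batch_size": 100,
    "replay_buffer_size": 500_000,  # 500K transitions
    "embedding_size": 128,
}
```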
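
The Software Dependencies row notes that SWAT modifies PyTorch's Transformer encoder so that a relational embedding (RE) can be added per attention head. The sketch below shows one way such a per-head additive bias could be wired into standard multi-head self-attention; it is an assumption-laden illustration (the class name, tensor shapes, and bias interface are hypothetical), not the authors' implementation.

```python
import torch
import torch.nn as nn

class RelationalSelfAttention(nn.Module):
    """Multi-head self-attention with a per-head relational bias added to the
    attention logits (hypothetical sketch, not the SWAT reference code)."""

    def __init__(self, embed_dim: int = 128, num_heads: int = 2):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor, rel_bias: torch.Tensor) -> torch.Tensor:
        # x:        (batch, num_limbs, embed_dim) limb/node features
        # rel_bias: (num_heads, num_limbs, num_limbs) relational bias per head (assumed shape)
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        scores = scores + rel_bias          # broadcast over the batch dimension
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.out(out)
```

Adding the bias to the pre-softmax logits rather than the output keeps the modification local to the attention computation, which is consistent with the quoted description of adding the relational embedding per head inside the encoder.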