Rethinking Transformers in Solving POMDPs
Authors: Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon Shaolei Du, Huazhe Xu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper scrutinizes the effectiveness of a popular architecture, namely Transformers, in Partially Observable Markov Decision Processes (POMDPs) and reveals its theoretical and empirical limitations. [...] with empirical results highlighting the sub-optimal performance of Transformer and considerable strength of LRU (a minimal LRU sketch follows the table). [...] Through extensive experiments across various tasks, we compare the capabilities exhibited by various sequence models across multiple dimensions. |
| Researcher Affiliation | Collaboration | 1IIIS, Tsinghua University 2University of Washington 3Shanghai Qi Zhi Institute 4Shanghai AI Lab. |
| Pseudocode | No | The paper includes figures illustrating network architectures but does not present pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | Yes | Our code is open-sourced. [...] Our official code is released at https://github.com/CTP314/TFPORL. |
| Open Datasets | Yes | We conduct experiments on 8 partially observable environments, which are all PyBullet locomotion control tasks with parts of the observations occluded (Ni et al., 2022) [...] Our experiments were conducted on the D4RL medium-expert dataset (Fu et al., 2020) of the aforementioned tasks (a hypothetical loading sketch follows the table). |
| Dataset Splits | No | The paper mentions training and evaluation but does not specify the exact train/validation/test dataset splits by percentage or count. It refers to a "truncated time n for training and evaluation purposes" and that observations "come from the same distribution," but lacks precise split details. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments, such as particular GPU or CPU models, memory sizes, or types of computing clusters. |
| Software Dependencies | No | The paper mentions software components like GPT, LSTM, LRU, DQN, TD3, SACD, and PyTorch (implicitly through the codebase link), but it does not provide specific version numbers for any of these. |
| Experiment Setup | Yes | Comprehensive implementation details, task descriptions, and supplementary results are presented in Appendix D. We provide the configuration of hyperparameters in Table 3 and Table 4. [...] Table 3. Hyperparameters of different POMDP tasks. [...] Table 4. Hyperparameters of different RL algorithms. |
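For readers unfamiliar with the LRU architecture the paper favors, below is a minimal PyTorch sketch of the diagonal linear recurrence at its core, following Orvieto et al. (2023). This is an illustrative assumption, not the authors' implementation; their official code lives at https://github.com/CTP314/TFPORL. The sequential loop is for clarity only; in practice the recurrence is evaluated with a parallel scan, which is the source of LRU's training efficiency on long sequences.

```python
import torch
import torch.nn as nn


class LRULayer(nn.Module):
    """Sketch of a diagonal Linear Recurrent Unit (LRU) layer
    (Orvieto et al., 2023). Illustrative only; not the paper's code."""

    def __init__(self, d_model: int, d_state: int = 64):
        super().__init__()
        # Stable parameterization: lambda = exp(-exp(nu) + i * exp(theta)),
        # so the eigenvalue magnitude exp(-exp(nu)) always lies in (0, 1).
        self.nu_log = nn.Parameter(torch.log(0.5 * torch.rand(d_state) + 0.4))
        self.theta_log = nn.Parameter(torch.log(6.28 * torch.rand(d_state) + 1e-4))
        self.B = nn.Parameter(
            torch.randn(d_state, d_model, dtype=torch.cfloat) / d_model ** 0.5
        )
        self.C = nn.Parameter(
            torch.randn(d_model, d_state, dtype=torch.cfloat) / d_state ** 0.5
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); sequential loop shown for clarity.
        lam = torch.exp(-torch.exp(self.nu_log) + 1j * torch.exp(self.theta_log))
        h = torch.zeros(x.shape[0], lam.shape[0], dtype=torch.cfloat, device=x.device)
        outputs = []
        for t in range(x.shape[1]):
            # h_t = lambda * h_{t-1} + B x_t  (elementwise diagonal transition)
            h = lam * h + x[:, t].to(torch.cfloat) @ self.B.T
            outputs.append((h @ self.C.T).real)
        return torch.stack(outputs, dim=1)


# Usage: map a batch of observation embeddings through the layer.
layer = LRULayer(d_model=32)
y = layer(torch.randn(4, 100, 32))
print(y.shape)  # torch.Size([4, 100, 32])
```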
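Similarly, the D4RL medium-expert data cited in the Open Datasets row can be fetched through the standard d4rl API. The task name below is illustrative; the paper's own tasks are occluded PyBullet variants handled in its released code.

```python
import gym
import d4rl  # noqa: F401 -- importing registers D4RL environments with gym

# Illustrative task name; D4RL ships medium-expert datasets for the
# standard locomotion tasks (Fu et al., 2020).
env = gym.make("halfcheetah-medium-expert-v2")
dataset = env.get_dataset()  # dict of numpy arrays keyed by field name

print(dataset["observations"].shape)  # (N, obs_dim)
print(dataset["actions"].shape)       # (N, act_dim)
print(dataset["rewards"].shape)       # (N,)
print(dataset["terminals"].shape)     # (N,)
```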