Rethinking Transformers in Solving POMDPs

Authors: Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon Shaolei Du, Huazhe Xu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper scrutinizes the effectiveness of a popular architecture, namely Transformers, in Partially Observable Markov Decision Processes (POMDPs) and reveals its theoretical and empirical limitations. [...] with empirical results highlighting the sub-optimal performance of Transformer and considerable strength of LRU. [...] Through extensive experiments across various tasks, we compare the capabilities exhibited by various sequence models across multiple dimensions.
Researcher Affiliation | Collaboration | 1 IIIS, Tsinghua University; 2 University of Washington; 3 Shanghai Qi Zhi Institute; 4 Shanghai AI Lab.
Pseudocode | No | The paper includes figures illustrating network architectures but does not present pseudocode or explicitly labeled algorithm blocks.
Open Source Code | Yes | Our code is open-sourced. [...] Our official code is released at https://github.com/CTP314/TFPORL.
Open Datasets | Yes | We conduct experiments on 8 partially observable environments, which are all PyBullet locomotion control tasks with parts of the observations occluded (Ni et al., 2022) [...] Our experiments were conducted on the D4RL medium-expert dataset (Fu et al., 2020) of the aforementioned tasks.
Dataset Splits | No | The paper mentions training and evaluation but does not specify exact train/validation/test splits by percentage or count. It refers to a "truncated time n for training and evaluation purposes" and notes that observations "come from the same distribution," but lacks precise split details.
Hardware Specification | No | The paper does not specify the hardware used for experiments, such as GPU or CPU models, memory sizes, or types of computing clusters.
Software Dependencies | No | The paper mentions software components such as GPT, LSTM, LRU, DQN, TD3, SACD, and PyTorch (implicitly, through the codebase link), but does not provide version numbers for any of them.
Experiment Setup | Yes | Comprehensive implementation details, task descriptions, and supplementary results are presented in Appendix D. We provide the configuration of hyperparameters in Table 3 and Table 4. [...] Table 3. Hyperparameters of different POMDP tasks. [...] Table 4. Hyperparameters of different RL algorithms.
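The review above centers on the paper's comparison of Transformers against the Linear Recurrent Unit (LRU). As background, the core of an LRU-style layer is a diagonal (element-wise) complex linear recurrence, h_t = lam * h_{t-1} + B x_t, read out as y_t = Re(C h_t). The NumPy sketch below is purely illustrative, with made-up dimensions and random parameters; it is not the paper's implementation.

```python
import numpy as np

def lru_scan(x, lam, B, C):
    """Minimal diagonal linear recurrence, the core of an LRU-style layer:
    h_t = lam * h_{t-1} + B @ x_t,  y_t = Re(C @ h_t).
    lam is a complex vector of per-channel decay rates (illustrative sketch)."""
    h = np.zeros(lam.shape[0], dtype=complex)
    ys = []
    for x_t in x:                      # sequential scan over the sequence
        h = lam * h + B @ x_t          # element-wise (diagonal) recurrence
        ys.append((C @ h).real)        # read out the real part
    return np.stack(ys)

# Toy usage: 5-step sequence, 3-dim inputs, 4-dim hidden state, 2-dim outputs.
rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 5, 3, 4, 2
lam = 0.9 * np.exp(1j * rng.uniform(0.0, 0.1, d_h))   # stable: |lam| < 1
B = rng.normal(size=(d_h, d_in)).astype(complex)
C = rng.normal(size=(d_out, d_h)).astype(complex)
y = lru_scan(rng.normal(size=(T, d_in)), lam, B, C)
print(y.shape)  # (5, 2)
```

Because the recurrence is diagonal, each hidden channel evolves independently, which is what makes LRU layers cheap per step and parallelizable via an associative scan in practice; the naive loop here is just the clearest way to show the update.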