Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning

Authors: Subhojeet Pramanik, Esraa Elelimy, Marlos C. Machado, Adam White

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments show that both GaLiTe and AGaLiTe can match the performance of more computationally expensive transformer architectures in a small diagnostic T-Maze environment. In a pixel-based navigation task, we find that our approach outperforms the state-of-the-art transformer architecture, GTrXL (Parisotto et al., 2020), by more than 37%. Our AGaLiTe-based agent achieves higher rewards than a GTrXL-based agent and higher performance across various in-game skills in Craftax Symbolic (Matthews et al., 2024), a symbolic adaptation of the 2D survival game Crafter (Hafner, 2021). In 3D pixel-based navigation tasks, AGaLiTe's performance is close to GTrXL while reducing computation and memory by 40% and 50%, respectively.
Researcher Affiliation Academia Department of Computing Science, University of Alberta, Canada; Alberta Machine Intelligence Institute (Amii), Canada; Canada CIFAR AI Chair
Pseudocode Yes Algorithm 1 Canonical Self-Attention. Input: X ∈ ℝ^(N×d). Parameters: W_Q, W_K, W_V ∈ ℝ^(d×d_h). 1: Q ← XW_Q 2: K ← XW_K 3: V ← XW_V 4: A ← softmax(QKᵀ/√d_h)V. Output: A ∈ ℝ^(N×d_h) ... Algorithm 3 Gated Linear Transformer (GaLiTe) Self-Attention. Input: x_t ∈ ℝ^d, C_{t−1} ∈ ℝ^(d_h×ηd_h), s_{t−1} ∈ ℝ^(ηd_h). Hyperparameters: η. Parameters: W_K, W_Q, W_V, W_β, W_γ ∈ ℝ^(d_h×d) and W_{p1}, W_{p2}, W_{p3} ∈ ℝ^(η×d). 1: if t = 0 then 2: s_0 ← 0, C_0 ← 0 3: end if {Calculate Key} 4: k_t ← f(relu(W_{p1}x_t) ⊗ relu(W_K x_t)) {Calculate Query} 5: q_t ← f(relu(W_{p2}x_t) ⊗ relu(W_Q x_t)) {Calculate Value} 6: v_t ← W_V x_t {Generate Gating Vectors} 7: β_t ← σ_g(W_β x_t) 8: γ_t ← f(σ_g(W_{p3}x_t) ⊗ σ_g(W_γ x_t)) {Update Memory} 9: C_t ← ((1−β_t) ⊗ (1−γ_t)) ⊙ C_{t−1} + (β_t ⊙ v_t)(γ_t ⊙ k_t)ᵀ 10: s_t ← (1−γ_t) ⊙ s_{t−1} + γ_t ⊙ k_t {Calculate Attention Vector} 11: a_t ← (C_t q_t)/(s_tᵀ q_t). Output: a_t ∈ ℝ^(d_h), C_t ∈ ℝ^(d_h×ηd_h), s_t ∈ ℝ^(ηd_h)
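The GaLiTe recurrence above can be sketched in plain NumPy (the paper's agents use Jax). This is a minimal sketch, not the authors' implementation: it assumes f(a ⊗ b) denotes the flattened outer product of the two feature vectors, σ_g is the logistic sigmoid, and the parameter names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def galite_step(x_t, C_prev, s_prev, params):
    """One recurrent step of GaLiTe self-attention (sketch of Algorithm 3).

    Assumes f(a (x) b) is the flattened outer product, so keys, queries, and
    the gate gamma_t live in R^(eta*d_h). Parameter names are illustrative.
    """
    Wk, Wq, Wv, Wb, Wg, Wp1, Wp2, Wp3 = (
        params[k] for k in ("Wk", "Wq", "Wv", "Wb", "Wg", "Wp1", "Wp2", "Wp3")
    )
    # Keys and queries: flattened outer products of two feature maps.
    k_t = np.outer(relu(Wp1 @ x_t), relu(Wk @ x_t)).ravel()   # (eta*d_h,)
    q_t = np.outer(relu(Wp2 @ x_t), relu(Wq @ x_t)).ravel()   # (eta*d_h,)
    v_t = Wv @ x_t                                            # (d_h,)
    # Gating vectors.
    beta_t = sigmoid(Wb @ x_t)                                         # (d_h,)
    gamma_t = np.outer(sigmoid(Wp3 @ x_t), sigmoid(Wg @ x_t)).ravel()  # (eta*d_h,)
    # Gated update of the outer-product memory and the key trace.
    C_t = np.outer(1.0 - beta_t, 1.0 - gamma_t) * C_prev \
        + np.outer(beta_t * v_t, gamma_t * k_t)               # (d_h, eta*d_h)
    s_t = (1.0 - gamma_t) * s_prev + gamma_t * k_t            # (eta*d_h,)
    # Attention output: normalized readout of the memory
    # (small epsilon guards the division; an assumption of this sketch).
    a_t = (C_t @ q_t) / (s_t @ q_t + 1e-8)                    # (d_h,)
    return a_t, C_t, s_t
```

Because the state (C_t, s_t) has fixed size, each step costs O(η·d_h·d) regardless of sequence length, which is the source of the compute and memory savings reported over GTrXL.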
Open Source Code Yes Code and implementation for this work is publicly available at https://github.com/subho406/agalite
Open Datasets Yes In T-Maze (Bakker, 2001), the agent must remember a single cue signal. In CartPole, the agent must estimate the hidden state by integrating information over time. In Mystery Path (Pleines et al., 2023), the agent must remember multiple locations in a grid environment. In Craftax (Matthews et al., 2024), a 2D survival game, the agent faces partial observability as it can only observe a limited portion of a large 2D map. In the Memory Maze environment (Pašukonis et al., 2023), the agent must retain the layout of a 3D maze in addition to several locations across the maze.
Dataset Splits No The paper describes various reinforcement learning environments (T-Maze, Cart Pole, Mystery Path, Craftax, Memory Maze) and how agents interact with them. For these RL environments, the concept of a fixed training/test/validation *dataset split* in terms of pre-collected data files is not applicable as the data is generated dynamically through agent-environment interactions. The paper specifies environment configurations like 'Corridor Lengths 120-200' for T-Maze or 'maximum episode length of 128' for MPGrid, but not explicit dataset splits like percentages or sample counts for static data.
Hardware Specification Yes We collected all data in a single Google Cloud instance with NVIDIA A100 GPU, 12 CPUs and 80GB RAM.
Software Dependencies No We conducted all experiments using Python and implemented the agents using the Jax library (Bradbury et al., 2018). We used the GTrXL implementation from the DI-engine library (DI-engine Contributors, 2021). The paper mentions Python, Jax, and DI-engine but does not provide specific version numbers for these software dependencies, only citation years.
Experiment Setup Yes Table 3: Hyperparameters and sweeps for the T-Maze experiments.
Learning Rate: [0.001, 0.0001, 0.0005, 0.00001, 0.00005]
Discount Factor (γ): 0.99
Advantage Estimation Coefficient (λ): 0.95
Entropy Coefficient: [0.1, 0.01, 0.001, 0.0001, 0.00001]
Value Loss Coefficient: 0.5
Rollout Len: 256
Num of Envs: 8
Batch Size (Rollout Len × Num of Envs): 2048
Actor Layer Dimension: 128
Critic Layer Dimension: 128
... Tables 5, 6, 7, and 9 provide similarly detailed hyperparameter tables for the CartPole, Mystery Path, Memory Maze, and Craftax experiments, respectively.
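The T-Maze table can be read as a grid search over learning rate and entropy coefficient with all other values fixed. A minimal sketch of that sweep (dictionary keys are illustrative, not the authors' code):

```python
from itertools import product

# Swept values from Table 3 (learning rate x entropy coefficient).
SWEEP = {
    "learning_rate": [0.001, 0.0001, 0.0005, 0.00001, 0.00005],
    "entropy_coef": [0.1, 0.01, 0.001, 0.0001, 0.00001],
}

# Fixed values from Table 3; key names are illustrative.
FIXED = {
    "discount_gamma": 0.99,
    "gae_lambda": 0.95,
    "value_loss_coef": 0.5,
    "rollout_len": 256,
    "num_envs": 8,
    "actor_dim": 128,
    "critic_dim": 128,
}
# Batch size is defined as rollout length times number of environments.
FIXED["batch_size"] = FIXED["rollout_len"] * FIXED["num_envs"]  # 2048

# Cartesian product of the swept values: 5 x 5 = 25 configurations.
configs = [
    dict(FIXED, learning_rate=lr, entropy_coef=ec)
    for lr, ec in product(SWEEP["learning_rate"], SWEEP["entropy_coef"])
]
```

The derived batch size (256 × 8 = 2048) matches the value stated in the table.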