Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Meta-learning how to Share Credit among Macro-Actions

Authors: Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our strategy looking at macro-actions in Atari games, and the Street Fighter II environment. Our results show significant improvements over the RAINBOWDQN baseline in all environments.
Researcher Affiliation | Collaboration | Ionel-Alexandru Hosu Politehnica University of Bucharest Bucharest, Romania EMAIL Traian Rebedea NVIDIA & Politehnica University of Bucharest Bucharest, Romania EMAIL Razvan Pascanu Mila Quebec Artificial Intelligence Institute Montreal, Canada EMAIL
Pseudocode | Yes | Algorithm 1 Meta-learning Credit Assignment with MASP
Open Source Code | Yes | To facilitate reproducibility, all code used for the experiments and MASP implementation is available at: https://github.com/rl-submissions/macro-credit-masp.
Open Datasets | Yes | including a suite of Atari 2600 games from the Arcade Learning Environment (ALE) [34] and the complex, structured action environment of Street Fighter II from Gym Retro [35]. These benchmarks represent scenarios with varied complexity and exploration demands, making them ideal for assessing the benefits of our approach. [...] To construct meaningful macro-actions in Atari, we leverage the Atari Grand Challenge Dataset [36], which provides human expert trajectories.
Dataset Splits | No | The paper describes training on '2B frames in Atari' and '500 million frames in Street Fighter II' and mentions 'validation sweeps', but does not provide explicit training/test/validation dataset splits (e.g., percentages or sample counts) in the conventional supervised learning sense. For RL, evaluation is performed on the policy in the environment rather than on a static dataset split.
Hardware Specification | Yes | Our experiments were conducted on a compute cluster equipped with 192GB RAM and a combination of NVIDIA GPUs: 1x RTX 5090, 2x RTX 4090, and 4x RTX 3090 cards.
Software Dependencies | No | No specific software versions for key dependencies (e.g., Python, PyTorch, TensorFlow) are provided. The paper references algorithms like RAINBOW DQN [11] and environments like Gym Retro [35] and ALE [34], but not their specific software versions.
Experiment Setup | Yes | We summarize Atari preprocessing settings in Table 6 and the main algorithm hyperparameters in Table 7. [...] Streetfighter II experiments were conducted with domain-specific preprocessing and macro-action settings, including a reduced action set and macro-actions corresponding to common combos. All environment and training hyperparameters are detailed in Table 9. [...] Table 11: Mini Grid-specific hyperparameters. Only hyperparameters that differ from Atari are shown.