Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Reinforcement Learning with Simple Sequence Priors
Authors: Tankred Saanum, Noémi Éltető, Peter Dayan, Marcel Binz, Eric Schulz
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the resulting RL algorithm leads to faster learning, and attains higher returns than state-of-the-art model-free approaches in a series of continuous control tasks from the DeepMind Control Suite. These priors also produce a powerful information-regularized agent that is robust to noisy observations and can perform open-loop control. |
| Researcher Affiliation | Academia | Tankred Saanum1 Noémi Éltető1 Peter Dayan1,2 Marcel Binz1 Eric Schulz1 1Max Planck Institute for Biological Cybernetics, 2University of Tübingen |
| Pseudocode | Yes | Algorithm 1 LZ4 pseudo-code |
| Open Source Code | Yes | Code: https://github.com/tankred-saanum/simple_priors |
| Open Datasets | Yes | We evaluated the agents described in Section 3 on eight continuous control tasks from the DeepMind Control Suite [34]. |
| Dataset Splits | No | The paper describes training steps (e.g., '1 million environment steps') and evaluation episodes, but does not provide specific train/validation/test dataset splits in terms of percentages or counts for a fixed dataset, which is common in supervised learning but less so for reinforcement learning environments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'PyTorch' and 'Adam optimizer', and refers to specific algorithms like 'LZ4'. However, it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Table 1: Hyperparameters used for SAC, MIRACLE, LZ-SAC, and SPAC. Table 2: Transformer hyperparameters. |