Reinforcement Learning with Simple Sequence Priors
Authors: Tankred Saanum, Noémi Éltető, Peter Dayan, Marcel Binz, Eric Schulz
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the resulting RL algorithm leads to faster learning, and attains higher returns than state-of-the-art model-free approaches in a series of continuous control tasks from the DeepMind Control Suite. These priors also produce a powerful information-regularized agent that is robust to noisy observations and can perform open-loop control. |
| Researcher Affiliation | Academia | Tankred Saanum¹, Noémi Éltető¹, Peter Dayan¹,², Marcel Binz¹, Eric Schulz¹; ¹Max Planck Institute for Biological Cybernetics, ²University of Tübingen |
| Pseudocode | Yes | Algorithm 1 LZ4 pseudo-code (see the illustrative LZ4 sketch below the table) |
| Open Source Code | Yes | Code: https://github.com/tankred-saanum/simple_priors |
| Open Datasets | Yes | We evaluated the agents described in Section 3 on eight continuous control tasks from the DeepMind Control Suite [34]. (See the environment-loading sketch below the table.) |
| Dataset Splits | No | The paper describes training steps (e.g., '1 million environment steps') and evaluation episodes, but does not provide train/validation/test splits as percentages or counts for a fixed dataset; such splits are common in supervised learning but less applicable to reinforcement learning environments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'PyTorch' and 'Adam optimizer', and refers to specific algorithms like 'LZ4'. However, it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Table 1: Hyperparameters used for SAC, MIRACLE, LZ-SAC, and SPAC. Table 2: Transformer hyperparameters. |
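
To make the pseudocode row concrete, here is a minimal Python sketch of how an LZ4-based simplicity score over action sequences could be computed. The discretization scheme, bin count, and helper names (`compressed_length`, `simplicity_bonus`) are assumptions for illustration, not the authors' exact Algorithm 1.

```python
# Sketch: score the simplicity of an action sequence by how well LZ4
# compresses it. Discretization and function names are assumptions.
import numpy as np
import lz4.frame


def compressed_length(actions: np.ndarray, n_bins: int = 64) -> int:
    """LZ4-compressed size (bytes) of a discretized action sequence
    of shape (T, action_dim)."""
    # Map continuous actions in [-1, 1] to n_bins integer levels so the
    # byte stream contains repeatable symbols LZ4 can exploit.
    clipped = np.clip(actions, -1.0, 1.0)
    bins = np.floor((clipped + 1.0) / 2.0 * (n_bins - 1)).astype(np.uint8)
    return len(lz4.frame.compress(bins.tobytes()))


def simplicity_bonus(actions: np.ndarray) -> float:
    """Negative compressed length, usable as an intrinsic reward term
    that favors compressible (simple) action sequences."""
    return -float(compressed_length(actions))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 128, 6
    random_seq = rng.uniform(-1, 1, size=(T, d))                       # incompressible
    repetitive_seq = np.tile(rng.uniform(-1, 1, size=(1, d)), (T, 1))  # simple
    print(compressed_length(random_seq), compressed_length(repetitive_seq))
    # The repetitive sequence compresses to far fewer bytes, so it
    # receives the larger (less negative) simplicity bonus.
```

Rewarding shorter compressed length reflects the paper's core intuition: policies whose action sequences are compressible (simple) should be preferred.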
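
Likewise, a minimal sketch of loading and stepping one DeepMind Control Suite task via the `dm_control` package; the `cheetah`/`run` pair is an illustrative choice, not a claim about the paper's exact eight tasks.

```python
# Sketch: load a DeepMind Control Suite task and run one episode with a
# uniform random policy, purely to demonstrate the environment loop.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="cheetah", task_name="run")
action_spec = env.action_spec()

time_step = env.reset()
total_reward = 0.0
while not time_step.last():
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
    total_reward += time_step.reward or 0.0  # first step's reward is None
print(f"episode return: {total_reward:.2f}")
```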