Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Decoupling regularization from the action space
Authors: Sobhan Mohammadpour, Emma Frejinger, Pierre-Luc Bacon
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide three sets of experiments: a toy MDP where the number of actions is a parameter, a set of experiments on the Deep Mind Control suite (Tassa et al., 2018), and lastly, the drug design MDP of Bengio et al. (2021). |
| Researcher Affiliation | Academia | Anonymous authors Paper under double-blind review |
| Pseudocode | Yes | Algorithm 1: Decoupled SQL; Algorithm 2: Soft actor-critic's update |
| Open Source Code | Yes | All code is hosted at https://anonymous.4open.science/r/decoupled_sql-5CAB/ and https://anonymous.4open.science/r/decoupled_gfn-8589. |
| Open Datasets | Yes | In this section, we provide three sets of experiments: a toy MDP where the number of actions is a parameter, a set of experiments on the Deep Mind Control suite (Tassa et al., 2018), and lastly, the drug design MDP of Bengio et al. (2021). |
| Dataset Splits | No | The paper does not explicitly state training/validation/test splits with percentages, counts, or specific citations for the datasets used. It refers to 'test rewards over the training' but lacks details on how the data was partitioned. |
| Hardware Specification | No | No specific hardware (e.g., GPU models, CPU models, memory details) used for running the experiments was mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned in the paper. |
| Experiment Setup | Yes | In the first experiment, we fix the temperature to 0.25. We chose α = 0.77 to get similar results as Haarnoja et al. (2018) when the actions are in the [-1, 1] range, this is our recommended default. |