Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Decoupling regularization from the action space

Authors: Sobhan Mohammadpour, Emma Frejinger, Pierre-Luc Bacon

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we provide three sets of experiments: a toy MDP where the number of actions is a parameter, a set of experiments on the Deep Mind Control suite (Tassa et al., 2018), and lastly, the drug design MDP of Bengio et al. (2021).
Researcher Affiliation	Academia	Anonymous authors Paper under double-blind review
Pseudocode	Yes	Algorithm 1: Decoupled SQL; Algorithm 2: Soft actor-critic's update
Open Source Code	Yes	All code is hosted at https://anonymous.4open.science/r/decoupled_sql-5CAB/ and https://anonymous.4open.science/r/decoupled_gfn-8589.
Open Datasets	Yes	In this section, we provide three sets of experiments: a toy MDP where the number of actions is a parameter, a set of experiments on the Deep Mind Control suite (Tassa et al., 2018), and lastly, the drug design MDP of Bengio et al. (2021).
Dataset Splits	No	The paper does not explicitly state training/validation/test splits with percentages, counts, or specific citations for the datasets used. It refers to 'test rewards over the training' but lacks details on how the data was partitioned.
Hardware Specification	No	No specific hardware (e.g., GPU models, CPU models, memory details) used for running the experiments was mentioned in the paper.
Software Dependencies	No	No specific software dependencies with version numbers were mentioned in the paper.
Experiment Setup	Yes	In the first experiment, we fix the temperature to 0.25. We chose α = 0.77 to get similar results as Haarnoja et al. (2018) when the actions are in the [−1, 1] range, this is our recommended default.