Decoupling regularization from the action space

Authors: Sobhan Mohammadpour, Emma Frejinger, Pierre-Luc Bacon

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide three sets of experiments: a toy MDP where the number of actions is a parameter, a set of experiments on the DeepMind Control Suite (Tassa et al., 2018), and lastly, the drug design MDP of Bengio et al. (2021).
Researcher Affiliation | Academia | Anonymous authors. Paper under double-blind review.
Pseudocode | Yes | Algorithm 1: Decoupled SQL; Algorithm 2: Soft actor-critic's update.
Open Source Code | Yes | All code is hosted at https://anonymous.4open.science/r/decoupled_sql-5CAB/ and https://anonymous.4open.science/r/decoupled_gfn-8589.
Open Datasets | Yes | In this section, we provide three sets of experiments: a toy MDP where the number of actions is a parameter, a set of experiments on the DeepMind Control Suite (Tassa et al., 2018), and lastly, the drug design MDP of Bengio et al. (2021).
Dataset Splits | No | The paper does not explicitly state training/validation/test splits with percentages, counts, or specific citations for the datasets used. It refers to "test rewards over the training" but gives no details on how the data was partitioned.
Hardware Specification | No | The paper does not mention the specific hardware (e.g., GPU models, CPU models, or memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | In the first experiment, we fix the temperature to 0.25. We chose α = 0.77 to get results similar to Haarnoja et al. (2018) when the actions are in the [-1, 1] range; this is our recommended default.
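The first experiment fixes the temperature at 0.25 in a toy MDP whose number of actions is a parameter. The minimal Python sketch below shows why a fixed temperature ties the strength of the regularization to the size of the action space. It assumes a standard entropy-regularized (soft, log-sum-exp) state value and uses the case where every action has the same Q-value. In that case the soft value is pure entropy bonus, equal to τ log n, so it grows with the number of actions n. The helper names `soft_value`, `max_entropy_bonus` and `normalized_temperature` are hypothetical. Dividing τ by log n is an illustrative normalization chosen for this sketch and is not necessarily the scheme the paper proposes.

```python
import math


def soft_value(q_values, temperature):
    """Soft state value V = tau * log(sum_a exp(Q(a) / tau)).

    Subtracts the max Q before exponentiating for numerical stability.
    """
    m = max(q_values)
    total = sum(math.exp((q - m) / temperature) for q in q_values)
    return m + temperature * math.log(total)


def max_entropy_bonus(num_actions, temperature):
    """Largest possible entropy bonus tau * H(pi) over n actions.

    The uniform policy attains the maximum entropy log(n).
    """
    return temperature * math.log(num_actions)


def normalized_temperature(temperature, num_actions):
    """Hypothetical illustration only: rescale tau by 1 / log(n) so the
    maximum entropy bonus no longer depends on n (requires n >= 2)."""
    return temperature / math.log(num_actions)


if __name__ == "__main__":
    tau = 0.25  # temperature used in the toy-MDP experiment
    for n in (2, 10, 100, 1000):
        # All actions equally good (Q = 0): the soft value is the entropy bonus.
        v = soft_value([0.0] * n, tau)
        fixed = max_entropy_bonus(n, tau)
        rescaled = max_entropy_bonus(n, normalized_temperature(tau, n))
        print(f"n={n:5d}  V_soft={v:.3f}  tau*log(n)={fixed:.3f}  rescaled={rescaled:.3f}")
```

With a fixed τ, the printed soft value grows with n. Under the hypothetical rescaling, the maximum bonus stays at τ regardless of n.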