Decoupling regularization from the action space
Authors: Sobhan Mohammadpour, Emma Frejinger, Pierre-Luc Bacon
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide three sets of experiments: a toy MDP where the number of actions is a parameter, a set of experiments on the DeepMind Control suite (Tassa et al., 2018), and lastly, the drug design MDP of Bengio et al. (2021). |
| Researcher Affiliation | Academia | Anonymous authors. Paper under double-blind review. |
| Pseudocode | Yes | Algorithm 1: Decoupled SQL. Algorithm 2: Soft actor-critic's update. |
| Open Source Code | Yes | All code is hosted at https://anonymous.4open.science/r/decoupled_sql-5CAB/ and https://anonymous.4open.science/r/decoupled_gfn-8589. |
| Open Datasets | Yes | In this section, we provide three sets of experiments: a toy MDP where the number of actions is a parameter, a set of experiments on the DeepMind Control suite (Tassa et al., 2018), and lastly, the drug design MDP of Bengio et al. (2021). |
| Dataset Splits | No | The paper does not explicitly state training/validation/test splits with percentages, counts, or specific citations for the datasets used. It refers to 'test rewards over the training' but lacks details on how the data was partitioned. |
| Hardware Specification | No | No specific hardware (e.g., GPU models, CPU models, memory details) used for running the experiments was mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned in the paper. |
| Experiment Setup | Yes | In the first experiment, we fix the temperature to 0.25. We chose α = 0.77 to get similar results as Haarnoja et al. (2018) when the actions are in the [-1, 1] range; this is our recommended default. |
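
To make the fixed-temperature setting in the experiment setup more concrete, the following is a minimal sketch, not the authors' code, of the standard soft (log-sum-exp) value used in soft Q-learning. It illustrates why the number of actions matters when the temperature is held fixed: if every action has the same value, the soft value exceeds it by `temperature * log(n)`, which grows with the number of actions `n`. The function name `soft_value` and the chosen action counts are assumptions made for illustration; only the temperature 0.25 comes from the table above.

```python
import math


def soft_value(q_values, temperature):
    """Standard soft value: temperature * log(sum(exp(q / temperature))).

    Hypothetical helper for illustration; subtracts the max for numerical stability.
    """
    q_max = max(q_values)
    total = sum(math.exp((q - q_max) / temperature) for q in q_values)
    return q_max + temperature * math.log(total)


TEMPERATURE = 0.25  # fixed temperature quoted in the experiment setup

if __name__ == "__main__":
    # Illustrative action counts (assumed, not taken from the paper).
    for n_actions in (2, 10, 100):
        # All actions have value 0, so the soft value is temperature * log(n_actions).
        print(n_actions, soft_value([0.0] * n_actions, TEMPERATURE))
```

In this sketch the printed value increases with the number of actions even though every individual action value is zero, which is the coupling between regularization and action-space size that the paper's title refers to.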