Value Function Decomposition for Iterative Design of Reinforcement Learning Agents
Authors: James MacGlashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter Wurman, Peter Stone
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark SAC-D, SAC-D-CAGrad and SAC-D-Naive against SAC on a selection of continuous-action Gym [7] environments. For each environment, we exposed existing additive reward components without altering the behavior of the environments or their composite rewards. |
| Researcher Affiliation | Collaboration | James MacGlashan james.macglashan@sony.com Evan Archer evan.archer@sony.com Alisa Devlic alisa.devlic@sony.com Takuma Seno takuma.seno@sony.com Craig Sherstan craig.sherstan@sony.com Peter R. Wurman peter.wurman@sony.com Peter Stone pstone@cs.utexas.edu ... Sony AI; The University of Texas at Austin; *Equal contribution |
| Pseudocode | Yes | Algorithm 1 SAC-D and SAC-D-CAGrad Update |
| Open Source Code | No | At present, we are unable to release our source code or data. |
| Open Datasets | Yes | We benchmark SAC-D, SAC-D-CAGrad and SAC-D-Naive against SAC on a selection of continuous-action Gym [7] environments. |
| Dataset Splits | No | The paper states: 'As outlined in App. C, we used hyperparameters previously published for use with SAC [14] for all experiments.' Although it refers to hyperparameters and training, the main text does not specify train/validation/test splits (as percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments. Although the ethics checklist states that the compute resources used are reported, no such details appear in the main paper or its appendices. |
| Software Dependencies | No | The paper mentions 'Gym [7]' and 'the rliable framework [2]', but it does not list specific version numbers for key software dependencies such as Python, PyTorch/TensorFlow, or other libraries used for implementation. |
| Experiment Setup | No | The paper states, 'As outlined in App. C, we used hyperparameters previously published for use with SAC [14] for all experiments.' This indicates that specific experimental setup details, such as hyperparameter values, are deferred to an appendix rather than being presented explicitly in the main text. |
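The rows above refer to value function decomposition, additive reward components, and Algorithm 1 without showing the core idea. The following is a minimal sketch, not the authors' code (which is unreleased). Each reward component r_i gets its own critic Q_i, and because the components sum to the unchanged composite reward, the per-component soft Bellman targets sum to the ordinary SAC target. The helpers `decomposed_soft_targets` and `composite_soft_target` are hypothetical, and twin critics and target networks are omitted. Splitting the entropy bonus equally across the n components is an assumption that may differ from the paper's Algorithm 1.

```python
from __future__ import annotations

import math
from typing import Sequence


def decomposed_soft_targets(
    rewards: Sequence[float],   # r_i(s, a), one entry per reward component
    next_q: Sequence[float],    # Q_i(s', a') from the (target) critic, per component
    next_log_prob: float,       # log pi(a' | s') for the sampled next action a'
    gamma: float = 0.99,
    alpha: float = 0.2,
    done: bool = False,
) -> list[float]:
    """Hypothetical per-component soft Bellman targets y_i.

    ASSUMPTION: the entropy bonus -alpha * log pi is split equally across
    the n components, so sum_i y_i equals the composite SAC target.
    """
    n = len(rewards)
    if n == 0 or len(next_q) != n:
        raise ValueError("need one next-state Q value per reward component")
    bootstrap = 0.0 if done else gamma          # no bootstrapping at terminal states
    entropy_share = -alpha * next_log_prob / n  # each component's slice of the bonus
    return [r + bootstrap * (q + entropy_share) for r, q in zip(rewards, next_q)]


def composite_soft_target(
    reward: float,
    next_q: float,
    next_log_prob: float,
    gamma: float = 0.99,
    alpha: float = 0.2,
    done: bool = False,
) -> float:
    """Standard single-critic soft Bellman target for the composite reward."""
    bootstrap = 0.0 if done else gamma
    return reward + bootstrap * (next_q - alpha * next_log_prob)


if __name__ == "__main__":
    # Two hypothetical components, e.g. a progress term and a control-cost term.
    rewards, next_q, log_prob = [1.0, -0.25], [10.0, -2.0], -1.5
    ys = decomposed_soft_targets(rewards, next_q, log_prob)
    total = composite_soft_target(sum(rewards), sum(next_q), log_prob)
    # Linearity of the target means the decomposition loses nothing.
    assert math.isclose(sum(ys), total)
    print(ys, total)
```

The design choice this illustrates is that, because the soft Bellman target is linear in the reward and in Q, training one critic per component and summing them leaves the composite objective unchanged. At the same time, it exposes how much each component contributes to the agent's value estimates.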