Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

Authors: James MacGlashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter Wurman, Peter Stone

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark SAC-D, SAC-D-CAGrad and SAC-D-Naive against SAC on a selection of continuous-action Gym [7] environments. For each environment, we exposed existing additive reward components without altering the behavior of the environments or their composite rewards.
Researcher Affiliation | Collaboration | James MacGlashan (james.macglashan@sony.com), Evan Archer (evan.archer@sony.com), Alisa Devlic (alisa.devlic@sony.com), Takuma Seno (takuma.seno@sony.com), Craig Sherstan (craig.sherstan@sony.com), Peter R. Wurman (peter.wurman@sony.com), Peter Stone (pstone@cs.utexas.edu) ... Sony AI; Equal contribution; The University of Texas at Austin
Pseudocode | Yes | Algorithm 1: SAC-D and SAC-D-CAGrad Update
Open Source Code | No | At present, we are unable to release our source code or data.
Open Datasets | Yes | We benchmark SAC-D, SAC-D-CAGrad and SAC-D-Naive against SAC on a selection of continuous-action Gym [7] environments.
Dataset Splits | No | The paper states: 'As outlined in App. C, we used hyperparameters previously published for use with SAC [14] for all experiments.' Although it refers to hyperparameters and training, the main text does not give explicit train/validation/test splits as percentages or sample counts.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run its experiments. Although the ethics checklist indicates that the resources used are mentioned, these details do not appear in the main paper or the appendices.
Software Dependencies | No | The paper mentions 'Gym [7]' and 'the rliable framework [2]', but it does not list version numbers for key software dependencies such as Python, PyTorch/TensorFlow, or other libraries used in the implementation.
Experiment Setup | No | The paper states: 'As outlined in App. C, we used hyperparameters previously published for use with SAC [14] for all experiments.' Specific setup details, such as hyperparameter values, are therefore deferred to an appendix rather than presented in the main text.
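
The Research Type entry notes that additive reward components were exposed without changing each environment's composite reward. Since the source code is not released, the following is only a minimal sketch of what that property means, not the authors' implementation. The component names and values (`forward_progress`, `control_cost`, `alive_bonus`) and both helper functions are hypothetical and are not taken from the paper or from Gym.

```python
from typing import Dict


def composite_reward(components: Dict[str, float]) -> float:
    """Sum named additive reward components into one scalar reward."""
    return sum(components.values())


def check_decomposition(components: Dict[str, float],
                        env_reward: float,
                        tol: float = 1e-8) -> bool:
    """Return True if the components add up to the environment's reward.

    This is the invariant described above: exposing the components must
    leave the composite reward unchanged.
    """
    return abs(composite_reward(components) - env_reward) <= tol


# Hypothetical per-step components for a locomotion-style task.
components = {
    "forward_progress": 1.25,
    "control_cost": -0.25,
    "alive_bonus": 1.0,
}
total = composite_reward(components)
print(total, check_decomposition(components, env_reward=2.0))
```

A check like this could be run on every step of a wrapped environment to confirm that the exposed components still sum to the original scalar reward.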