Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning

Authors: Changyu Chen, Ramesha Karunasena, Thanh Nguyen, Arunesh Sinha, Pradeep Varakantham

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct extensive experiments to show the scalability of our approach compared to prior methods and the ability to enforce arbitrary state-conditional constraints on the support of the distribution of actions in any state. We evaluate IAR-A2C against prior works across a diverse set of environments, including low-dimensional discrete control tasks such as Cart Pole and Acrobot, the visually challenging Pistonball task with high-dimensional image inputs and an extremely large action space (up to 59,049 categories), and an emergency resource allocation simulator in a city, referred to as Emergency Resource Allocation (ERA).
Researcher Affiliation | Academia | Singapore Management University, University of Oregon, Rutgers University
Pseudocode | Yes | Algorithm 1: ELBO Optimization; Algorithm 2: IAR-A2C (a hedged sketch of the rejection-sampling idea appears after the table)
Open Source Code | Yes | Our implementation is available at https://github.com/cameron-chen/flow-iar.
Open Datasets | Yes | We evaluate IAR-A2C against prior works across a diverse set of environments, including low-dimensional discrete control tasks such as Cart Pole and Acrobot [3], the visually challenging Pistonball task [31] with high-dimensional image inputs and an extremely large action space (up to 59,049 categories), and an emergency resource allocation simulator in a city, referred to as Emergency Resource Allocation (ERA). (An environment-setup sketch appears after the table.)
Dataset Splits | No | The paper does not provide specific details on how the datasets were split into training, validation, or test sets, such as percentages, absolute counts, or references to standard predefined splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory configurations.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow, or specific solvers).
Experiment Setup | No | The paper does not provide specific experimental setup details, such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific optimizer settings, beyond mentioning that approaches are trained with a certain number of seeds.
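The Pseudocode row names IAR-A2C, and the Research Type row quotes the paper's claim of enforcing arbitrary state-conditional constraints on the support of the action distribution. The snippet below is a minimal sketch of the general idea the name suggests (rejecting sampled actions that violate a state-conditional constraint so that only valid actions remain in the support); it is not the paper's algorithm, and `is_valid` is a hypothetical constraint oracle introduced here only for illustration.

```python
# Minimal sketch (NOT the paper's IAR-A2C implementation): rejection sampling
# restricts a categorical policy's support to the state-conditionally valid
# actions. `is_valid` is a hypothetical black-box constraint oracle.
import numpy as np

rng = np.random.default_rng(0)

def is_valid(state, action):
    """Hypothetical constraint oracle: here, forbid actions larger than the state index."""
    return action <= state

def sample_valid_action(probs, state, max_tries=1000):
    """Sample from the policy, rejecting actions that violate the constraint.

    Equivalent to drawing from the policy renormalized over the valid set,
    provided at least one valid action has non-zero probability.
    """
    for _ in range(max_tries):
        action = rng.choice(len(probs), p=probs)
        if is_valid(state, action):
            return action
    raise RuntimeError("no valid action found; constraint may be infeasible")

probs = np.array([0.1, 0.2, 0.3, 0.4])  # toy policy over 4 discrete actions
print(sample_valid_action(probs, state=1))  # only actions {0, 1} can be returned
```

A complete algorithm would also have to account for the effect of rejection on the policy-gradient estimate; the sketch only illustrates constraint enforcement at sampling time.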
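The Open Datasets row lists simulation environments rather than fixed datasets. As rough orientation, the standard benchmark tasks named there can be instantiated as below; the package choices (gymnasium rather than the older gym, PettingZoo's pistonball_v6) and all version strings and keyword arguments are assumptions, since the paper does not pin releases, and the ERA simulator is not a standard public benchmark so it is omitted.

```python
# Illustrative setup of the named benchmark environments (a sketch, not the
# paper's exact configuration; packages and versions are assumptions).
import gymnasium as gym
from pettingzoo.butterfly import pistonball_v6

cartpole = gym.make("CartPole-v1")  # low-dimensional discrete control
acrobot = gym.make("Acrobot-v1")    # low-dimensional discrete control

# 59,049 = 3**10 joint actions is consistent with 10 pistons, each with 3
# discrete actions; treating that as the configuration is an assumption.
pistonball = pistonball_v6.parallel_env(n_pistons=10, continuous=False)

obs, info = cartpole.reset(seed=0)
print(cartpole.action_space)                # Discrete(2)
print(acrobot.action_space)                 # Discrete(3)
print(pistonball.action_space("piston_0"))  # Discrete(3) per piston
```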