Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning
Authors: Changyu Chen, Ramesha Karunasena, Thanh Nguyen, Arunesh Sinha, Pradeep Varakantham
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct extensive experiments to show the scalability of our approach compared to prior methods and the ability to enforce arbitrary state-conditional constraints on the support of the distribution of actions in any state. We evaluate IAR-A2C against prior works across a diverse set of environments, including low-dimensional discrete control tasks such as Cart Pole and Acrobot, the visually challenging Pistonball task with high-dimensional image inputs and an extremely large action space (up to 59,049 categories), and an emergency resource allocation simulator in a city, referred to as Emergency Resource Allocation (ERA). |
| Researcher Affiliation | Academia | Singapore Management University¹, University of Oregon², Rutgers University³ |
| Pseudocode | Yes | Algorithm 1: ELBO Optimization; Algorithm 2: IAR-A2C |
| Open Source Code | Yes | Our implementation is available at https://github.com/cameron-chen/flow-iar. |
| Open Datasets | Yes | We evaluate IAR-A2C against prior works across a diverse set of environments, including low-dimensional discrete control tasks such as Cart Pole and Acrobot [3], the visually challenging Pistonball task [31] with high-dimensional image inputs and an extremely large action space (up to 59,049 categories), and an emergency resource allocation simulator in a city, referred to as Emergency Resource Allocation (ERA). |
| Dataset Splits | No | The paper does not provide specific details on how the datasets were split into training, validation, or test sets, such as percentages, absolute counts, or references to standard predefined splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow, or specific solvers). |
| Experiment Setup | No | The paper does not provide specific experimental setup details, such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific optimizer settings, beyond mentioning that approaches are trained with a certain number of seeds. |
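The figure of 59,049 categories quoted for Pistonball equals 3^10, which is consistent with a joint action over 10 pistons that each choose among 3 discrete moves. That breakdown is an assumption made here for illustration and is not stated in this report. The sketch below shows how such a joint action space grows multiplicatively and how a flat category index maps to one choice per piston (mixed-radix encoding). The helper names are hypothetical and are not taken from the authors' repository.

```python
from math import prod


def joint_action_space_size(dims):
    """Number of joint actions when dimension i has dims[i] discrete choices."""
    return prod(dims)


def flat_to_factored(index, dims):
    """Decode a flat category index into one choice per dimension.

    Uses mixed-radix decoding with the last dimension varying fastest.
    """
    if not 0 <= index < joint_action_space_size(dims):
        raise ValueError("index out of range for the given dims")
    choices = []
    for size in reversed(dims):
        index, choice = divmod(index, size)
        choices.append(choice)
    return tuple(reversed(choices))


def factored_to_flat(choices, dims):
    """Encode one choice per dimension back into a flat category index."""
    if len(choices) != len(dims):
        raise ValueError("need exactly one choice per dimension")
    index = 0
    for choice, size in zip(choices, dims):
        if not 0 <= choice < size:
            raise ValueError("choice out of range for its dimension")
        index = index * size + choice
    return index


# Hypothetical configuration: 10 pistons, 3 discrete moves each.
PISTON_DIMS = [3] * 10

if __name__ == "__main__":
    print(joint_action_space_size(PISTON_DIMS))
    print(flat_to_factored(5, PISTON_DIMS))
```

Under this assumption, a single flat softmax would need one logit per joint action, and that count grows exponentially with the number of pistons. This growth is the scalability concern that factored or autoregressive action parameterizations are generally meant to address.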