Direct Behavior Specification via Constrained Reinforcement Learning
Authors: Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Chris J Pal
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games. |
| Researcher Affiliation | Collaboration | Julien Roy 1,2; Roger Girgis 1,2; Joshua Romoff 3; Pierre-Luc Bacon 1,4,5; Christopher Pal 1,2,4,6,7 [...] 1 Institut d'intelligence artificielle du Québec (Mila). 2 École Polytechnique de Montréal. 3 Ubisoft La Forge. 4 Université de Montréal. 5 Facebook CIFAR AI Chair. 6 ServiceNow. 7 Canada CIFAR AI Chair. |
| Pseudocode | Yes | Appendix A is titled "Algorithm" and presents "Algorithm 1 SAC-Lagrangian with Bootstrap Constraint" in pseudocode format (a hedged sketch of the underlying Lagrangian update appears after this table). |
| Open Source Code | Yes | The code for the Arena environment experiments is available at: https://github.com/ubisoft/DirectBehaviorSpecification |
| Open Datasets | No | The paper uses custom environments ("Arena environment" and "Open World environment") and mentions a "Game RLand map generator," but provides no concrete access information (e.g., a URL, DOI, or specific citation) for a publicly available training dataset. The provided code link covers their environment, not a public dataset. |
| Dataset Splits | No | The paper describes evaluation procedures (e.g., "evaluated for 10 episodes" or "evaluated on 1000 episodes"), but it does not specify explicit train/validation/test dataset splits with percentages or counts, as is typical for static dataset-based experiments. Reinforcement learning experiments often involve continuous interaction with an environment rather than pre-split datasets. |
| Hardware Specification | No | The paper mentions training on GPUs in a general sense but does not provide specific details on CPU models, GPU models (e.g., NVIDIA A100, RTX 3090), memory, or cloud computing instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using "SAC agents" (Soft Actor-Critic) and the "Adam optimizer" but does not specify the version numbers for these or any other software libraries, frameworks, or operating systems used. |
| Experiment Setup | Yes | Table 1, titled "HYPER-PARAMETERS FOR EXPERIMENTS IN THE ARENA ENVIRONMENT," and Table 2, titled "HYPER-PARAMETERS FOR EXPERIMENTS IN THE OPENWORLD ENVIRONMENT," explicitly list specific hyperparameter values such as discount factor, learning rates, batch sizes, buffer sizes, entropy coefficients, and constraint thresholds (an illustrative configuration sketch appears after this table). |
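
To make the "Pseudocode" entry concrete: the paper's Algorithm 1 is a SAC-Lagrangian variant, i.e., each behavioral constraint gets a Lagrange multiplier that penalizes the actor objective and is adapted by gradient ascent on the constraint violation. The following is a minimal Python/PyTorch sketch of that generic multiplier update, not the paper's exact Algorithm 1: the class name, learning rate, softplus parameterization, and placeholder cost values are assumptions for illustration, and the bootstrap-constraint specifics described in Appendix A are omitted.

```python
import torch

class LagrangeMultipliers:
    """One multiplier per behavioral constraint, kept non-negative by
    parameterizing it through a softplus transform. (Hypothetical helper,
    not taken from the paper's released code.)"""

    def __init__(self, num_constraints: int, lr: float = 3e-4):
        # Raw unconstrained parameters; softplus maps them to lambda >= 0.
        self.raw = torch.zeros(num_constraints, requires_grad=True)
        self.optimizer = torch.optim.Adam([self.raw], lr=lr)

    def values(self) -> torch.Tensor:
        return torch.nn.functional.softplus(self.raw)

    def update(self, cost_estimates: torch.Tensor, thresholds: torch.Tensor) -> None:
        # Gradient ascent on lambda * (J_c - d): a multiplier grows while its
        # constraint is violated (J_c > d) and shrinks once it is satisfied.
        loss = -(self.values() * (cost_estimates - thresholds).detach()).sum()
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()


# Usage sketch: feed the current multipliers into a penalized actor objective.
multipliers = LagrangeMultipliers(num_constraints=2)
cost_estimates = torch.tensor([0.30, 0.05])  # estimated per-constraint costs (placeholders)
thresholds = torch.tensor([0.10, 0.10])      # allowed cost budgets d_i (placeholders)
multipliers.update(cost_estimates, thresholds)
lam = multipliers.values().detach()          # weights for the cost penalties in the actor loss
```

The design choice worth noting is the softplus parameterization: it keeps the multipliers non-negative without explicit projection, while still allowing standard first-order optimizers such as Adam to drive the ascent step.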
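
For the "Experiment Setup" entry, the sketch below shows one way such a hyperparameter table might be organized in code. The key names mirror the categories the report lists for Tables 1 and 2; every value is a placeholder for illustration, not a setting reported in the paper.

```python
# Illustrative structure only: values are placeholders, not the paper's settings.
arena_hyperparameters = {
    "discount_factor": 0.99,              # gamma
    "learning_rate": 3e-4,                # Adam step size for actor/critic/multipliers
    "batch_size": 256,
    "replay_buffer_size": 1_000_000,
    "entropy_coefficient": 0.2,           # SAC temperature
    "constraint_thresholds": [0.1, 0.1],  # one cost budget per behavioral constraint
}
```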