Direct Behavior Specification via Constrained Reinforcement Learning

Authors: Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Chris J. Pal

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games."
Researcher Affiliation | Collaboration | Julien Roy 1,2; Roger Girgis 1,2; Joshua Romoff 3; Pierre-Luc Bacon 1,4,5; Christopher Pal 1,2,4,6,7 [...] 1 Institut d'intelligence artificielle du Québec (Mila); 2 École Polytechnique de Montréal; 3 Ubisoft La Forge; 4 Université de Montréal; 5 Facebook CIFAR AI Chair; 6 ServiceNow; 7 Canada CIFAR AI Chair.
Pseudocode | Yes | Appendix A is titled "Algorithm" and presents "Algorithm 1 SAC-Lagrangian with Bootstrap Constraint" in pseudocode format (a hedged sketch of the Lagrangian multiplier update appears below the table).
Open Source Code | Yes | The code for the Arena environment experiments is available at https://github.com/ubisoft/DirectBehaviorSpecification
Open Datasets | No | The paper uses custom environments (the "Arena" and "Open World" environments) and mentions a "Game RLand map generator", but provides no concrete access information (e.g., URL, DOI, or specific citation) for a publicly available dataset used for training. The provided code link is for their environment, not a general public dataset.
Dataset Splits | No | The paper describes evaluation procedures (e.g., "evaluated for 10 episodes" or "evaluated on 1000 episodes") but does not specify explicit train/validation/test splits with percentages or counts, as is typical for static-dataset experiments. Reinforcement learning experiments usually involve continuous interaction with an environment rather than pre-split datasets (a sketch of such an episode-based evaluation loop appears below the table).
Hardware Specification | No | The paper mentions training on GPUs in general terms but does not name specific CPU or GPU models (e.g., NVIDIA A100, RTX 3090), memory sizes, or cloud computing instance types used for the experiments.
Software Dependencies | No | The paper mentions using SAC (Soft Actor-Critic) agents and the Adam optimizer but does not specify version numbers for these or any other software libraries, frameworks, or operating systems.
Experiment Setup | Yes | Table 1 ("Hyper-parameters for experiments in the Arena environment") and Table 2 ("Hyper-parameters for experiments in the OpenWorld environment") explicitly list specific hyperparameter values such as the discount factor, learning rates, batch sizes, buffer sizes, entropy coefficients, and constraint thresholds.
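
To make the Pseudocode entry concrete: below is a minimal sketch of the dual (Lagrange multiplier) update that a SAC-Lagrangian method performs. It is illustrative only, not the authors' Algorithm 1; the exp-parametrization, the names `log_lambdas` and `dual_ascent_step`, and all numeric values are assumptions.

```python
import torch

# Hypothetical sketch of a Lagrangian multiplier update for constrained RL.
# Not the paper's Algorithm 1; names and values are illustrative.

num_constraints = 3                          # one multiplier per behavioral constraint
thresholds = torch.tensor([0.1, 0.1, 0.05])  # illustrative cost budgets d_k

log_lambdas = torch.zeros(num_constraints, requires_grad=True)
dual_opt = torch.optim.Adam([log_lambdas], lr=1e-3)

def dual_ascent_step(measured_costs: torch.Tensor) -> torch.Tensor:
    """One gradient-ascent step on the dual variables.

    measured_costs: estimates of each constraint cost J_Ck(pi).
    A multiplier grows while its constraint is violated (cost > threshold)
    and shrinks once the constraint is satisfied, scaling the corresponding
    penalty in the policy loss.
    """
    lambdas = log_lambdas.exp()              # exp keeps multipliers non-negative
    dual = (lambdas * (measured_costs - thresholds)).sum()
    dual_opt.zero_grad()
    (-dual).backward()                       # ascend the dual by descending its negative
    dual_opt.step()
    return log_lambdas.exp().detach()
```

In a full agent, the detached multipliers would weight per-constraint critic estimates in the actor loss alongside the usual entropy-regularized SAC objective.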
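
Likewise, to illustrate the Dataset Splits entry, here is a hypothetical episode-based evaluation loop of the kind RL papers report in place of dataset splits. A classic Gym-style `reset`/`step` API and an `info["constraint_cost"]` field are assumed; neither is taken from the paper.

```python
import numpy as np

def evaluate(env, policy, n_episodes=1000):
    """Hypothetical evaluation loop mirroring the paper's protocol
    ("evaluated for 10 episodes", "evaluated on 1000 episodes"):
    roll out the fixed policy and average return and constraint cost
    over episodes; no train/validation/test split is involved.
    """
    returns, costs = [], []
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        ep_return, ep_cost = 0.0, 0.0
        while not done:
            obs, reward, done, info = env.step(policy(obs))
            ep_return += reward
            ep_cost += info.get("constraint_cost", 0.0)  # hypothetical info key
        returns.append(ep_return)
        costs.append(ep_cost)
    return np.mean(returns), np.mean(costs)
```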