Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Direct Behavior Specification via Constrained Reinforcement Learning
Authors: Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Chris J Pal
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games. |
| Researcher Affiliation | Collaboration | Julien Roy 1 2 Roger Girgis 1 2 Joshua Romoff 3 Pierre-Luc Bacon 1 4 5 Christopher Pal 1 2 4 6 7 [...] 1Institut d'intelligence artificielle du Québec (Mila). 2École Polytechnique de Montréal. 3Ubisoft La Forge. 4Université de Montréal. 5Facebook CIFAR AI Chair. 6ServiceNow. 7Canada CIFAR AI Chair. |
| Pseudocode | Yes | Appendix A is titled "Algorithm" and presents "Algorithm 1 SAC-Lagrangian with Bootstrap Constraint" in pseudocode format. |
| Open Source Code | Yes | The code for the Arena environment experiments is available at: https://github.com/ubisoft/DirectBehaviorSpecification |
| Open Datasets | No | The paper uses custom environments ("Arena environment" and "Open World environment") and mentions a "GameRLand map generator," but does not provide concrete access information (e.g., URL, DOI, specific citation) for a publicly available dataset used for training. The provided code link is for their environment, not a general public dataset. |
| Dataset Splits | No | The paper describes evaluation procedures (e.g., "evaluated for 10 episodes" or "evaluated on 1000 episodes"), but it does not specify explicit train/validation/test dataset splits with percentages or counts, as is typical for static dataset-based experiments. Reinforcement learning experiments often involve continuous interaction with an environment rather than pre-split datasets. |
| Hardware Specification | No | The paper mentions training on GPUs in a general sense but does not provide specific details on CPU models, GPU models (e.g., NVIDIA A100, RTX 3090), memory, or cloud computing instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using "SAC agents" (Soft Actor-Critic) and the "Adam optimizer" but does not specify the version numbers for these or any other software libraries, frameworks, or operating systems used. |
| Experiment Setup | Yes | Table 1, titled "HYPER-PARAMETERS FOR EXPERIMENTS IN THE ARENA ENVIRONMENT," and Table 2, titled "HYPER-PARAMETERS FOR EXPERIMENTS IN THE OPENWORLD ENVIRONMENT," explicitly list specific hyperparameter values such as discount factor, learning rates, batch sizes, buffer sizes, entropy coefficients, and constraint thresholds. |