Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Constructing an Optimal Behavior Basis for the Option Keyboard

Authors: Lucas N. Alegre, Ana Bazzan, Andre Barreto, Bruno Silva

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically evaluate our method in challenging high-dimensional RL problems and show that it consistently outperforms state-of-the-art GPI-based approaches. Importantly, we also observe that the performance gain over competing methods becomes more pronounced as the number of reward features increases. (Abstract)
Researcher Affiliation	Collaboration	Lucas N. Alegre, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil; Ana L. C. Bazzan, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil; André Barreto, Google DeepMind, London, UK; Bruno C. da Silva, University of Massachusetts, Amherst, MA, USA
Pseudocode	Yes	Algorithm 1: Option Keyboard Basis (OKB) [...] Algorithm 2: OK Linear Support (OK-LS) [...] Algorithm 3: Train Option Keyboard (Train OK)
Open Source Code	Yes	All the code required to reproduce our experiments is available in the Supplemental Material.
Open Datasets	Yes	Figure 1: Domains used in the experiments: Minecart, Fetch Pick And Place, Item Collection, and Highway. [...] We used the implementation available on MO-Gymnasium (Felten et al., 2023). [...] Our implementation of this domain is an adaptation of the one available in Gymnasium-Robotics (de Lazcano et al., 2023). [...] This domain is based on the autonomous driving environment introduced by Leurent (2018).
Dataset Splits	Yes	To generate test task sets W for different values of d, we used the method introduced by Takagi et al. (2020), which produces uniformly spaced weight vectors in W.
Hardware Specification	Yes	All experiments were performed in a cluster with NVIDIA A100-PCIE-40GB GPUs with 32GB of RAM.
Software Dependencies	No	We used Adam (Kingma and Ba, 2015) as the first-order optimizer used to train all neural networks with mini-batches of size 256. [...] We used pycddlib (https://github.com/mcmtroffaes/pycddlib) implementation of the Double Description Method (Motzkin et al., 1953).
Experiment Setup	Yes	The USFAs ψ(s, a, w) used for encoding the base policies Πk were modeled with multi-layer perceptron (MLP) neural networks with 4 layers with 256 neurons. [...] The meta-policy ω(s, w) was modeled with an MLP with 3 layers with 256 neurons. [...] We used Adam (Kingma and Ba, 2015) as the first-order optimizer used to train all neural networks with mini-batches of size 256. [...] The budget of environment interactions per iteration (i.e., call to New Policy in Alg. 1) used was 25000, 50000, 50000 and 100000 for the Minecart, Fetch Pick And Place, Item Collection, and Highway domains, respectively.
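
The values quoted in the Experiment Setup row can be collected into a single configuration object for anyone attempting a rerun. The following is a minimal sketch in Python, not the authors' code: the class name, field names, and the helper `total_interactions` are hypothetical, and the number of iterations is left as a caller-supplied parameter because the excerpt does not state it. Only the numeric values and domain names come from the row above.

```python
from dataclasses import dataclass, field
from typing import Dict


def _reported_budgets() -> Dict[str, int]:
    # Environment interactions per iteration (per call to New Policy in Alg. 1),
    # as quoted in the Experiment Setup row.
    return {
        "Minecart": 25_000,
        "Fetch Pick And Place": 50_000,
        "Item Collection": 50_000,
        "Highway": 100_000,
    }


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters quoted in the table; all field names are hypothetical."""

    usfa_layers: int = 4            # MLP layers for the USFAs psi(s, a, w)
    usfa_width: int = 256           # neurons per USFA layer
    meta_policy_layers: int = 3     # MLP layers for the meta-policy omega(s, w)
    meta_policy_width: int = 256    # neurons per meta-policy layer
    optimizer: str = "Adam"         # first-order optimizer named in the excerpt
    batch_size: int = 256           # mini-batch size
    budget_per_iteration: Dict[str, int] = field(default_factory=_reported_budgets)


def total_interactions(setup: ReportedSetup, domain: str, iterations: int) -> int:
    """Total environment interactions for a given number of iterations.

    `iterations` is an assumption supplied by the caller; the quoted
    excerpt reports only the per-iteration budget.
    """
    if iterations < 0:
        raise ValueError("iterations must be non-negative")
    return setup.budget_per_iteration[domain] * iterations


SETUP = ReportedSetup()
```

For example, `total_interactions(SETUP, "Highway", 3)` multiplies the reported per-iteration budget for Highway by a caller-chosen iteration count of 3. Details the excerpt leaves out, such as learning rate and activation functions, are deliberately absent from the sketch.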