In-Context Reinforcement Learning for Variable Action Spaces
Authors: Viacheslav Sinii, Alexander Nikulin, Vladislav Kurenkov, Ilya Zisman, Sergey Kolesnikov
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments with Bernoulli bandits, contextual bandits, and a Darkroom environment with changing action spaces, we demonstrate that Headless-AD matches the performance of the original data-generation algorithm and scales to action spaces up to 5x larger than those seen during training. We also observed that Headless-AD can even outperform AD when both are trained on the same action space, especially when evaluated on larger action sets. |
| Researcher Affiliation | Collaboration | Tinkoff, Moscow, Russia; Innopolis University; AIRI, Moscow, Russia; MIPT. |
| Pseudocode | Yes | Listing 1: Code that demonstrates the Headless-AD training procedure. Note that this snippet is intended for illustration purposes only; the complete code can be found in Headless-AD's repository. (A hedged sketch of the training step appears below the table.) |
| Open Source Code | Yes | Implementation is available at: https://github.com/corl-team/headless-ad. |
| Open Datasets | No | The paper uses environments such as Bernoulli Bandit, Contextual Bandit, and Darkroom, and describes how the training data was generated (e.g., with Thompson Sampling or LinUCB). However, it does not provide concrete access information (a URL or formal citation) for any pre-existing public dataset; the data is generated for the experiments rather than sourced externally. (A sketch of Thompson Sampling data generation appears below the table.) |
| Dataset Splits | No | The paper mentions 'The training dataset consisted of bandits with 4–20 arms', and discusses training and test distributions, but does not explicitly provide percentages or counts for training, validation, and test splits for reproducibility. |
| Hardware Specification | Yes | All experiments were performed on A100 GPUs. |
| Software Dependencies | Yes | To sample the orthonormal vectors used as action embeddings, we use the torch.nn.init.orthogonal_ function from PyTorch (Paszke et al., 2019). (See the embedding sketch below the table.) |
| Experiment Setup | Yes | In our experiments, we used the Tiny LLaMA (Zhang et al., 2024) implementation of the transformer model and the AdamW optimizer (Loshchilov & Hutter, 2017). All environment-specific hyperparameters are listed in Appendix J, Table 3 (Headless-AD's Environment-Specific Hyperparameters). For certain instances, hyperparameters were optimized within the ranges specified in the Sweep Values column, using the Bayesian search method of the wandb sweep tool (Biewald, 2020). (A sweep sketch appears below the table.) |
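
For orientation, here is a minimal sketch of the training step that Listing 1 in the paper illustrates, written from the paper's high-level description rather than copied from the repository. It assumes the core Headless-AD idea: the transformer has no discrete action head; it predicts an action embedding, which is scored against randomly sampled orthonormal per-action embeddings, with a contrastive cross-entropy loss over the similarities. All names (`model`, `embed_dim`, the `model(...)` signature) are illustrative, not the repository's API.

```python
import torch
import torch.nn.functional as F

def headless_ad_step(model, states, actions, rewards, num_actions, embed_dim):
    """One hedged training step: no action head; score predicted embeddings
    against freshly sampled orthonormal per-action embeddings."""
    # Resample orthonormal action embeddings each step so no fixed action
    # identity is memorized (orthonormal rows require num_actions <= embed_dim).
    action_emb = torch.empty(num_actions, embed_dim, device=states.device)
    torch.nn.init.orthogonal_(action_emb)

    # Encode the in-context history; prior actions enter as their embeddings.
    context = model(states, action_emb[actions], rewards)  # (B, T, embed_dim)

    # Similarity of each predicted embedding to every action's embedding
    # serves as the logits for a contrastive cross-entropy loss.
    logits = context @ action_emb.T                        # (B, T, num_actions)
    loss = F.cross_entropy(logits.flatten(0, 1), actions.flatten())
    return loss
```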
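
The "Open Datasets" row notes that data is generated rather than downloaded. As a hedged illustration of one such generator, here is a minimal Thompson Sampling loop for a Bernoulli bandit with Beta(1, 1) priors; the arm count, priors, and trajectory format are assumptions, not the paper's exact pipeline.

```python
import numpy as np

def thompson_sampling_trajectory(arm_probs, num_steps, seed=0):
    """Generate (action, reward) pairs from Thompson Sampling on a
    Bernoulli bandit with a Beta(1, 1) prior on every arm."""
    rng = np.random.default_rng(seed)
    k = len(arm_probs)
    alpha, beta = np.ones(k), np.ones(k)  # Beta posterior parameters
    actions, rewards = [], []
    for _ in range(num_steps):
        theta = rng.beta(alpha, beta)     # sample a plausible mean per arm
        a = int(np.argmax(theta))         # act greedily w.r.t. the sample
        r = float(rng.random() < arm_probs[a])
        alpha[a] += r                     # posterior update on the pulled arm
        beta[a] += 1.0 - r
        actions.append(a)
        rewards.append(r)
    return np.array(actions), np.array(rewards)

# Example: a 10-armed bandit, within the paper's 4-20 arm training range.
actions, rewards = thompson_sampling_trajectory(np.linspace(0.1, 0.9, 10), 300)
```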
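
The "Software Dependencies" row refers to the in-place PyTorch initializer `torch.nn.init.orthogonal_` (the underscore-free alias was deprecated). A self-contained sketch of how orthonormal action embeddings could be sampled and sanity-checked:

```python
import torch

def sample_action_embeddings(num_actions: int, embed_dim: int) -> torch.Tensor:
    # orthogonal_ fills the tensor with a (semi-)orthogonal matrix; the rows
    # are mutually orthonormal as long as num_actions <= embed_dim.
    emb = torch.empty(num_actions, embed_dim)
    torch.nn.init.orthogonal_(emb)
    return emb

emb = sample_action_embeddings(num_actions=10, embed_dim=64)
# The Gram matrix of orthonormal rows is the identity (up to float error).
assert torch.allclose(emb @ emb.T, torch.eye(10), atol=1e-5)
```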
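
The "Experiment Setup" row mentions Bayesian hyperparameter search via wandb sweeps. Below is a minimal sketch of how such a sweep can be declared with the public wandb API; the metric name, parameter names, ranges, and project name are placeholders, not the paper's actual sweep configuration.

```python
import wandb

# Hypothetical sweep: Bayesian search over learning rate and dropout.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "eval/return", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-3},
        "dropout": {"values": [0.0, 0.1, 0.2]},
    },
}

def train():
    run = wandb.init()
    # ... build the model from run.config.learning_rate / run.config.dropout,
    # train it, and log the metric the sweep optimizes:
    wandb.log({"eval/return": 0.0})  # placeholder value

sweep_id = wandb.sweep(sweep=sweep_config, project="headless-ad-repro")
wandb.agent(sweep_id, function=train, count=20)
```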