Emergence of In-Context Reinforcement Learning from Noise Distillation

Authors: Ilya Zisman, Vladislav Kurenkov, Alexander Nikulin, Viacheslav Sinii, Sergey Kolesnikov

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that it is viable to construct a synthetic noise injection curriculum which helps to obtain learning histories. Moreover, we experimentally demonstrate that it is possible to alleviate the need for generation using optimal policies, with in-context RL still able to outperform the best suboptimal policy in a learning dataset by a 2x margin.
Researcher Affiliation | Collaboration | AIRI, Moscow, Russia; Skoltech, Moscow, Russia; Innopolis University, Kazan, Russia; MIPT, Moscow, Russia; Tinkoff, Moscow, Russia. *Work done while at Tinkoff.
Pseudocode | Yes | Algorithm 1: Data Generation (a hedged sketch of this procedure follows the table).
Open Source Code | Yes | Our implementation is available at https://github.com/corl-team/ad-eps
Open Datasets | No | The paper describes generating data within custom environments (Dark Room, Key-to-Door, Watermaze) and does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset used for training.
Dataset Splits | Yes | In total, there are 81 goals, of which we use 65 for training and 16 for evaluation. We employ a Gtrain, Geval, Gtest task split: Gtrain and Geval are used during the pre-training phase to select the best model, and Gtest during evaluation. (A split helper is sketched after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions software packages such as the CORL package, stable-baselines3, DMLab, and Shimmy, but does not provide specific version numbers for the dependencies required for replication.
Experiment Setup | Yes | The exact hyperparameters of the model can be found in Appendix F. We compute the decay rate by regulating how many histories (full trajectories until termination) are generated for a single goal. The exact number of histories is reported in Appendix G.
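The noise-injection curriculum referenced in the Research Type, Pseudocode, and Experiment Setup rows can be summarized as follows. This is a minimal sketch, not the authors' implementation: it assumes a Gymnasium-style environment, a `demonstrator` callable mapping observations to actions, and a linear epsilon decay whose rate is set by the number of histories generated per goal; the function and variable names are illustrative.

```python
import numpy as np


def generate_noisy_histories(env, demonstrator, num_histories, seed=0):
    """Sketch of a decaying noise-injection curriculum for one goal.

    Epsilon falls linearly from 1 to 0 across histories, so the number of
    histories per goal implicitly sets the decay rate (the quantity the
    Experiment Setup row refers to). With probability epsilon the
    demonstrator's action is replaced by a random one, making the
    concatenated trajectories resemble a policy improving over time.
    """
    rng = np.random.default_rng(seed)
    histories = []
    for k in range(num_histories):
        eps = 1.0 - k / max(num_histories - 1, 1)  # decay set by num_histories
        obs, _ = env.reset()
        trajectory, done = [], False
        while not done:
            if rng.random() < eps:
                action = env.action_space.sample()  # injected noise
            else:
                action = demonstrator(obs)          # demonstrator's action
            next_obs, reward, terminated, truncated, _ = env.step(action)
            trajectory.append((obs, action, reward))
            obs = next_obs
            done = terminated or truncated
        histories.append(trajectory)
    return histories
```

With many histories per goal, the earliest trajectories are close to uniformly random and the latest are close to the demonstrator, which is the sense in which regulating the number of histories controls the decay rate.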
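The Gtrain/Geval/Gtest split quoted in the Dataset Splits row could be produced with a simple seeded partition of the goal set. The helper below is an assumption for illustration only: the 81 and 65/16 counts come from the quote, while the seeded shuffle, the function name, and the omission of Gtest handling (whose relation to the 65/16 partition is not specified in the quote) are not taken from the paper.

```python
import numpy as np


def split_goals(goals, n_train=65, n_eval=16, seed=0):
    """Partition a list of goals into Gtrain and Geval subsets.

    Assumed procedure: a seeded shuffle followed by slicing; the counts
    mirror the quoted 65/16 split of 81 goals.
    """
    perm = np.random.default_rng(seed).permutation(len(goals))
    g_train = [goals[i] for i in perm[:n_train]]
    g_eval = [goals[i] for i in perm[n_train:n_train + n_eval]]
    return g_train, g_eval


# Example usage, assuming the 81 goals form a 9x9 grid of cells:
g_train, g_eval = split_goals([(x, y) for x in range(9) for y in range(9)])
```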