Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Learning to Modulate pre-trained Models in RL
Authors: Thomas Schmied, Markus Hofmarcher, Fabian Paischer, Razvan Pascanu, Sepp Hochreiter
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an extensive evaluation of fine-tuning, parameter-efficient fine-tuning, and prompting methods for Transformers in RL. Then, we evaluate and compare a variety of fine-tuning methods prevalent in natural language processing, both in terms of performance on new tasks, and how well performance on pre-training tasks is retained. |
| Researcher Affiliation | Collaboration | 1 ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, 2 JKU LIT SAL eSPML Lab, Institute for Machine Learning, Johannes Kepler University, Linz, Austria 3 Google DeepMind, 4 UCL |
| Pseudocode | No | The paper describes methods using text and mathematical formulas (e.g., Equation 1, 2, 3, 4) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code and datasets are available at: https://github.com/ml-jku/L2M |
| Open Datasets | Yes | Finally, to aid future research in this area, we release a dataset encompassing 50 Meta-World and 16 DMControl tasks. Source code and datasets are available at: https://github.com/ml-jku/L2M |
| Dataset Splits | No | The paper splits tasks into pre-training (MT40, DMC10) and fine-tuning (CW10, DMC6) sets, and describes data collection per task (e.g., '10K trajectories of length 200'), but does not specify train/validation/test dataset splits with percentages or sample counts for the collected data. |
| Hardware Specification | Yes | We run all our pre-training experiments on 4 NVIDIA A100 GPUs. For all our fine-tuning experiments, we use single GPU training on NVIDIA A100 or NVIDIA Titan V GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch, stable-baselines3, and the transformers library, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We train our MDDT for a total of 1M update steps, with context length of 5 transitions (45 tokens). We use a learning rate of 1e-4 and 4000 linear warm-up steps, followed by a cosine decay to 1e-6. Furthermore, we use gradient clip of 0.25, weight decay of 0.01, dropout of 0.2, a batch size of 1024 sequences and train using the AdamW optimizer (Loshchilov and Hutter, 2018). |
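
The learning-rate schedule quoted in the Experiment Setup row (4000 linear warm-up steps to a peak of 1e-4, then cosine decay to 1e-6 over 1M update steps) can be sketched as below. This is only an illustration, not the authors' implementation: the function name `learning_rate` and the constant names are hypothetical, and the sketch assumes that warm-up starts from zero and that the cosine decay spans all steps remaining after warm-up, neither of which is stated in the quoted text.

```python
import math

# Values quoted in the Experiment Setup row.
PEAK_LR = 1e-4
FINAL_LR = 1e-6
WARMUP_STEPS = 4_000
TOTAL_STEPS = 1_000_000


def learning_rate(step: int) -> float:
    """Return the learning rate at a given update step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Assumption: warm-up rises linearly from 0 to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Assumption: cosine decay covers every step remaining after warm-up.
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine
```

Under these assumptions the rate peaks at step 4000 and reaches the floor of 1e-6 at step 1M; a different warm-up start value or decay horizon would shift the curve but not its overall shape.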