Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Iterative Foundation Model Fine-Tuning on Multiple Rewards

Authors: Pouya M. Ghari, Simone Sciabola, Ye Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results across diverse domains including text, biological sequence, and small molecule generation, demonstrate the effectiveness of the proposed algorithm compared to state-of-the-art baselines.
Researcher Affiliation	Industry	Pouya M. Ghari Biogen Simone Sciabola Biogen Ye Wang Biogen Corresponding Author: EMAIL
Pseudocode	Yes	Algorithm 1 Iterative RS: Iterative Multi-Objective Model Fine-Tuning 1: Input: Reference policy π_ref, learning rate η, merge frequency m. 2: Initialize π_{θ_{i,1}}, i ∈ {1, ..., N} as π_ref; S_0 by sampling M objectives uniformly. 3: for t = 1, ..., T do 4: Set S_t = S_{t-1} 5: if t mod m = 0 then 6: Merge policy weights {θ_{i,t}}_{i=1}^N to obtain the shared parameter ρ_t as in equation 9. 7: Sample uniformly at random M objectives to update S_t. 8: end if 9: For any objective i ∈ S_t, update the policy parameter θ_{i,t} as in equation 8. 10: end for 11: Merge all policy weights {θ_{i,T}}_{i=1}^N to obtain the shared parameter ρ_T. 12: Output: Policy π_{ρ_T}.
Open Source Code	Yes	Codes are available at https://github.com/pouyamghari/Iterative RS.
Open Datasets	Yes	The goal of this task is to generate small molecules that exhibit specific desirable energy properties...A GPT-2 model is pre-trained on SMILES representations of 2 million molecules from the MOSES dataset [40], resulting in a model referred to as Mol GPT-2. This pre-trained model is then fine-tuned on the QM9 dataset [6, 45] to optimize for multiple objectives. ...a GPT-2 model referred to as DNAGPT-2 is pre-trained on approximately 700,000 unlabeled DNA sequences, each 200 base pairs long, from the MPRA dataset [18]... ...we use Llama-3.2-3B-Instruct as the base model. This foundation model is fine-tuned on the Reddit Summary dataset [49] for the post summarization task.
Dataset Splits	Yes	The dataset was split into 80% training, 10% validation, and 10% test sets.
Hardware Specification	Yes	Model training was conducted using four V100 GPUs.
Software Dependencies	No	To fine-tune the Mol GPT-2 model using MORLHF, RS, and Iterative RS, we employed PPO from the TRL library.
Experiment Setup	Yes	All models were fine-tuned with a learning rate of 1.41 × 10^-5 using the Adam optimizer and a batch size of 128. ...the number of optimization steps is set to T = 100. ...For Iterative RS, the merging frequency is set to m = 4.
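
The control flow of Algorithm 1 quoted in the Pseudocode row can be illustrated with a small toy sketch. This is not the authors' implementation: equations 8 and 9 are not reproduced on this page, so the per-objective update is stood in for by a gradient-ascent step on a hypothetical quadratic reward, and the merge by a uniform average of parameters. Restarting every policy from the merged parameters after a merge is also an assumption, as are all function and variable names. Only T = 100 and m = 4 are taken from the Experiment Setup row; the toy learning rate is arbitrary.

```python
import random


def iterative_rs(theta_ref, update_fn, merge_fn, n_objectives, n_active,
                 steps, merge_every, seed=0):
    """Toy sketch of the Iterative RS loop (names are hypothetical)."""
    rng = random.Random(seed)
    # Every per-objective policy starts from the reference parameters.
    thetas = [list(theta_ref) for _ in range(n_objectives)]
    # S_0: sample M of the N objectives uniformly without replacement.
    active = rng.sample(range(n_objectives), n_active)
    for t in range(1, steps + 1):
        if t % merge_every == 0:
            merged = merge_fn(thetas)
            # ASSUMPTION: all policies continue from the shared parameters.
            thetas = [list(merged) for _ in range(n_objectives)]
            # Resample the active objective set S_t.
            active = rng.sample(range(n_objectives), n_active)
        for i in active:
            thetas[i] = update_fn(i, thetas[i])
    # Final merge gives the output parameters.
    return merge_fn(thetas)


def uniform_merge(thetas):
    """ASSUMPTION: stand-in for equation 9, a plain parameter average."""
    n = len(thetas)
    return [sum(column) / n for column in zip(*thetas)]


# Hypothetical toy objectives: reward_i(theta) = -||theta - target_i||^2.
TARGETS = [[1.0, 0.0], [0.0, 1.0]]
TOY_LR = 0.1  # arbitrary toy step size, not the value used in the paper


def toy_update(i, theta):
    """ASSUMPTION: stand-in for equation 8, one gradient-ascent step."""
    return [x + TOY_LR * 2.0 * (c - x) for x, c in zip(theta, TARGETS[i])]


rho = iterative_rs(
    theta_ref=[0.0, 0.0],
    update_fn=toy_update,
    merge_fn=uniform_merge,
    n_objectives=2,
    n_active=2,
    steps=100,      # T = 100, as in the Experiment Setup row
    merge_every=4,  # m = 4, as in the Experiment Setup row
)
print(rho)
```

With both toy objectives active at every step, the merged parameters settle near the midpoint of the two targets, which is the compromise a uniform average would be expected to produce in this symmetric setting. In the paper's setting the parameters would be model weights and the update a PPO step, so this sketch only mirrors the loop structure.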