Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
Authors: Yanggan Gu, Yuanyi Wang, Zhaoyi Yan, Yiming Zhang, Qi Zhou, Fei Wu, Hongxia Yang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on 11 widely-used benchmarks demonstrate that InfiFPO consistently outperforms existing model fusion and preference optimization methods. When using Phi-4 as the pivot model, InfiFPO improve its average performance from 79.95 to 83.33 on 11 benchmarks, significantly improving its capabilities in mathematics, coding, and reasoning tasks. |
| Researcher Affiliation | Collaboration | ¹The Hong Kong Polytechnic University, ²InfiX.ai, ³Zhejiang University |
| Pseudocode | No | The paper describes its methodology through mathematical derivations and textual explanations of its components and strategies (FuseRLHF, FPO objective, Length Normalization, Probability Clipping, Max-margin Fusion), but it does not present a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Project Page: https://github.com/InfiXAI/InfiFPO |
| Open Datasets | Yes | We constructed a new training dataset comprising 150k examples across mathematics, coding, and general tasks. Data sources include Infinity-Instruct [14], NuminaMath-1.5 [15], and KodCode-V1-SFT [16], with detailed statistics provided in Table 1. |
| Dataset Splits | Yes | In the first stage, we performed SFT on half of our dataset with y_w for 3 epochs, using a learning rate of 1e-6 to build the SFT model. This model then served as the foundation for the second stage, where we conducted Preference Optimization on the remaining half of the data for a single epoch, with a learning rate of 1e-7 and β = 2.5. |
| Hardware Specification | Yes | Our training process involved two stages with a batch size of 128 and a maximum sequence length of 4,096 tokens, using 16 NVIDIA A800-80GB GPUs. |
| Software Dependencies | No | The paper mentions "vLLM for acceleration" in Appendix E.2.1 and implicitly uses frameworks compatible with "NVIDIA A800-80GB GPUs" (e.g., CUDA, PyTorch), but it does not provide specific version numbers for any software components. |
| Experiment Setup | Yes | Our training process involved two stages with a batch size of 128 and a maximum sequence length of 4,096 tokens, using 16 NVIDIA A800-80GB GPUs. We implemented a cosine learning rate schedule with a 10% warmup ratio. In the first stage, we performed SFT on half of our dataset with y_w for 3 epochs, using a learning rate of 1e-6 to build the SFT model. This model then served as the foundation for the second stage, where we conducted Preference Optimization on the remaining half of the data for a single epoch, with a learning rate of 1e-7 and β = 2.5. |
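
The Experiment Setup row can be read as a small configuration. The sketch below is illustrative only and is not taken from the authors' code: it records the reported hyperparameters and computes a cosine learning rate schedule with a 10% warmup. The names `StageConfig`, `cosine_lr`, and `steps_per_epoch` are hypothetical, and several details are assumptions the table does not state: a linear warmup, decay to zero, exactly 150,000 examples split evenly between the stages, and one optimizer step per batch of 128.

```python
import math
from dataclasses import dataclass
from typing import Optional

BATCH_SIZE = 128          # reported
MAX_SEQ_LEN = 4096        # reported
WARMUP_RATIO = 0.10       # reported
DATASET_SIZE = 150_000    # "150k" in the table; the exact count is an assumption


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters reported for one training stage (hypothetical container)."""
    name: str
    peak_lr: float
    epochs: int
    beta: Optional[float] = None  # only the preference-optimization stage uses beta


SFT = StageConfig(name="sft", peak_lr=1e-6, epochs=3)
PREF_OPT = StageConfig(name="preference_optimization", peak_lr=1e-7, epochs=1, beta=2.5)


def steps_per_epoch(num_examples: int, batch_size: int = BATCH_SIZE) -> int:
    """Optimizer steps per epoch, assuming one step per batch and a partial last batch."""
    return math.ceil(num_examples / batch_size)


def cosine_lr(step: int, total_steps: int, peak_lr: float,
              warmup_ratio: float = WARMUP_RATIO, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr (both assumed shapes)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    half = DATASET_SIZE // 2  # each stage trains on half of the data
    for stage in (SFT, PREF_OPT):
        total = steps_per_epoch(half) * stage.epochs
        mid = cosine_lr(total // 2, total, stage.peak_lr)
        print(f"{stage.name}: {total} steps, lr at midpoint = {mid:.3e}")
```

Under these assumptions each stage sees 75,000 examples, so the step counts follow from the batch size alone. A real run would normally take the schedule from the training framework rather than reimplement it.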