Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data

Authors: Lingkai Kong, Haichuan Wang, Tonghan Wang, Guojun Xiong, Milind Tambe

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, COMPFLOW consistently outperforms strong baselines across a range of RL benchmarks with shifted-dynamics data. In this section, we first evaluate our approach across a range of environments in Gym-MuJoCo that exhibit different types of dynamics shifts. We then conduct ablation and hyperparameter studies to better understand the design choices and behavior of COMPFLOW. Finally, we assess the effectiveness of our method in a real-world inspired wildlife conservation task.
Researcher Affiliation | Academia | Lingkai Kong, Haichuan Wang, Tonghan Wang, Guojun Xiong, Milind Tambe; School of Engineering and Applied Sciences, Harvard University
Pseudocode | Yes | The pseudocode of COMPFLOW, instantiated with Soft Actor-Critic (SAC) [19], is presented in Appendix D. Algorithm 4: COMPFLOW built on Soft Actor-Critic (SAC)
Open Source Code | Yes | Our code is available at https://github.com/Haichuan23/CompositeFlow
Open Datasets | Yes | We use the MuJoCo datasets from D4RL [16] as our offline data. We use the green security simulator in [67].
Dataset Splits | Yes | We compute the mean squared error (MSE) on a held-out 10% validation set and report the average MSE across different epochs during RL training.
Hardware Specification | Yes | The entire filtering process takes just 0.03 seconds on an A100 GPU.
Software Dependencies | No | The paper mentions 'Optimizer Adam [25]' and 'Python Optimal Transport (POT) library [15]'. While it mentions these tools and cites them, it does not provide specific version numbers for these or any other software components.
Experiment Setup | Yes | Additional hyperparameters are provided in Table 3 and Table 4. Table 3 lists hyperparameters for RL training, including actor network architecture (256, 256), batch size 128, learning rate 3e-4, discount factor (γ) 0.99, etc. Table 4 lists the hyperparameter setup for the offline and online flows, including number of hidden layers 6, hidden dimension 256, batch size 1024, ODE solver method Euler, etc.
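
For readers who want to carry the reported settings into their own runs, the values quoted in the Experiment Setup and Dataset Splits rows can be collected roughly as follows. This is a minimal sketch and not the authors' code: the class names, field names, helper functions, and the shuffle seed are all hypothetical, and only the numeric values come from the rows above. Settings the page elides with "etc." are deliberately left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RLTrainingConfig:
    """Values quoted from Table 3 as summarized above; field names are hypothetical."""
    actor_hidden_sizes: tuple = (256, 256)
    batch_size: int = 128
    learning_rate: float = 3e-4
    discount: float = 0.99  # gamma


@dataclass(frozen=True)
class FlowConfig:
    """Values quoted from Table 4 as summarized above; field names are hypothetical."""
    num_hidden_layers: int = 6
    hidden_dim: int = 256
    batch_size: int = 1024
    ode_solver: str = "euler"


def split_train_validation(items, validation_fraction=0.1, seed=0):
    """Shuffle a copy of the items and hold out a fraction for validation.

    The 10% default mirrors the held-out fraction reported above; the seed
    is an assumption added here for repeatability.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def mean_squared_error(predictions, targets):
    """Average of squared differences between paired scalar values."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```

A typical use would be to call `split_train_validation` once on the transition data, fit on the first returned list, and track `mean_squared_error` on the second across epochs.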