Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning Counterfactual Outcomes Under Rank Preservation

Authors: Peng Wu, Haoxuan Li, Chunyuan Zheng, Yan Zeng, Jiawei Chen, Yang Liu, Ruocheng Guo, Kun Zhang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive semi-synthetic and real-world experiments are conducted to demonstrate the effectiveness of the proposed method.
Researcher Affiliation	Collaboration	1 Beijing Technology and Business University; 2 Peking University; 3 Mohamed bin Zayed University of Artificial Intelligence; 4 Zhejiang University; 5 University of California, Santa Cruz; 6 Intuit AI Research; 7 Carnegie Mellon University
Pseudocode	No	The paper describes methods and proposes an estimator but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We share the data and code in the supplementary material.
Open Datasets	Yes	Following previous studies [47, 48, 53, 54], we conduct experiments on semi-synthetic dataset IHDP and real-world dataset JOBS. The IHDP dataset [55] is constructed from the Infant Health and Development Program (IHDP) with 747 individuals and 25 covariates. The JOBS dataset [56] is based on the National Supported Work program with 3,212 individuals and 17 covariates.
Dataset Splits	Yes	We generate 10,000 samples with 63/27/10 train/validation/test split and vary m ∈ {5, 10, 20, 40} in our synthetic experiment. ... We follow [47] to split the data into training/validation/testing set with ratios 63/27/10 and 56/24/20 with 100 and 10 repeated times on the IHDP and the JOBS datasets, respectively.
Hardware Specification	No	In addition, we run all experiments on the Google Colab platform.
Software Dependencies	No	For the representation model, we use the MLP for the base model and tune the layers in {1, 2, 3}. In addition, we adopt the logistic regression model as the propensity model. ... For the kernel choice, we select the kernel function between the Gaussian kernel function and the Epanechnikov kernel function... The paper mentions types of models and functions but does not specify software or library versions (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	For the representation model, we use the MLP for the base model and tune the layers in {1, 2, 3}. In addition, we adopt the logistic regression model as the propensity model. We tune the learning rate in {0.001, 0.005, 0.01, 0.05, 0.1}. For the kernel choice, we select the kernel function between the Gaussian kernel function and the Epanechnikov kernel function, and tune the bandwidth in {1, 3, 5, 7, 9}.
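
The Dataset Splits and Experiment Setup rows above can be made concrete with a minimal sketch. It is not the authors' code: the function names, the random shuffling, the rounding of split sizes, and the use of an exhaustive grid are all assumptions made for illustration. Only the split ratios, the two kernel choices, and the candidate values come from the quoted text.

```python
import itertools
import math
import random


def split_indices(n, ratios=(63, 27, 10), seed=0):
    """Shuffle range(n) and cut it into train/validation/test parts.

    The ratios are the ones quoted in the table; shuffling and the
    floor-based rounding are assumptions of this sketch.
    """
    total = sum(ratios)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    # Whatever remains after train and validation goes to the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(u):
    """Epanechnikov kernel, which is zero outside [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


# Candidate values copied from the Experiment Setup row. The dictionary
# keys are hypothetical names chosen for this sketch.
SEARCH_SPACE = {
    "layers": [1, 2, 3],
    "learning_rate": [0.001, 0.005, 0.01, 0.05, 0.1],
    "kernel": ["gaussian", "epanechnikov"],
    "bandwidth": [1, 3, 5, 7, 9],
}


def grid(space):
    """Yield every combination of values (exhaustive search is an assumption)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))
```

For the synthetic experiment, split_indices(10000) yields parts of 6,300, 2,700, and 1,000 indices, and passing ratios=(56, 24, 20) gives the JOBS-style split. The paper only states which values are tuned, so the search may well have been carried out differently from the full grid enumerated here.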