Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Reconsidering Generative Objectives For Counterfactual Reasoning
Authors: Danni Lu, Chenyang Tao, Junya Chen, Fan Li, Feng Guo, Lawrence Carin
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider a wide range of semi-synthetic and real-world tasks to validate our models experimentally. Details of the experimental setup are described in the SM, and our code is available from https://github.com/DannieLu/BV-NICE. Importantly, we want to experimentally unveil aspects that are important for the design of generative causal models. More analyses can be found in the SM. |
| Researcher Affiliation | Academia | Danni Lu (1), Chenyang Tao (2), Junya Chen (2, 3), Fan Li (4), Feng Guo (1, 5), Lawrence Carin (2). (1) Department of Statistics, Virginia Tech, Blacksburg, VA, USA; (2) Electrical & Computer Engineering, Duke University, Durham, NC, USA; (3) School of Mathematical Sciences, Fudan University, Shanghai, China; (4) Department of Statistical Science, Duke University, Durham, NC, USA; (5) Virginia Tech Transportation Institute, Blacksburg, VA, USA |
| Pseudocode | Yes | Algorithm 1 BV-NICE |
| Open Source Code | Yes | Our code is available from https://github.com/DannieLu/BV-NICE. |
| Open Datasets | Yes | Datasets: To extensively validate the proposed procedure in a realistic setup, we consider the following four datasets: (i) IHDP1000 [31]: a semi-synthetic dataset with 1,000 simulations of different treatment and outcome mechanisms. (ii) ACIC2016 [20]: a benchmark dataset released by the Atlantic Causal Inference Competition, comprising 77 semi-synthetic datasets with 100 replications each. (iii) JOBS [44]: a real-world dataset with binary outcomes, a small portion of which comes from randomized trials. (iv) SHRP2 [27]: a 3-year case-cohort study of driver behavior and environmental factors at the onset of crashes and under normal driving conditions, derived from over 1 million hours of continuous video recordings. Detailed descriptions of these datasets can be found in the SM. |
| Dataset Splits | Yes | For practical cross-validation, we use a 7/3 split for training and validation, respectively, and rely on the validation outcome RMSE to select the best configuration. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | Model architecture, hyper-parameter tuning and data pre-processing: For all instantiations, we use fully-connected multi-layer perceptrons (MLPs) as our flexible learner. We randomly sample model architectures (number of layers, hidden units) and other hyper-parameters (learning rate, batch size, regularization strength, etc.). For practical cross-validation, we use a 7/3 split for training and validation, respectively, and rely on the validation outcome RMSE to select the best configuration. |
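The tuning procedure quoted in the Dataset Splits and Experiment Setup rows combines random sampling of configurations, a single 7/3 train/validation split, and selection by validation outcome RMSE. The Python below is a minimal, generic sketch of that loop under stated assumptions. It is not the BV-NICE code. The search-space ranges in `sample_config` are hypothetical (the paper defers its actual ranges to the SM). `toy_fit_predict` is a hypothetical stand-in for training an MLP. Names such as `random_search` and `train_val_split` are illustrative only. RMSE is computed on observed validation outcomes, since counterfactual outcomes are not available for selection.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Config = Dict[str, float]
# Hypothetical trainer interface: fit_predict(config, X_train, y_train, X_val) -> predictions for X_val
FitPredict = Callable[[Config, List[Sequence[float]], List[float], List[Sequence[float]]], List[float]]


def train_val_split(n: int, train_frac: float = 0.7, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them into train/validation parts (7/3 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error between observed and predicted outcomes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def sample_config(rng: random.Random) -> Config:
    """Draw one configuration; these ranges are hypothetical, not the paper's."""
    return {
        "n_layers": rng.choice([1, 2, 3]),
        "hidden_units": rng.choice([32, 64, 128, 256]),
        "learning_rate": 10 ** rng.uniform(-4, -2),  # log-uniform sample
        "batch_size": rng.choice([64, 128, 256]),
        "reg_strength": 10 ** rng.uniform(-5, -1),   # log-uniform sample
    }


def random_search(X, y, fit_predict: FitPredict, n_trials: int = 20,
                  seed: int = 0) -> Tuple[Optional[Config], float]:
    """Return the sampled config with the lowest validation outcome RMSE."""
    rng = random.Random(seed)
    tr, va = train_val_split(len(y), 0.7, seed)
    X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
    X_va, y_va = [X[i] for i in va], [y[i] for i in va]
    best_cfg, best_score = None, math.inf
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = rmse(y_va, fit_predict(cfg, X_tr, y_tr, X_va))
        if score < best_score:  # keep the best-so-far configuration
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_fit_predict(cfg: Config, X_tr, y_tr, X_va) -> List[float]:
    """Hypothetical stand-in for MLP training: predicts a shrunken training mean."""
    mean = sum(y_tr) / len(y_tr)
    return [mean / (1.0 + cfg["reg_strength"])] * len(X_va)


if __name__ == "__main__":
    X = [[float(i)] for i in range(100)]
    y = [2.0 + 0.01 * i for i in range(100)]
    cfg, score = random_search(X, y, toy_fit_predict, n_trials=20)
    print(cfg, round(score, 4))
```

The toy trainer only reacts to `reg_strength`, so the architecture keys go unused in this demo. A real setup would pass a function that builds and trains an MLP from `n_layers`, `hidden_units`, and the other sampled values. With a fixed seed, a longer search revisits the same initial draws, so its best RMSE can only match or improve on a shorter one.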