Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Multi-objective Deep Data Generation with Correlated Property Control

Authors: Shiyu Wang, Xiaojie Guo, Xuanyang Lin, Bo Pan, Yuanqi Du, Yinkai Wang, Yanfang Ye, Ashley Petersen, Austin Leitgeb, Saleh Alkhalifa, Kevin Minbiole, William M. Wuest, Amarda Shehu, Liang Zhao

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The experiments demonstrate our model's superior performance in generating data with desired properties. The code of CorrVAE is available at https://github.com/shi-yu-wang/CorrVAE. 5 Experiments
Researcher Affiliation	Collaboration	1Emory University, EMAIL 2IBM Thomas J. Watson Research Center, EMAIL 3Cornell University, EMAIL 4Tufts University, EMAIL 5University of Notre Dame, EMAIL 6Villanova University, EMAIL 7Recursiv LLC, EMAIL 8George Mason University, EMAIL
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	The code of CorrVAE is available at https://github.com/shi-yu-wang/CorrVAE.
Open Datasets	Yes	1) The Quaternary Ammonium Compound (QAC) dataset is a real dataset that contains 462 quaternary ammonium compounds processed by the Minbiole Research Lab 1. An open-source cheminformatics and machine learning library were used to generate a number of properties or features for each of the compounds, in which molecular weight and the log P value were used as data properties in our experiments; 2) QM9 dataset is an enumeration of 134,000 stable organic molecules with up to 9 heavy atoms [39]; 3) dSprites contains 737,280 total images regarding 2D shapes procedurally generated from 6 ground truth independent latent factors [33], in which shape, scale, x position and y position were employed in our experiments. To construct correlated properties, we additionally formed and tested a new property, x+y positions by summing up x position with y position; and 4) Pendulum dataset was originally synthesized to explore causality of the model [46].
Dataset Splits	No	The paper mentions 'training set' and 'test set' for various datasets but does not explicitly provide the specific percentages or sample counts for training, validation, and test splits needed for reproduction. It does not state explicit split ratios like '80/10/10'.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies	No	The paper mentions 'An open-source cheminformatics and machine learning library' but does not provide specific software names with version numbers.
Experiment Setup	Yes	In Section 4.1.1, the paper mentions 'ρ1 and ρ2 are co-efficient hyper-parameters to penalize the two terms'. In Section 5.4.1, it states 'we train CorrVAE using shape, scale and three correlated properties x position, y position and x+y position while setting the dimension of w as 8'.
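
The Open Datasets entry above quotes the paper as forming a correlated property, x+y position, by summing x position with y position. A minimal sketch of why that sum is correlated with its parts follows. It is not the authors' code: the uniform samples are hypothetical stand-ins for the dSprites position factors, and the names `pearson`, `x_pos`, `y_pos` and `xy_sum` are assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Assumption: independent uniform draws stand in for the x and y
# position factors; the real dataset uses a fixed grid of positions.
rng = random.Random(0)
x_pos = [rng.random() for _ in range(10_000)]
y_pos = [rng.random() for _ in range(10_000)]

# The derived property described in the table: x position plus y position.
xy_sum = [x + y for x, y in zip(x_pos, y_pos)]

corr_x_y = pearson(x_pos, y_pos)     # close to 0: the inputs are independent
corr_x_sum = pearson(x_pos, xy_sum)  # clearly positive: the sum contains x
```

For independent inputs with equal variance, the correlation between x and x+y is 1/sqrt(2) in theory, so the derived property is strongly but not perfectly correlated with each of its components.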