Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Feature Selection in the Contrastive Analysis Setting
Authors: Ethan Weinberger, Ian Covert, Su-In Lee
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We motivate our approach with a novel information-theoretic analysis of representation learning in the CA setting, and we empirically validate CFS on a semi-synthetic dataset and four real-world biomedical datasets. |
| Researcher Affiliation | Academia | Ethan Weinberger, Paul G. Allen School of Computer Science, University of Washington, Seattle, WA 98195; Ian C. Covert, Department of Computer Science, Stanford University, Stanford, CA 94305; Su-In Lee, Paul G. Allen School of Computer Science, University of Washington, Seattle, WA 98195 |
| Pseudocode | No | The paper describes the proposed method (CFS) using text descriptions and mathematical equations, accompanied by a diagram (Figure 2), but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | An open-source implementation of our method is available at https://github.com/suinleelab/CFS. |
| Open Datasets | Yes | We validate our approach empirically through extensive experiments on a semi-synthetic dataset introduced in prior work as well as four real-world biomedical datasets... Raw data was downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/00342/. |
| Dataset Splits | Yes | For all experiments we divided our data using an 80-20 train-test split, and we report the mean ± standard error over five random seeds for each method. |
| Hardware Specification | Yes | All experiments were performed on a system running CentOS 7.9.2009 equipped with an NVIDIA RTX 2080 Ti GPU with CUDA 11.7. |
| Software Dependencies | Yes | CFS models were implemented using PyTorch [50] (version 1.13) with the PyTorch Lightning API ... equipped with an NVIDIA RTX 2080 Ti GPU with CUDA 11.7. |
| Experiment Setup | Yes | For all CFS variants we let our reconstruction function f be a multilayer perceptron with two hidden layers of size 512 with ReLU activation functions ... All CFS models were trained using the PyTorch implementation of the Adam [51] optimizer with default hyperparameters. Batch sizes of 128 were used for all experiments. |
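
The evaluation protocol quoted under Dataset Splits (an 80-20 train-test split, with results reported as mean and standard error over five random seeds) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the sample count of 1000, and the `evaluate` placeholder are all hypothetical, and the assumption that each seed also reshuffles the split is ours, since the quoted text does not say what the seeds control.

```python
import math
import random
import statistics


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices with a seeded RNG and return (train, test) index lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def mean_and_standard_error(values):
    """Return the sample mean and the standard error of the mean."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def evaluate(train_idx, test_idx, seed):
    """Hypothetical placeholder standing in for training and scoring a model."""
    rng = random.Random(seed)
    return 0.8 + 0.05 * rng.random()


if __name__ == "__main__":
    scores = []
    for seed in range(5):  # five random seeds, as in the quoted protocol
        # Assumption: the seed controls the split as well as the model.
        train_idx, test_idx = train_test_split_indices(1000, 0.2, seed)
        scores.append(evaluate(train_idx, test_idx, seed))
    mean, sem = mean_and_standard_error(scores)
    print(f"{mean:.3f} +/- {sem:.3f}")
```

The standard error here is the sample standard deviation divided by the square root of the number of seeds, which is the usual convention; the report does not state which estimator the paper used.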