Principal Differences Analysis: Interpretable Characterization of Differences between Distributions

Authors: Jonas W. Mueller, Tommi Jaakkola

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 6 (Experiments): Figure 1a illustrates the cost function of PDA pertaining to two 3-dimensional distributions (see details in Supplementary Information S1). The synthetic MADELON dataset used in the NIPS 2003 feature selection challenge consists of points (n = m = 1000, d = 500)... Figure 1b demonstrates how well SPARDA (red), the top sparse principal component (black) [27], sparse LDA (green) [2], and the logistic lasso (blue) [12] are able to identify the 20 relevant features over different settings of their respective regularization parameters...
Researcher Affiliation | Academia | Jonas Mueller, CSAIL, MIT (jonasmueller@csail.mit.edu); Tommi Jaakkola, CSAIL, MIT (tommi@csail.mit.edu)
Pseudocode | Yes | RELAX Algorithm: Solves the dualized semidefinite relaxation of SPARDA (7). Returns the largest eigenvector of the solution to (6) as the desired projection direction for SPARDA. Also: Projection Algorithm: Projects a matrix onto the positive semidefinite cone of unit-trace matrices B_r (the feasible set in our relaxation).
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | The synthetic MADELON dataset used in the NIPS 2003 feature selection challenge consists of points (n = m = 1000, d = 500)... Also: We apply SPARDA to expression measurements of 10,305 genes profiled in 1,691 single cells from the somatosensory cortex and 1,314 hippocampus cells sampled from the brains of juvenile mice [29].
Dataset Splits | No | The paper mentions 'cross-validation' for parameter selection but does not provide specific details on train/validation/test splits (e.g., percentages, sample counts, or predefined split citations).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | No | The paper mentions regularization parameters but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or training configurations.
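The Research Type row refers to the cost function of PDA evaluated on two 3-dimensional distributions. The sketch below shows what evaluating such a cost for one projection direction could look like. It is a hedged illustration, not the authors' code. It assumes, which the table above does not state, that the difference between the projected samples is measured by an empirical 1-D Wasserstein distance and that both samples have the same size. The helper name `projected_wasserstein` and the synthetic data are hypothetical.

```python
import numpy as np


def projected_wasserstein(beta, X, Y, p=2):
    """Empirical p-Wasserstein distance between the 1-D projections beta^T X and beta^T Y.

    Hypothetical helper for illustration. Assumes X and Y have the same number
    of rows, so the optimal 1-D coupling simply pairs the sorted projected values.
    """
    beta = np.asarray(beta, dtype=float)
    beta = beta / np.linalg.norm(beta)  # restrict to unit-norm projection directions
    a = np.sort(X @ beta)
    b = np.sort(Y @ beta)
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b) ** p) ** (1.0 / p))


# Two synthetic 3-D samples that differ only along the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Y = rng.normal(size=(500, 3))
Y[:, 0] += 2.0

# Evaluate the (assumed) cost along each coordinate axis.
for name, direction in [("e1", [1, 0, 0]), ("e2", [0, 1, 0]), ("e3", [0, 0, 1])]:
    print(name, round(projected_wasserstein(direction, X, Y), 3))
```

Under these assumptions, a PDA-style method would search over unit vectors for the direction with the largest cost; here the first axis should score far higher than the other two, since the samples differ only there.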
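The Pseudocode row mentions projecting a matrix onto the set of positive semidefinite, unit-trace matrices and then taking the leading eigenvector of the relaxed solution. A standard way to compute the Frobenius-norm projection onto that set is to symmetrize the matrix, eigendecompose it, and project the eigenvalues onto the probability simplex. The sketch below illustrates that standard technique under the assumption that it matches the paper's Projection Algorithm; the function names are hypothetical and the code does not implement the RELAX solver itself.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                  # entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]  # number of entries that stay positive
    theta = (css[rho - 1] - 1.0) / rho    # shift that makes the kept entries sum to 1
    return np.maximum(v - theta, 0.0)


def project_psd_unit_trace(A):
    """Frobenius-norm projection of a square matrix onto {B : B is PSD, trace(B) = 1}."""
    S = (A + A.T) / 2.0                   # the target set contains only symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(S)
    lam = project_to_simplex(eigvals)     # PSD with unit trace <=> eigenvalues lie on the simplex
    return (eigvecs * lam) @ eigvecs.T    # reassemble V diag(lam) V^T


def leading_eigenvector(B):
    """Unit eigenvector for the largest eigenvalue of a symmetric matrix B."""
    _, eigvecs = np.linalg.eigh((B + B.T) / 2.0)
    return eigvecs[:, -1]                 # eigh returns eigenvalues in ascending order


rng = np.random.default_rng(1)
P = project_psd_unit_trace(rng.normal(size=(5, 5)))
print(np.trace(P), np.linalg.eigvalsh(P).min())
```

Because the projection only modifies eigenvalues, each call costs one symmetric eigendecomposition, which is why this approach is common inside projected-gradient solvers for semidefinite relaxations.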