Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Nonlinear Feature Extraction with Max-Margin Data Shifting

Authors: Jianqiao Wangni, Ning Chen

AAAI 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The empirical results on multiple linear and nonlinear models demonstrate that MMDS can efficiently improve the performance of unsupervised extractors.
Researcher Affiliation	Academia	MOE Key lab of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNList, Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
Pseudocode	Yes	The procedure of training an extractor on the MMDS data is summarized as: Training Data {x} → Shift T[x] → Train Extractor.
Open Source Code	No	The paper does not provide any concrete access information for source code.
Open Datasets	Yes	The Yale dataset contains 165 images of 15 individuals. The Yale B (the extended Yale Face Database B) dataset includes 38 individuals and about 64 near frontal face images. The ORL dataset contains 10 different varying lighting and facial detail images for each of 40 distinct subjects. The 11 Tumor dataset contains 174 gene samples of 11 different class... The TRECVID2003 dataset contains 1078 video shots of 5 categories... The Digits dataset is within OpenCV. We extract 64 dimensional HOG features (Dalal and Triggs 2005)... The Letters dataset (Ben, Carlos, and Daphne 2004)...
Dataset Splits	Yes	We use 5 (or the number of minimal category) folds cross-validation to find proper parameters. We randomly choose 500 samples from the MNIST dataset, and use 10,000 samples for testing. The Digits dataset... evenly split them to train/test sets. The Letters dataset... use 5,375 samples for training while the other 46,777 for testing.
Hardware Specification	No	The paper does not provide specific hardware details used for running its experiments.
Software Dependencies	No	The paper mentions software like the 'LibLinear package', 'OpenCV', and 'L-BFGS solver' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	For all models, the data are projected into 10-dimensional space (K=10). We use 5 (or the number of minimal category) folds cross-validation to find proper parameters. The dimensions of RFF is set to 500 for Digits and 1,000 for the other datasets. We use an L-BFGS solver with a fixed maximum number of iterations, and normalize the data to (0.1, 0.9) before sending to the autoencoder. Figure 3 presents the performance of various extractors under different shifting scales (i.e., σ² in MMDS).
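
Two of the preprocessing steps quoted in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code, which the paper does not release. The function names `scale_to_range` and `choose_num_folds` are hypothetical. The sketch assumes that "normalize the data to (0.1, 0.9)" means per-feature min-max scaling (so the extremes land exactly on 0.1 and 0.9), and that "5 (or the number of minimal category) folds" means five folds unless the smallest class has fewer than five samples.

```python
from collections import Counter

import numpy as np


def scale_to_range(X, low=0.1, high=0.9):
    """Min-max scale each feature column of X into [low, high].

    Assumption: scaling is done per feature; the paper does not say
    whether it is per feature or over the whole data matrix.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_span = X.max(axis=0) - col_min
    col_span[col_span == 0] = 1.0  # constant columns are mapped to `low`
    return low + (X - col_min) / col_span * (high - low)


def choose_num_folds(labels, max_folds=5):
    """Return max_folds, capped at the size of the smallest class.

    Assumption: this is one reading of "5 (or the number of minimal
    category) folds"; the paper does not spell out the rule.
    """
    smallest_class = min(Counter(labels).values())
    return min(max_folds, smallest_class)


if __name__ == "__main__":
    X = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
    print(scale_to_range(X))
    print(choose_num_folds([0, 0, 0, 1, 1]))
```

Capping the fold count at the smallest class size keeps every fold able to hold at least one sample of each class, which matters for small face datasets such as Yale or ORL.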