Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator

Authors: Makoto Yamada, Denny Wu, Yao-Hung Hubert Tsai, Hirofumi Ohta, Ruslan Salakhutdinov, Ichiro Takeuchi, Kenji Fukumizu

ICLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through synthetic and real-world feature selection experiments, we show that the proposed framework can successfully detect statistically significant features. Last, we propose a sample selection framework for analyzing different members in the Generative Adversarial Networks (GANs) family.
Researcher Affiliation	Academia	Makoto Yamada1,2,3,4, Denny Wu5,6, Yao-Hung Hubert Tsai7, Hirofumi Ohta8, Ichiro Takeuchi9, Ruslan Salakhutdinov7, Kenji Fukumizu2,4. Kyoto University1, RIKEN AIP2, JST PRESTO3, Institute of Statistical Mathematics4, University of Toronto5, Vector Institute6, Carnegie Mellon University7, University of Tokyo8, Nagoya Institute of Technology9
Pseudocode	Yes	Algorithm 1 mmdInf (Feature Selection)
Open Source Code	No	The paper does not explicitly state that its own source code for the proposed methodology is publicly available, nor does it provide a direct link to it. It only references a third-party GAN package (Chainer GAN package) used in their experiments.
Open Datasets	Yes	generated 5000 images (using Chainer GAN package 1 with CIFAR10 datasets)
Dataset Splits	No	The paper does not provide specific train/validation/test dataset splits, such as exact percentages or sample counts for each partition. It mentions using '1/2 of data to calculate the covariance matrix of MMD and the rest to perform feature selection and inference' but this is not a standard model validation split.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments.
Software Dependencies	No	The paper mentions using 'Chainer GAN package' and 'pre-trained Resnet18' but does not specify version numbers for these or any other software dependencies.
Experiment Setup	Yes	We fixed the number of selected features (prior to PSI) k to 30. ... features with p-value lower than the significance level α = 0.05 are selected as statistically significant features. For block MMD, in each experiment we set the candidate of block size as B = {10, 20, 50}. For incomplete MMD, in each experiment the ratio between number of pairs (i, j) sampled to compute incomplete MMD score and sample size is fixed at r = ℓ/n ∈ {0.5, 5, 10}.
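
Illustrative note on the ratio r = ℓ/n quoted in the Experiment Setup row: the sketch below shows one way an incomplete MMD score could be computed from ℓ = r·n sampled index pairs. It is not the authors' implementation. The Gaussian kernel, the bandwidth, sampling pairs uniformly with replacement, and the names gaussian_kernel and incomplete_mmd are all assumptions made for illustration; only the ratio values {0.5, 5, 10} come from the row above.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel evaluated row-wise on paired arrays a and b."""
    sq_dist = np.sum((a - b) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def incomplete_mmd(x, y, r, rng, bandwidth=1.0):
    """Average the MMD U-statistic kernel h over ell = r * n sampled pairs.

    Assumption: pairs (i, j) with i != j are drawn uniformly with
    replacement; the paper's sampling design may differ.
    """
    n = x.shape[0]
    if y.shape[0] != n or n < 2:
        raise ValueError("x and y need the same number of rows (at least 2)")
    num_pairs = int(r * n)  # ell = r * n
    i = rng.integers(0, n, size=num_pairs)
    # An offset in 1..n-1 guarantees j != i.
    j = (i + rng.integers(1, n, size=num_pairs)) % n
    # h(u_i, u_j) = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i)
    h = (gaussian_kernel(x[i], x[j], bandwidth)
         + gaussian_kernel(y[i], y[j], bandwidth)
         - gaussian_kernel(x[i], y[j], bandwidth)
         - gaussian_kernel(x[j], y[i], bandwidth))
    return h.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 3))
    y = rng.normal(loc=2.0, size=(200, 3))  # shifted distribution
    for r in (0.5, 5, 10):  # ratios listed in the Experiment Setup row
        print(r, incomplete_mmd(x, y, r, rng))
```

Larger r uses more pairs, which lowers the variance of the score at a cost that grows linearly in ℓ rather than quadratically in n.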