Estimating Jaccard Index with Missing Observations: A Matrix Calibration Approach

Authors: Wenye Li

NeurIPS 2015
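
For context, the quantity being estimated throughout is the Jaccard index between binary feature vectors, J(x, y) = |x AND y| / |x OR y|. Below is a minimal Python sketch of the exact index together with a naive estimate restricted to coordinates observed in both vectors; the helper names are ours, and the naive estimate is an illustrative baseline rather than the paper's calibrated estimator.

```python
import numpy as np

def jaccard(x, y):
    """Exact Jaccard index of two binary vectors: |x AND y| / |x OR y|."""
    inter = np.sum((x == 1) & (y == 1))
    union = np.sum((x == 1) | (y == 1))
    return inter / union if union else 0.0

def jaccard_observed(x, y):
    """Naive estimate over coordinates observed (non-NaN) in both vectors.
    An illustrative baseline only, not necessarily the paper's estimator."""
    obs = ~np.isnan(x) & ~np.isnan(y)
    return jaccard(x[obs], y[obs])

# Example: two binary vectors with missing entries encoded as NaN.
x = np.array([1, 0, 1, np.nan, 1])
y = np.array([1, 1, np.nan, 0, 1])
print(jaccard_observed(x, y))  # computed on coordinates 0, 1, 4 -> 2/3
```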

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We carried out a series of empirical experiments and the results confirmed our theoretical justification. The evaluation also reported significantly improved results in real learning tasks on benchmark datasets. |
| Researcher Affiliation | Academia | Wenye Li, Macao Polytechnic Institute, Macao SAR, China. wyli@ipm.edu.mo |
| Pseudocode | Yes | Algorithm 1: Projection onto R = S ∩ T (a hedged sketch of such a projection appears below the table). |
| Open Source Code | No | The paper does not provide a direct link or explicit statement for the open-sourcing of the code for its proposed methodology. |
| Open Datasets | Yes | To evaluate the performance of the proposed method, four benchmark datasets were used in our experiments. MNIST: a grayscale image database of handwritten digits (0 to 9)... USPS: another grayscale image database of handwritten digits... PROTEIN: a bioinformatics database... WEBSPAM: a dataset with both spam and non-spam web pages. |
| Dataset Splits | No | The paper states 'In each run, 90% of the samples were randomly chosen as the training set and the remaining 10% were used as the testing set.' It does not explicitly mention a separate validation set for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper mentions 'on our platform' when discussing computational time but does not provide specific hardware details such as exact CPU/GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4). |
| Experiment Setup | Yes | For each dataset, we experimented with 1,000 and 10,000 samples respectively. For each sample, different portions (from 10% to 90%) of feature values were marked as missing... For the kNN approach, we iterated different k from 1 to 5 and the best result was collected... In each run, 90% of the samples were randomly chosen as the training set and the remaining 10% were used as the testing set. The mean and standard deviation of the classification errors in 1,000 runs were reported. (See the protocol sketch below the table.) |
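
On the Pseudocode row: Algorithm 1 projects onto the intersection R = S ∩ T of two convex sets. As a hedged sketch, assuming S is the positive semidefinite cone and T an elementwise box with unit diagonal (our reading of a typical similarity-matrix calibration, not the paper's exact definitions), a Dykstra-style alternating projection could look like the following; all function names are ours.

```python
import numpy as np

def project_psd(A):
    """Euclidean projection onto the PSD cone: zero out negative eigenvalues."""
    A = (A + A.T) / 2.0
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0.0, None)) @ V.T

def project_box(A, lo=0.0, hi=1.0):
    """Projection onto elementwise bounds, pinning the diagonal to 1."""
    B = np.clip(A, lo, hi)
    np.fill_diagonal(B, 1.0)
    return B

def calibrate(A, iters=200, tol=1e-8):
    """Dykstra-style alternating projection onto R = S ∩ T (assumed sets)."""
    X = np.asarray(A, dtype=float).copy()
    P = np.zeros_like(X)  # correction term for the S-projection
    Q = np.zeros_like(X)  # correction term for the T-projection
    for _ in range(iters):
        Y = project_psd(X + P)
        P = X + P - Y
        X_next = project_box(Y + Q)
        Q = Y + Q - X_next
        if np.linalg.norm(X_next - X) < tol:
            return X_next
        X = X_next
    return X
```

The correction terms P and Q are what distinguish Dykstra's scheme from plain alternating projection: they make the iterates converge to the nearest point of S ∩ T rather than to an arbitrary point in the intersection.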
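
On the Experiment Setup row: the protocol is concrete enough to sketch end to end: mask a portion of feature values, sweep kNN over k = 1 to 5, use random 90/10 train/test splits, and report the mean and standard deviation of the error across runs. Below is a minimal reconstruction with NumPy and scikit-learn (assumed dependencies); the mean imputation is a placeholder standing in for the paper's calibrated Jaccard estimate, and all helper names are ours.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def mask_features(X, portion, rng):
    """Mark a given portion of feature values as missing (NaN)."""
    Xm = X.astype(float).copy()
    Xm[rng.random(X.shape) < portion] = np.nan
    return Xm

def one_run(X, y, portion, rng):
    """One random 90/10 split; kNN swept over k = 1..5, best error kept."""
    idx = rng.permutation(len(y))
    split = int(0.9 * len(y))
    tr, te = idx[:split], idx[split:]
    Xm = mask_features(X, portion, rng)
    # Placeholder: mean-impute so kNN can run. The paper instead estimates
    # Jaccard similarity from the observed entries and calibrates it.
    col_mean = np.nan_to_num(np.nanmean(Xm, axis=0))
    Xf = np.where(np.isnan(Xm), col_mean, Xm)
    errs = [1.0 - KNeighborsClassifier(n_neighbors=k)
                      .fit(Xf[tr], y[tr]).score(Xf[te], y[te])
            for k in range(1, 6)]
    return min(errs)

def protocol(X, y, portion, runs=1000, seed=0):
    """Mean and std of classification error over repeated random splits."""
    rng = np.random.default_rng(seed)
    errs = [one_run(X, y, portion, rng) for _ in range(runs)]
    return float(np.mean(errs)), float(np.std(errs))
```

Calling protocol(X, y, portion=0.5, runs=1000) mirrors the reported mean/std over 1,000 runs at a 50% missing rate.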