MISSION: Ultra Large-Scale Feature Selection using Count-Sketches

Authors: Amirali Aghazadeh, Ryan Spring, Daniel LeJeune, Gautam Dasarathy, Anshumali Shrivastava, Richard G. Baraniuk

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We designed a set of simulations to evaluate MISSION in a controlled setting. All experiments were performed on a single machine with 2x Intel Xeon E5-2660 v4 processors (28 cores / 56 threads) and 512 GB of memory. The code for training and running our randomized-hashing approach is available online.
Researcher Affiliation | Academia | (1) Department of Electrical Engineering, Stanford University, Stanford, California; (2) Department of Computer Science, Rice University, Houston, Texas; (3) Department of Electrical and Computer Engineering, Rice University, Houston, Texas.
Pseudocode | Yes | Algorithm 1 MISSION. (A hedged reconstruction of this update loop appears after the table.)
Open Source Code | Yes | The code [1] for training and running our randomized-hashing approach is available online. [1] https://github.com/rdspring1/MISSION
Open Datasets | Yes | Datasets: We used four datasets in the experiments: 1) KDD2012, 2) RCV1, 3) Webspam Trigram, 4) DNA [2]. The statistics of these datasets are summarized in Table 2. [2] http://projects.cbio.mines-paristech.fr/largescalemetagenomics/ [3] https://www.kaggle.com/c/criteo-display-ad-challenge
Dataset Splits | No | The paper provides 'Train Size' and 'Test Size' for each dataset but does not mention a validation split or describe its configuration.
Hardware Specification | Yes | All experiments were performed on a single machine with 2x Intel Xeon E5-2660 v4 processors (28 cores / 56 threads) and 512 GB of memory.
Software Dependencies | No | The paper states that "the code for training and running our randomized-hashing approach is available online" but does not specify software dependencies with version numbers (e.g., Python or PyTorch versions).
Experiment Setup | Yes | For all methods, we used the logistic loss for binary classification and the cross-entropy loss for multi-class classification. For all the experiments, the Count-Sketch data structure used 3 hash functions, and the model weights were divided equally among the hash arrays. All the methods were trained for a single epoch with a learning rate of 0.5. (A minimal Count-Sketch illustration of this configuration follows the table.)
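
For readers unfamiliar with the data structure named in the Experiment Setup row, the following is a minimal Count-Sketch sketch in Python with 3 hash functions, mirroring the quoted configuration. The class, the simple universal-hash construction, and all parameter names are illustrative assumptions rather than the authors' implementation (their code is linked in the table above).

```python
import numpy as np

class CountSketch:
    """Minimal Count-Sketch (illustrative, not the authors' code).

    `depth` hash arrays of `width` counters each. An update adds a signed
    value to one cell per row; a query returns the median of the signed
    cell values across rows.
    """

    PRIME = 2_147_483_647  # Mersenne prime for a simple universal hash

    def __init__(self, width, depth=3, seed=0):
        rng = np.random.default_rng(seed)
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width))
        # Independent (a, b) hash coefficients per row: one pair for the
        # bucket index, one pair for the +/-1 sign.
        self.a = rng.integers(1, self.PRIME, size=(depth, 2))
        self.b = rng.integers(0, self.PRIME, size=(depth, 2))

    def _bucket(self, key, row):
        return (int(self.a[row, 0]) * key + int(self.b[row, 0])) % self.PRIME % self.width

    def _sign(self, key, row):
        return 1 if (int(self.a[row, 1]) * key + int(self.b[row, 1])) % self.PRIME % 2 else -1

    def update(self, key, value):
        # Add the signed value into one cell of each hash array.
        for r in range(self.depth):
            self.table[r, self._bucket(key, r)] += self._sign(key, r) * value

    def query(self, key):
        # Median across rows cancels most hash-collision noise.
        return float(np.median([self._sign(key, r) * self.table[r, self._bucket(key, r)]
                                for r in range(self.depth)]))
```

Under this reading, "the model weights were divided equally among the hash arrays" corresponds to splitting a total budget of B counters into 3 rows of width B/3 each, e.g. CountSketch(width=B // 3, depth=3).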
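
Similarly, the Pseudocode row only names Algorithm 1 without reproducing it. The following is a rough, hedged reconstruction of the update loop the paper describes (SGD on the logistic loss, with gradient updates folded into a Count-Sketch and a top-k set tracking the heaviest coordinates), built on the CountSketch sketched above. The function and variable names (mission_train, top_k, stream) are assumptions for illustration; the linked repository contains the authors' actual implementation.

```python
import heapq
import math

def mission_train(stream, sketch, k, lr=0.5):
    """One-epoch MISSION-style pass (illustrative sketch, not the official code).

    stream : iterable of (features, label) with features = {index: value}
             and label in {0, 1}
    sketch : CountSketch-like object exposing update(key, value) / query(key)
    k      : number of heavy-hitter features retained as the model
    """
    top_k = {}  # active feature set: feature index -> current weight estimate
    for features, label in stream:
        # Predict with the current top-k weights (non-active features read as 0).
        margin = sum(top_k.get(i, 0.0) * v for i, v in features.items())
        margin = max(-30.0, min(30.0, margin))  # guard math.exp against overflow
        p = 1.0 / (1.0 + math.exp(-margin))
        for i, v in features.items():
            # Fold the logistic-loss gradient step into the sketch ...
            sketch.update(i, -lr * (p - label) * v)
            top_k[i] = sketch.query(i)
        # ... and keep only the k largest-magnitude coordinates active.
        if len(top_k) > k:
            top_k = dict(heapq.nlargest(k, top_k.items(),
                                        key=lambda kv: abs(kv[1])))
    return top_k
```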