reproducibilityindex.ai

Categorical Feature Compression via Submodular Optimization

Authors: Mohammadhossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab Mirrokni, Afshin Rostamizadeh

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 3, we present our empirical evaluation of optimizing the mutual information objective as well as an end-to-end learning task. All the experiments are performed using the Criteo click prediction dataset (Criteo Labs, 2014)
Researcher Affiliation	Collaboration	(1) Google, New York, NY, USA; (2) Department of Electrical Engineering, Yale University, New Haven, CT, USA.
Pseudocode	Yes	Algorithm 1: Data structure to compute Δ_s F(S)
Open Source Code	No	The paper does not state that the authors' code for the described methodology is open source or provide a link to it.
Open Datasets	Yes	All the experiments are performed using the Criteo click prediction dataset (Criteo Labs, 2014)
Dataset Splits	Yes	All the experiments are performed using the Criteo click prediction dataset (Criteo Labs, 2014), which consists of 37 million instances for training and 4.4 million held-out points. Note, we use the labeled training file from this challenge and chronologically partitioned it into train/hold-out sets.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments, only mentioning general terms like 'large-scale machine learning tasks'.
Software Dependencies	No	The paper mentions 'TensorFlow' and 'Adam' as tools used but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup	No	The paper mentions obtaining vocabularies of certain sizes (e.g., 10K to 160K) and that feature values appearing in at least 100 instances were used. However, it lacks specific hyperparameter values (e.g., learning rate, batch size) or detailed optimizer settings needed to reproduce the experimental setup.
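
The two preprocessing details quoted in the table, a chronological train/hold-out partition and a vocabulary limited to feature values seen in at least 100 instances, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `chronological_split` and `frequent_values` are hypothetical, the rows are assumed to be already sorted by time, and each instance is assumed to carry one value per categorical feature so that occurrence counts equal instance counts.

```python
from collections import Counter


def chronological_split(rows, n_holdout):
    """Split time-ordered rows into (train, holdout).

    Assumption: `rows` is already in chronological order, so the last
    `n_holdout` rows are the most recent ones and form the hold-out set.
    """
    if not 0 <= n_holdout <= len(rows):
        raise ValueError("n_holdout must be between 0 and len(rows)")
    cut = len(rows) - n_holdout
    return rows[:cut], rows[cut:]


def frequent_values(values, min_count=100):
    """Return the set of feature values that occur at least `min_count` times.

    Assumption: `values` holds one entry per instance for a single
    categorical feature, so an occurrence count is an instance count.
    """
    counts = Counter(values)
    return {value for value, count in counts.items() if count >= min_count}
```

With the sizes reported in the table, the hold-out set would be the last 4.4 million of the labeled rows and the remaining 37 million would be used for training. The threshold of 100 is the only cutoff the summary reports; the learning rate, batch size, and optimizer settings remain unspecified, so a sketch like this covers the data preparation only.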