Cuckoo Feature Hashing: Dynamic Weight Sharing for Sparse Analytics

Authors: Jinyang Gao, Beng Chin Ooi, Yanyan Shen, Wang-Chien Lee

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on prediction tasks with hundreds of millions of features demonstrate that CCFH can achieve the same level of performance using only 15%-25% of the parameters required by conventional feature hashing. Experiments on the public CTR benchmark dataset Avazu and a malicious URL detection dataset show that, compared with feature hashing and multiple hashing, CCFH can further reduce the number of parameters by around 4x to 8x while achieving the same model performance.
Researcher Affiliation | Academia | Jinyang Gao (National University of Singapore), Beng Chin Ooi (National University of Singapore), Yanyan Shen (Shanghai Jiao Tong University), Wang-Chien Lee (The Pennsylvania State University)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | URL [Ma et al., 2009] is a dataset for malicious URL detection. Avazu [Juan et al., 2016] is a dataset for mobile ads CTR prediction from a Kaggle competition.
Dataset Splits | No | The paper mentions 'test error rate' and 'log loss' for evaluation, implying a test set, but it does not specify explicit training, validation, or test splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | No | Table 3 shows a typical specification for an Intel Xeon CPU, but it is presented as 'A Typical CPU Specification' for illustrative purposes; the paper does not report the actual hardware used for its experiments.
Software Dependencies | No | The paper mentions using 'logistic regression' as the model and 'Adam' for learning-rate adjustment with a citation, but it does not provide version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, scikit-learn).
Experiment Setup | Yes | All models are trained using mini-batch stochastic gradient descent (SGD). The batch size is set to 256, and the learning rate is adjusted with Adam [Kingma and Ba, 2014] using a momentum of 0.9. An L1 penalty is applied to the model parameters, as in [Weinberger et al., 2009; Zhou et al., 2015], to introduce model sparsity (i.e., feature selection). For CCFH, the parameter space is split into two parts: 80% for the feature weights v and 20% for the weight indicators q.
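
The reported setup can be sketched roughly as below. This is a minimal illustration of the feature-hashing baseline training loop, not the authors' code: the hashed parameter-space size, learning rate, and L1-penalty strength are assumptions, and only the batch size (256), the Adam momentum (beta1 = 0.9), the L1 penalty, and the 80%/20% split between feature weights v and weight indicators q come from the paper. The cuckoo-based dynamic weight sharing of CCFH itself is not implemented here.

import torch

# Assumed sizes and hyperparameters (not reported in the paper).
NUM_BUCKETS = 2 ** 20                 # hashed parameter-space size (assumption)
BATCH_SIZE = 256                      # from the paper
L1_LAMBDA = 1e-6                      # L1-penalty strength (assumption)
LEARNING_RATE = 1e-3                  # learning rate (assumption)

# CCFH splits the parameter space 80% / 20% (from the paper); the split is only
# illustrated by the bucket counts here and not otherwise used in this sketch.
V_BUCKETS = int(0.8 * NUM_BUCKETS)    # buckets for feature weights v
Q_BUCKETS = NUM_BUCKETS - V_BUCKETS   # buckets for weight indicators q

# Plain hashed logistic regression (the feature-hashing baseline).
weights = torch.zeros(NUM_BUCKETS, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([weights, bias], lr=LEARNING_RATE, betas=(0.9, 0.999))

def score(batch_bucket_ids):
    # batch_bucket_ids: list of LongTensors, one per instance, holding the
    # hashed bucket ids of that instance's active (binary) features.
    logits = torch.stack([weights[ids].sum() for ids in batch_bucket_ids])
    return logits + bias

def train_step(batch_bucket_ids, labels):
    # labels: float tensor of shape (batch_size,) with values in {0, 1}.
    optimizer.zero_grad()
    logits = score(batch_bucket_ids)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
    loss = loss + L1_LAMBDA * weights.abs().sum()   # L1 penalty for sparsity
    loss.backward()
    optimizer.step()
    return loss.item()

A full replication would additionally need the cuckoo-hashing lookup that lets features dynamically share buckets (the mechanism the paper's title refers to), which this sketch omits.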