Compressing Neural Networks with the Hashing Trick

Authors: Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, Yixin Chen

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6. Experimental Results: We conduct extensive experiments to evaluate HashedNets on eight benchmark datasets.
Researcher Affiliation | Collaboration | Wenlin Chen (WENLINCHEN@WUSTL.EDU), James T. Wilson (J.WILSON@WUSTL.EDU), Stephen Tyree (STYREE@NVIDIA.COM), Kilian Q. Weinberger (KILIAN@WUSTL.EDU), Yixin Chen (CHEN@CSE.WUSTL.EDU); Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA; NVIDIA, Santa Clara, CA, USA
Pseudocode | No | The paper describes the computational steps with mathematical equations but does not provide structured pseudocode or an algorithm block (a hedged sketch of the hashed-layer computation appears after this table).
Open Source Code | No | The paper mentions using the third-party open-source implementation "xxHash" but does not provide a statement or link for the source code of its own methodology.
Open Datasets | Yes | Datasets consist of the original MNIST handwritten digit dataset, along with four challenging variants (Larochelle et al., 2007). Each variation amends the original through digit rotation (ROT), background superimposition (BG-RAND and BG-IMG), or a combination thereof (BG-IMG-ROT). In addition, we include two binary image classification datasets: CONVEX and RECT (Larochelle et al., 2007).
Dataset Splits | Yes | Hyperparameters are selected for all algorithms with Bayesian optimization (Snoek et al., 2012) and hand tuning on 20% validation splits of the training sets.
Hardware Specification | Yes | HashedNets and all accompanying baselines were implemented using Torch7 (Collobert et al., 2011) and run on NVIDIA GTX TITAN graphics cards with 2688 cores and 6GB of global memory.
Software Dependencies | No | The paper mentions Torch7 and the Bayesian optimization MATLAB implementation bayesopt.m but does not specify version numbers for these software dependencies.
Experiment Setup | Yes | Models are trained via stochastic gradient descent (minibatch size of 50) with dropout and momentum. ReLU is adopted as the activation function for all models. Hyperparameters are selected for all algorithms with Bayesian optimization (Snoek et al., 2012) and hand tuning on 20% validation splits of the training sets. (A hedged training-configuration sketch also follows after the table.)
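Because the paper presents the hashed layer only through equations, the following is a minimal NumPy sketch of the idea as described there: each entry of the virtual weight matrix V is looked up as V[i, j] = ξ(i, j) · w[h(i, j)], where h hashes the index pair into a small vector of shared weights and ξ is a sign hash that reduces collision bias. The hash function below is Python's built-in hash rather than the xxHash library used by the authors, and all sizes are illustrative rather than taken from the paper.

```python
# Minimal NumPy sketch of the hashing-trick layer described by the paper's equations.
# This is a reading of the method, not the authors' code: the hash below is Python's
# built-in hash() rather than the xxHash library used in the paper, and all sizes
# (inputs, outputs, number of shared weights) are illustrative.
import numpy as np

def hashed_layer_forward(x, w, n_out, seed=0):
    """Compute a = V @ x without ever storing the virtual matrix V.

    V[i, j] = xi(i, j) * w[h(i, j)], where h maps the index pair (i, j) into the
    K shared real weights in w, and xi(i, j) in {-1, +1} reduces collision bias.
    """
    n_in = x.shape[0]
    K = w.shape[0]
    a = np.zeros(n_out)
    for i in range(n_out):
        for j in range(n_in):
            bucket = hash((seed, i, j)) % K                           # shared-weight index for (i, j)
            sign = 1.0 if hash((seed + 1, i, j)) % 2 == 0 else -1.0   # sign hash xi(i, j)
            a[i] += sign * w[bucket] * x[j]
    return a

# Toy usage: a 100 x 100 virtual layer backed by only 20 real weights, ReLU activation.
rng = np.random.default_rng(0)
w = rng.standard_normal(20) * 0.1
x = rng.standard_normal(100)
hidden = np.maximum(hashed_layer_forward(x, w, n_out=100), 0.0)
```

During training, gradients of all virtual entries that hash into the same bucket accumulate into that single shared weight, which is how the compression is preserved while learning.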
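The Experiment Setup row can likewise be summarized in a short training sketch. The paper's experiments were implemented in Torch7; the snippet below is a PyTorch rendering in which only the minibatch size of 50, momentum, dropout, and ReLU come from the text, while the layer widths, learning rate, momentum coefficient, and dropout probability are illustrative assumptions.

```python
# Hedged PyTorch sketch of the reported training setup: SGD with minibatches of 50,
# momentum, dropout, and ReLU activations. The paper used Torch7; layer widths,
# learning rate, momentum value, and dropout probability here are assumptions for
# illustration, not numbers taken from the paper.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 1000),   # plain dense layers stand in for the paper's hashed layers
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(1000, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def train_epoch(loader):
    """Run one epoch; `loader` should yield (images, labels) minibatches of size 50."""
    model.train()
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x.view(x.size(0), -1)), y)
        loss.backward()
        optimizer.step()
```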