Compressing Neural Networks with the Hashing Trick
Authors: Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, Yixin Chen
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Experimental Results): We conduct extensive experiments to evaluate HashedNets on eight benchmark datasets. |
| Researcher Affiliation | Collaboration | Wenlin Chen (WENLINCHEN@WUSTL.EDU), James T. Wilson (J.WILSON@WUSTL.EDU), Stephen Tyree (STYREE@NVIDIA.COM), Kilian Q. Weinberger (KILIAN@WUSTL.EDU), Yixin Chen (CHEN@CSE.WUSTL.EDU); Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA; NVIDIA, Santa Clara, CA, USA |
| Pseudocode | No | The paper describes the computational steps with mathematical equations but does not provide structured pseudocode or an algorithm block; a rough sketch of the weight-sharing computation is given after this table. |
| Open Source Code | No | The paper mentions using the third-party open-source hash implementation "xxHash" but does not provide a statement or link for the source code of its own methodology. |
| Open Datasets | Yes | Datasets consist of the original MNIST handwritten digit dataset, along with four challenging variants (Larochelle et al., 2007). Each variation amends the original through digit rotation (ROT), background superimposition (BG-RAND and BG-IMG), or a combination thereof (BG-IMG-ROT). In addition, we include two binary image classification datasets: CONVEX and RECT (Larochelle et al., 2007). |
| Dataset Splits | Yes | Hyperparameters are selected for all algorithms with Bayesian optimization (Snoek et al., 2012) and hand tuning on 20% validation splits of the training sets. |
| Hardware Specification | Yes | HashedNets and all accompanying baselines were implemented using Torch7 (Collobert et al., 2011) and run on NVIDIA GTX TITAN graphics cards with 2688 cores and 6GB of global memory. |
| Software Dependencies | No | The paper mentions "Torch7" and "Bayesian Optimization MATLAB implementation bayesopt.m" but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | Models are trained via stochastic gradient descent (minibatch size of 50) with dropout and momentum. ReLU is adopted as the activation function for all models. Hyperparameters are selected for all algorithms with Bayesian optimization (Snoek et al., 2012) and hand tuning on 20% validation splits of the training sets. An illustrative training-configuration sketch follows after the table. |
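
As noted in the Pseudocode row, the paper expresses HashedNets only through equations. The NumPy sketch below illustrates the weight-sharing idea: a virtual weight matrix V is expanded from K shared real weights via an index hash h(i, j) and a sign hash ξ(i, j). The class name, hash functions, and sizes are hypothetical stand-ins (the authors' implementation uses Torch7 and xxHash), so this is a sketch of the idea, not the authors' code.

```python
import numpy as np

def hash_index(i, j, k_size, seed=0):
    # Stand-in for the paper's xxHash-based h(i, j): maps a virtual
    # weight position to one of K shared real weights.
    return hash((seed, i, j)) % k_size

def sign_hash(i, j, seed=0):
    # Second hash xi(i, j) in {-1, +1}, used to reduce the bias
    # introduced by hash collisions.
    return 1 if hash((seed, "sign", i, j)) % 2 == 0 else -1

class HashedLayer:
    """Fully connected layer whose (out_dim x in_dim) virtual weight
    matrix is backed by only k_real shared parameters (hypothetical
    names; only the weight-sharing idea follows the paper)."""

    def __init__(self, in_dim, out_dim, k_real, seed=0):
        self.w = np.random.randn(k_real) * 0.01  # shared real weights
        # Precompute index and sign maps so the forward pass is an
        # ordinary matrix-vector product with the expanded matrix.
        self.idx = np.array([[hash_index(i, j, k_real, seed)
                              for j in range(in_dim)] for i in range(out_dim)])
        self.sgn = np.array([[sign_hash(i, j, seed)
                              for j in range(in_dim)] for i in range(out_dim)])

    def forward(self, z):
        V = self.sgn * self.w[self.idx]   # V_ij = xi(i, j) * w[h(i, j)]
        return np.maximum(V @ z, 0.0)     # ReLU, as used in the experiments

# Usage: a 784 -> 100 layer stored with only 1,000 real weights.
layer = HashedLayer(in_dim=784, out_dim=100, k_real=1000)
activations = layer.forward(np.random.rand(784))
```

During training, gradients with respect to each shared weight are accumulated over all virtual positions that hash to the same bucket, which is what allows the compressed layer to be trained with ordinary backpropagation.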
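
Similarly, the Experiment Setup and Dataset Splits rows describe the training protocol only in prose (SGD with minibatches of 50, dropout, momentum, ReLU, and 20% validation splits). The PyTorch sketch below is a minimal, assumed reconstruction of that protocol on placeholder data; the paper itself uses Torch7, selects hyperparameters with Bayesian optimization and hand tuning, and does not report the learning rate or momentum values used here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder tensors standing in for MNIST-style data (784 features, 10 classes).
X, y = torch.randn(10000, 784), torch.randint(0, 10, (10000,))
dataset = TensorDataset(X, y)

# Hold out 20% of the training set for validation, as in the paper.
n_val = len(dataset) // 5
train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])
train_loader = DataLoader(train_set, batch_size=50, shuffle=True)  # minibatch size 50

# Plain MLP with ReLU and dropout; layer sizes and dropout rate are illustrative.
model = nn.Sequential(
    nn.Linear(784, 100), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(100, 10),
)
criterion = nn.CrossEntropyLoss()
# SGD with momentum; lr and momentum values are assumptions, since the
# paper tunes hyperparameters rather than reporting fixed values.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for epoch in range(10):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```

Validation accuracy on `val_set` would then drive the hyperparameter search; that step is omitted here for brevity.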