Short and Deep: Sketching and Neural Networks

Authors: Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar

Venue: ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically evaluate our sketches on real and synthetic datasets. Our approach leads to more compact neural networks than existing methods such as feature hashing and Gaussian random projections, at competitive or better performance." (Section 6, Experiments with Synthetic Data; Section 7, Experiments with Language Processing Tasks)
Researcher Affiliation | Industry | Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar (Google Brain)
Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or structured in the paper.
Open Source Code | No | No statement or link providing concrete access to open-source code for the described methodology was found.
Open Datasets | Yes | "Entity Type Tagging. ... See Gillick et al. (2014) for more details on features and labels for this task." "Reuters-news Topic Classification. The Reuters RCV1 data set consists of a collection of approximately 800,000 text articles..." "AG-news Topic Classification. We perform topic classification on 680K articles from AG news corpus..."
Dataset Splits | Yes | "In all experiments, we train on 90% of the examples and evaluate mean squared error on the rest." "As before, we trained on 90% of the examples and evaluated on the remaining 10%."
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models or memory) used for running the experiments are provided.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library or solver names with versions) are mentioned in the paper.
Experiment Setup | Yes | "We experimented with input sizes of 1000, 2000, 5000, and 10,000 and reduced the dimensionality of the original features using sketches with t ∈ {1, 2, 4, 6, 8, 10, 12, 14} blocks. In addition, we experimented with networks trained on the original features. We encouraged parameter sparsity in the first layer using ℓ1-norm regularization and learn parameters using the proximal stochastic gradient method." "In all experiments, we use two-layer feed-forward networks with ReLU activations and 100 hidden units in each layer. We use a softmax output for multiclass classification and multiple binary logistic outputs for multilabel tasks."
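For concreteness, the sketch below illustrates the quoted experiment setup: a feed-forward ReLU network with 100 hidden units per layer (read here as two hidden layers), a softmax output for multiclass classification, ℓ1-norm regularization on the first layer enforced by a proximal (soft-thresholding) step after each stochastic gradient update, and a 90/10 train/evaluation split. This is not the authors' code; the class and function names, the learning rate, ℓ1 strength, batch size, and the stand-in input features (which would be the paper's sketched features) are all illustrative assumptions.

```python
# Hedged sketch of the quoted setup, not the authors' implementation.
import torch
import torch.nn as nn

class TwoLayerReluNet(nn.Module):
    """Feed-forward ReLU network with 100 hidden units per layer (assumed: two hidden layers)."""
    def __init__(self, input_dim, num_classes, hidden=100):
        super().__init__()
        self.first = nn.Linear(input_dim, hidden)   # first layer: weights are l1-regularized
        self.second = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, num_classes)   # softmax is folded into the cross-entropy loss

    def forward(self, x):
        h = torch.relu(self.first(x))
        h = torch.relu(self.second(h))
        return self.out(h)

def soft_threshold_(weight, threshold):
    """In-place proximal operator of the l1 norm (soft-thresholding)."""
    with torch.no_grad():
        weight.copy_(weight.sign() * torch.clamp(weight.abs() - threshold, min=0.0))

def train_proximal(model, X, y, lr=0.1, l1=1e-4, epochs=10, batch_size=128):
    """Proximal SGD: plain gradient step, then soft-threshold the first-layer weights."""
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        perm = torch.randperm(X.shape[0])
        for i in range(0, X.shape[0], batch_size):
            idx = perm[i:i + batch_size]
            opt.zero_grad()
            loss = loss_fn(model(X[idx]), y[idx])
            loss.backward()
            opt.step()
            soft_threshold_(model.first.weight, lr * l1)  # proximal step on the first layer only
    return model

if __name__ == "__main__":
    # Synthetic stand-in data; the paper's sketched input features would replace X.
    n, d, k = 1000, 200, 10
    X, y = torch.randn(n, d), torch.randint(0, k, (n,))
    split = int(0.9 * n)  # "train on 90% of the examples" and evaluate on the rest
    model = train_proximal(TwoLayerReluNet(d, k), X[:split], y[:split])
    acc = (model(X[split:]).argmax(dim=1) == y[split:]).float().mean()
    print(f"held-out accuracy: {acc:.3f}")
```

For the multilabel tasks the paper mentions, the softmax/cross-entropy pair above would be swapped for multiple binary logistic outputs (e.g., a sigmoid per label with a binary cross-entropy loss); the proximal step is unchanged.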