Intriguing properties of neural networks

Authors: Christian Szegedy; Wojciech Zaremba; Ilya Sutskever; Joan Bruna; Dumitru Erhan; Ian Goodfellow; Rob Fergus

ICLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a number of experiments on a few different networks and three datasets: For the MNIST dataset, we used the following architectures [11]... The ImageNet dataset [3]... 10M image samples from YouTube (see [10])... Our minimum distortion function D has the following intriguing properties which we will support by informal evidence and quantitative experiments in this section...
Researcher Affiliation | Collaboration | Christian Szegedy (Google Inc.), Wojciech Zaremba (New York University), Ilya Sutskever (Google Inc.), Joan Bruna (New York University), Dumitru Erhan (Google Inc.), Ian Goodfellow (University of Montreal), Rob Fergus (New York University, Facebook Inc.)
Pseudocode | No | The paper describes the optimization procedure in Section 4.1 using mathematical notation and prose, but it does not include a dedicated pseudocode or algorithm block (a code sketch of that procedure is given after this table).
Open Source Code | No | The paper includes a link to examples (http://goo.gl/huaGPb) but no explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | For the MNIST dataset, we used the following architectures [11]... The ImageNet dataset [3]... 10M image samples from YouTube (see [10])
Dataset Splits | Yes | For the MNIST experiments, we use regularization with a weight decay of λ. Moreover, in some experiments we split the MNIST training dataset into two disjoint datasets P1 and P2, each with 30000 training cases... Next, we repeated our experiment on an AlexNet, where we used the validation set as I.
Hardware Specification | No | The paper does not specify the exact hardware (e.g., CPU/GPU models) used for running the experiments.
Software Dependencies | No | The paper mentions L-BFGS as an optimizer but does not list any specific software dependencies with version numbers.
Experiment Setup | Yes | For the MNIST experiments, we use regularization with a weight decay of λ... Each of our models were trained with L-BFGS until convergence... All our examples use quadratic weight decay on the connection weights: loss_decay = λ · Σ_i w_i² / k added to the total loss, where k is the number of units in the layer. Three of our models are simple linear (softmax) classifiers without hidden units (FC10(λ)). One of them, FC10(1), is trained with extremely high λ = 1 in order to test whether it is still possible to generate adversarial examples in this extreme setting as well. Two other models are a simple sigmoidal neural network with two hidden layers and a classifier. (A sketch of this training setup follows the table.)
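
The "Experiment Setup" row above describes per-layer quadratic weight decay, loss_decay = λ · Σ_i w_i² / k, and training with L-BFGS until convergence for the FC10(λ) softmax models. The sketch below is only an illustration of that setup, not the authors' code: the framework (PyTorch), the random stand-in data, and reading k as the layer's output-unit count are assumptions.

```python
# Illustrative sketch (not the authors' code) of the FC10(lambda) setup:
# a linear softmax classifier with per-layer quadratic weight decay
#   loss_decay = lambda * sum(w_i^2) / k
# trained with L-BFGS, as described in the "Experiment Setup" row.
# PyTorch, the random stand-in data, and k = out_features are assumptions.
import torch
import torch.nn.functional as F

lam = 1.0                                # lambda = 1 mirrors FC10(1)
model = torch.nn.Linear(28 * 28, 10)     # softmax classifier, no hidden units

# Stand-in batch; the paper trains on MNIST (28x28 grayscale digits).
x = torch.rand(256, 28 * 28)
y = torch.randint(0, 10, (256,))

def weight_decay(module, lam):
    """lambda * sum(w_i^2) / k, with k taken as the layer's unit count."""
    total = torch.tensor(0.0)
    for m in module.modules():
        if isinstance(m, torch.nn.Linear):
            total = total + lam * m.weight.pow(2).sum() / m.out_features
    return total

optimizer = torch.optim.LBFGS(model.parameters(), max_iter=100)

def closure():
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + weight_decay(model, lam)
    loss.backward()
    return loss

optimizer.step(closure)                  # "trained with L-BFGS until convergence"
print(float(closure()))
```

The "Pseudocode" row notes that the paper states its adversarial-example search only as math in Section 4.1: minimize c·|r| + loss_f(x + r, l) subject to x + r ∈ [0, 1]^m, solved with box-constrained L-BFGS. The hedged sketch below fixes a single value of c rather than line-searching for the minimal distortion as the paper does, and the function and argument names are hypothetical.

```python
# Hedged sketch of the box-constrained L-BFGS procedure from Section 4.1:
#   minimize  c * |r| + loss_f(x + r, l)   subject to  x + r in [0, 1]^m.
# The paper line-searches over c for the minimal distortion; this sketch
# fixes c, and the function/argument names are hypothetical.
import torch
import torch.nn.functional as F

def adversarial_perturbation(model, x, target_label, c=0.1, steps=100):
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.LBFGS([r], max_iter=steps)

    def closure():
        opt.zero_grad()
        x_adv = (x + r).clamp(0.0, 1.0)              # keep x + r in [0, 1]^m
        loss = c * r.abs().sum() + F.cross_entropy(model(x_adv), target_label)
        loss.backward()
        return loss

    opt.step(closure)
    return (x + r).detach().clamp(0.0, 1.0)

# e.g. with the model above: adversarial_perturbation(model, x[:1], torch.tensor([3]))
```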