Learning and Memorization
Authors: Satrajit Chatterjee
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment 1. In the first experiment, we apply the above procedure to the Binary-MNIST task (as defined in Section 3) to see if this approach to memorization can generalize. For this experiment, we construct a network with 5 hidden layers of 1024 LUTs and 1 LUT in the output layer. We set k = 8, i.e., each LUT in the network takes 8 inputs. The network achieves a training accuracy of 0.89 on this task, which is perhaps not so surprising since we are memorizing the training data after all. But what is surprising is that the network achieves an accuracy of 0.87 on a held-out set (the 10,000 test images in MNIST) which indicates generalization. |
| Researcher Affiliation | Industry | Satrajit Chatterjee, Two Sigma, New York, NY, USA. Correspondence to: Satrajit Chatterjee <satrajit.chatterjee@twosigma.com>. |
| Pseudocode | No | The paper describes the learning procedure in textual paragraphs (e.g., in Section 2 and 3) but does not include any formally structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository or mention code in supplementary materials. |
| Open Datasets | Yes | Now consider a binary classification task on MNIST (LeCun & Cortes, 2010) of separating the digits 0 through 4 (we map these to the 0 class) from the digits 5 through 9 (the 1 class) where the pixels are 1-bit quantized. Thus the task is to learn a function f : B^(28×28) → B. We call this the Binary-MNIST task (overloading binary here to mean both binary classification and binary inputs). ... Experiment 7. Next we look at memorization on CIFAR-10 which is a collection of 32 pixel by 32 pixel color images belonging to 10 classes. |
| Dataset Splits | No | The paper mentions training data and a 'held-out set (the 10,000 test images in MNIST)' which functions as a test set. It does not explicitly define a separate validation set or its split, nor does it specify exact training/validation/test percentages or sample counts beyond the general description of MNIST test images. |
| Hardware Specification | No | The paper mentions that 'it typically takes less than 30 seconds using a single-threaded unoptimized implementation (Python with NumPy) to run an experiment,' implying the use of a CPU, but it does not specify any exact CPU model, GPU, or other hardware details. |
| Software Dependencies | No | The paper mentions 'Python with NumPy' as the implementation environment but does not provide specific version numbers for either Python or NumPy. |
| Experiment Setup | Yes | For this experiment, we construct a network with 5 hidden layers of 1024 LUTs and 1 LUT in the output layer. We set k = 8, i.e., each LUT in the network takes 8 inputs. |
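
The Open Datasets row above fully determines the Binary-MNIST construction, so a minimal sketch is possible. This is our illustration, not code from the paper: the function name `to_binary_mnist` and the 0.5 quantization threshold (on pixels scaled to [0, 1]) are assumptions, since the excerpt only says the pixels are 1-bit quantized.

```python
import numpy as np

def to_binary_mnist(images, labels):
    """Build the Binary-MNIST task: f : B^(28x28) -> B.

    `images` is an (n, 28, 28) array of pixel intensities in [0, 1] and
    `labels` the usual 0-9 digit labels. The 0.5 threshold is an assumption;
    the paper only says the pixels are 1-bit quantized.
    """
    x = (images.reshape(-1, 28 * 28) >= 0.5).astype(np.uint8)  # 1-bit pixels
    y = (labels >= 5).astype(np.uint8)  # digits 0-4 -> class 0, 5-9 -> class 1
    return x, y
```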
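
The Experiment Setup row pins down the architecture (5 hidden layers of 1024 LUTs, 1 output LUT, k = 8), but the quoted excerpts do not spell out the memorization rule itself. The sketch below shows one plausible reading, assuming each LUT is wired to k random outputs of the previous layer and stores, for each of its 2^k input patterns, the majority training label it saw; the function names, random wiring, and tie-breaking toward 0 are all our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_lut_layer(inputs, targets, width, k=8):
    """Memorize one layer of k-input LUTs (our reading of the paper's rule).

    Each of the `width` LUTs reads k random columns of `inputs` and stores
    the majority value of `targets` for every one of its 2^k input patterns.
    """
    n, d = inputs.shape
    wires = rng.integers(0, d, size=(width, k))       # random fan-in per LUT
    powers = 1 << np.arange(k)
    addrs = inputs[:, wires] @ powers                 # (n, width) LUT addresses
    tables = np.zeros((width, 1 << k), dtype=np.uint8)
    for j in range(width):
        ones = np.bincount(addrs[:, j], weights=targets, minlength=1 << k)
        seen = np.bincount(addrs[:, j], minlength=1 << k)
        tables[j] = 2 * ones > seen                   # majority vote; ties -> 0
    return wires, tables, tables[np.arange(width), addrs]

def train_lut_network(x, y, depth=5, width=1024, k=8):
    """Stack `depth` hidden LUT layers plus a single output LUT."""
    acts, net = x, []
    for _ in range(depth):
        wires, tables, acts = train_lut_layer(acts, y, width, k)
        net.append((wires, tables))
    wires, tables, preds = train_lut_layer(acts, y, 1, k)  # output LUT
    net.append((wires, tables))
    return net, preds.ravel()                         # predictions on train set
```

Evaluating held-out data would replay the stored `(wires, tables)` pairs layer by layer; on the quoted setup this construction memorizes the training labels, and the paper's surprising finding is that it also reaches 0.87 accuracy on the 10,000 held-out MNIST test images.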