Train-by-Reconnect: Decoupling Locations of Weights from Their Values

Authors: Yushi Qiu, Reiji Suda

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate LaPerm's versatility while producing extensive evidence to support our hypothesis: when the initial weights are random and dense, our method demonstrates speed and performance similar to or better than that of regular optimizers, e.g., Adam.
Researcher Affiliation | Academia | Yushi Qiu, Reiji Suda; Graduate School of Information Science and Technology, The University of Tokyo; {yushi621, reiji}@is.s.u-tokyo.ac.jp
Pseudocode | Yes | Pseudocode for LaPerm is shown in Algorithm 1. (A hedged sketch of such an update follows the table.)
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | In this section, we reconnect randomly weighted CNNs listed in Table 1 trained with the MNIST [26] and CIFAR-10 [23] datasets using LaPerm under various settings. (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper frequently mentions 'validation accuracy' and 'validation loss' and refers to '10,000 test images' when discussing validation loss in Figure 3, but it does not give explicit split sizes or percentages for a distinct validation set, nor does it clarify whether those 10,000 test images serve as the validation set during training or only as the final test set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions software like TensorFlow and Keras, but does not provide specific version numbers for these or any other ancillary software components needed to replicate the experiments.
Experiment Setup | Yes | We use five random seeds and train the network for 45 epochs with a batch size of 50 and a learning rate decay of 0.95. We choose k = 20 for LaPerm. For all experiments in this paper, LaPerm and LA use Adam as the inner optimizer. (A configuration sketch follows the table.)
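
Referenced from the Pseudocode row: the paper's Algorithm 1 (LaPerm) is not reproduced on this page. Below is a minimal NumPy sketch of a lookahead-permutation style update, assuming the synchronization step reassigns the fixed initial weight values to the locations given by the rank order of the inner-trained weights; the function names, the per-tensor treatment, and the `inner_step` callback are illustrative placeholders rather than the authors' reference implementation.

```python
import numpy as np

def permute_sync(w_init, w_trained):
    """Reorder the fixed initial values so their rank order matches the
    inner-trained weights (assumed reading of the permutation step)."""
    sorted_init = np.sort(w_init.ravel())               # fixed pool of values
    ranks = np.argsort(np.argsort(w_trained.ravel()))   # rank of each trained entry
    return sorted_init[ranks].reshape(w_init.shape)

def train_by_reconnect(init_weights, inner_step, num_steps, k=20):
    """Every k inner-optimizer steps, snap the weights back to a permutation
    of the initial values, so only the locations of the values change."""
    weights = [w.copy() for w in init_weights]
    for t in range(1, num_steps + 1):
        weights = inner_step(weights)   # e.g., one Adam update (user-supplied)
        if t % k == 0:
            weights = [permute_sync(w0, w) for w0, w in zip(init_weights, weights)]
    return weights
```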
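
Referenced from the Open Datasets row: both datasets are available through the standard Keras loaders. The paper does not state how the data were loaded; the normalization below is a common convention, not taken from the paper.

```python
from tensorflow.keras.datasets import mnist, cifar10

# MNIST: 60,000 train / 10,000 test grayscale 28x28 images
(x_train_m, y_train_m), (x_test_m, y_test_m) = mnist.load_data()
# CIFAR-10: 50,000 train / 10,000 test RGB 32x32 images
(x_train_c, y_train_c), (x_test_c, y_test_c) = cifar10.load_data()

x_train_m = x_train_m[..., None] / 255.0   # add channel axis, scale to [0, 1]
x_test_m = x_test_m[..., None] / 255.0
x_train_c = x_train_c / 255.0
x_test_c = x_test_c / 255.0
```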
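
Referenced from the Experiment Setup row: a configuration sketch of the reported hyperparameters using tf.keras. The epochs, batch size, decay rate, and k come from the quoted setup; the initial learning rate, the per-epoch decay schedule, and the concrete seed values are assumptions made for illustration.

```python
import tensorflow as tf

BATCH_SIZE = 50
EPOCHS = 45
K_SYNC = 20              # LaPerm synchronization interval k
SEEDS = [0, 1, 2, 3, 4]  # five seeds are reported; the actual values are not

steps_per_epoch = 60_000 // BATCH_SIZE   # e.g., MNIST training set size

# "Learning rate decay of 0.95" read here as a per-epoch exponential decay.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,   # assumed; not stated in the quoted text
    decay_steps=steps_per_epoch,
    decay_rate=0.95,
    staircase=True,
)
inner_optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)  # inner optimizer
```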