Polarity Is All You Need to Learn and Transfer Faster

Authors: Qingyang Wang, Michael Alan Powell, Eric W Bridgeford, Ali Geisa, Joshua T Vogelstein

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate with simulation and image classification tasks that if weight polarities are adequately set a priori, then networks learn with less time and data." The paper experimentally shows that, when weight polarities are adequately set a priori, networks can learn with less time and data (simulated task, Section 2; two image classification tasks, Section 3).
Researcher Affiliation | Academia | (1) Department of Neuroscience, Johns Hopkins University, Baltimore, MD, US; (2) Center for Imaging Science, Johns Hopkins University, Baltimore, MD, US; (3) Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, US; (4) Current location: United States Military Academy, Department of Mathematical Sciences, West Point, NY, US; (5) Department of Biostatistics, Johns Hopkins University, Baltimore, MD, US; (6) Institute for Computational Medicine, Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, US.
Pseudocode | Yes | Algorithm 1: Freeze-SGD.
Open Source Code | Yes | Code for running all experiments and analysis may be found on GitHub.
Open Datasets | Yes | "We trained and tested networks on the Fashion-MNIST (grayscale) and CIFAR-10 (RGB-color) datasets, using the AlexNet network architecture (Krizhevsky et al., 2017)." For XOR-5D: "XOR-5D data were prepared by sampling from the XOR-5D distribution as described in the main text." Additionally, "All data presented in the paper may be found here."
Dataset Splits | Yes | XOR-5D data were prepared by sampling from the XOR-5D distribution as described in the main text; the training-set size varied, while the validation set was always 1000 samples across all scenarios. Computational efficiency was quantified by plotting the number of epochs needed to reach a given level of validation accuracy.
Hardware Specification | Yes | All experiments were run on 2 RTX-8000 GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer, Glorot Normal/Uniform initialization, and the AlexNet architecture, but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "For both datasets, we trained for 100 epochs, with lr=0.001 (best out of [0.1, 0.03, 0.01, 0.001])." Networks were randomly initialized following conventional procedures: Glorot Normal for conv layers and Glorot Uniform for fc layers.
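The Pseudocode row above refers to the paper's Algorithm 1, Freeze-SGD, which is not reproduced in this report. Since the paper's premise is that weight polarities are fixed a priori while magnitudes are learned, one plausible reading is an ordinary SGD update followed by a projection that zeroes any weight whose sign would flip. The sketch below is an illustrative assumption (including the name `freeze_sgd_step`), not the paper's exact algorithm:

```python
import numpy as np

def freeze_sgd_step(w, grad, lr, polarity):
    """One SGD step that preserves each weight's assigned polarity.

    `polarity` holds a fixed +1/-1 per weight, set a priori. After the
    gradient update, any weight whose sign would cross to the opposite
    polarity is clamped to zero magnitude. This projection is one
    plausible way to "freeze" polarity, assumed here for illustration.
    """
    w_new = w - lr * grad
    # Zero out weights that would flip to the opposite polarity.
    w_new[np.sign(w_new) != polarity] = 0.0
    return w_new

# Toy usage: polarities are fixed at initialization and never flip.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
polarity = np.sign(w)
w = freeze_sgd_step(w, grad=rng.normal(size=5), lr=0.1, polarity=polarity)
```

After every step, each weight either keeps its assigned sign or sits at zero, so the polarity pattern chosen at initialization is never overwritten by training.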
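The Dataset Splits row describes sampling from an XOR-5D distribution with a varying training-set size and a fixed 1000-sample validation set. A minimal sketch of that protocol, assuming (since this report does not specify the distribution) a standard XOR construction in which the label is the parity of the signs of the first two coordinates and the remaining three coordinates are uninformative noise:

```python
import numpy as np

def sample_xor5d(n, rng):
    """Sample n points from a 5-D XOR-style distribution.

    Assumption for illustration: the label is the XOR of the signs of
    the first two coordinates; the other three are noise dimensions.
    The paper's exact XOR-5D distribution may differ.
    """
    X = rng.uniform(-1.0, 1.0, size=(n, 5))
    y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = sample_xor5d(500, rng)   # training size varied in the paper
X_val, y_val = sample_xor5d(1000, rng)      # validation set fixed at 1000 samples
```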
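The Experiment Setup row states that conv layers used Glorot Normal and fc layers used Glorot Uniform initialization, with lr=0.001 for 100 epochs. The standard Glorot formulas (std = sqrt(2/(fan_in+fan_out)) for the normal variant, limit = sqrt(6/(fan_in+fan_out)) for the uniform variant) can be sketched as follows; the layer shapes below are made-up placeholders, not the paper's architecture:

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng):
    # Glorot/Xavier Normal: zero-mean, std = sqrt(2 / (fan_in + fan_out)).
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_out, fan_in))

def glorot_uniform(fan_in, fan_out, rng):
    # Glorot/Xavier Uniform: bounded by sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

rng = np.random.default_rng(0)
W_conv = glorot_normal(9 * 64, 128, rng)  # conv layers: Glorot Normal (shape is a placeholder)
W_fc = glorot_uniform(256, 10, rng)       # fc layers: Glorot Uniform (shape is a placeholder)
# Training then used lr = 0.001 for 100 epochs (best of [0.1, 0.03, 0.01, 0.001]).
```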