Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation

Authors: Yibo Yang, Xiaojie Li, Motasem Alfarra, Hasan Abed Al Kader Hammoud, Adel Bibi, Philip Torr, Bernard Ghanem

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we achieve significant performance improvements compared to previous methods. Particularly, our method for CNN and Transformer architectures on ImageNet is able to attain a competitive performance with global BP, saving more than 40% memory consumption.
Researcher Affiliation | Academia | 1 King Abdullah University of Science and Technology; 2 Harbin Institute of Technology (Shenzhen); 3 Peng Cheng Laboratory; 4 University of Oxford.
Pseudocode | Yes | Algorithm 1: A PyTorch-like pseudocode for local training with SGR (a generic local-learning skeleton, distinct from the paper's SGR, is sketched after the table).
Open Source Code | No | The paper provides pseudocode (Algorithm 1) and implementation details in Appendix E, but it does not include an explicit statement about releasing the source code or a link to a code repository.
Open Datasets | Yes | We conduct extensive experiments on CIFAR-10, CIFAR-100, and ImageNet to verify the effectiveness of our method (a minimal data-loading sketch follows the table).
Dataset Splits | No | The paper describes training details, including epochs and batch sizes, and mentions using a 'test set' for evaluation, but it does not explicitly specify the training/validation/test split percentages or sample counts for the datasets used in the main experiments. It mentions the 'ImageNet validation set' in the context of linear classification for self-supervised learning evaluation, but not for the primary method's training splits.
Hardware Specification | Yes | Experiments are conducted on 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions software components such as 'PyTorch', the 'SGD optimizer', the 'AdamW optimizer', and 'LARS', as well as data augmentation techniques like 'RandAugment' and 'Random Erasing', but it does not provide specific version numbers for any of these software dependencies (an illustrative augmentation pipeline follows the table).
Experiment Setup | Yes | To train ResNet models, we use the SGD optimizer with a learning rate of 0.1, a momentum of 0.9, and weight decay of 0.0001. We train these networks for 100 epochs with a batch size of 1024. The initial learning rate is set to 0.1 and decreases by a factor of 0.1 at epochs 30, 60, and 90. ... For ViT-S/16 training on ImageNet, we use a batch size of 4096. We use the AdamW optimizer with a learning rate of 0.0016, and we apply a cosine learning rate annealing schedule after a linear warm-up for the first 20 epochs. The training process lasts for 300 epochs... (these quoted settings are wired up in a hedged PyTorch sketch after the table).
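
For readers unfamiliar with local learning, the sketch below shows a generic layer-wise local-training loop in PyTorch, in which each block is trained by its own auxiliary loss and the computation graph is detached between blocks. It only illustrates the local-learning setting that Algorithm 1 operates in; it does not implement the paper's Successive Gradient Reconciliation term, and all names (`LocalBlock`, `aux_head`, dimensions) are illustrative assumptions rather than the paper's code.

```python
import torch
import torch.nn as nn

class LocalBlock(nn.Module):
    """One locally trained module with its own auxiliary classification head.
    Generic local learning only; NOT the paper's SGR algorithm, which
    additionally reconciles gradients across successive modules."""
    def __init__(self, in_dim, out_dim, num_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.aux_head = nn.Linear(out_dim, num_classes)  # local loss head

    def forward(self, x):
        return self.body(x)

blocks = nn.ModuleList([LocalBlock(784, 256, 10), LocalBlock(256, 256, 10)])
opts = [torch.optim.SGD(b.parameters(), lr=0.1, momentum=0.9) for b in blocks]
criterion = nn.CrossEntropyLoss()

def local_training_step(x, y):
    """One step of purely local training: each block receives a detached
    input, so no global backward pass spans the whole network."""
    for block, opt in zip(blocks, opts):
        x = x.detach()              # cut the graph between modules
        feat = block(x)
        loss = criterion(block.aux_head(feat), y)
        opt.zero_grad()
        loss.backward()             # gradients stay within this block
        opt.step()
        x = feat                    # activations passed to the next block
    return loss.item()
```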
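
The CIFAR-10, CIFAR-100, and ImageNet datasets named in the Open Datasets row are all available through torchvision. The loading sketch below uses assumed local paths and a bare `ToTensor` transform; it reflects standard torchvision usage, not the paper's exact data pipeline.

```python
import torchvision
from torchvision import transforms

# Basic tensor conversion only; the paper's full augmentation recipe is
# sketched separately below.
to_tensor = transforms.ToTensor()

# CIFAR-10 / CIFAR-100 download automatically; the ImageNet root path is an
# assumed placeholder and must point to a locally prepared copy.
cifar10 = torchvision.datasets.CIFAR10(root="./data", train=True,
                                       download=True, transform=to_tensor)
cifar100 = torchvision.datasets.CIFAR100(root="./data", train=True,
                                         download=True, transform=to_tensor)
imagenet_train = torchvision.datasets.ImageFolder("/path/to/imagenet/train",
                                                  transform=to_tensor)
```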
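
RandAugment and Random Erasing, listed in the Software Dependencies row without version numbers, both ship with torchvision (`transforms.RandAugment` from roughly v0.11 onward, `transforms.RandomErasing` earlier). The composition below is an illustrative guess at a ViT-style training transform; the crop size, erasing probability, and normalization statistics are assumptions, not values taken from the paper.

```python
from torchvision import transforms

# Illustrative ImageNet training transform combining the named augmentations.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(),                      # policy-based augmentation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.25),              # applied on the tensor
])
```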
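
The hyperparameters quoted in the Experiment Setup row can be wired up in PyTorch roughly as follows. The backbone models are stand-ins (torchvision has no ViT-S/16, so ViT-B/16 is used here), and any setting not quoted above (AdamW betas, ViT weight decay, warm-up start factor) is left at a library default or flagged as an assumption rather than inferred from the paper.

```python
import torch
import torchvision
from torch.optim.lr_scheduler import (MultiStepLR, LinearLR,
                                      CosineAnnealingLR, SequentialLR)

# Stand-in architectures: the paper trains ResNets and ViT-S/16; torchvision's
# ResNet-50 and ViT-B/16 serve only as placeholders here.
resnet = torchvision.models.resnet50()
vit = torchvision.models.vit_b_16()

# ResNet recipe as quoted: SGD, lr 0.1, momentum 0.9, weight decay 1e-4,
# 100 epochs, batch size 1024, lr multiplied by 0.1 at epochs 30, 60, 90.
resnet_opt = torch.optim.SGD(resnet.parameters(), lr=0.1,
                             momentum=0.9, weight_decay=1e-4)
resnet_sched = MultiStepLR(resnet_opt, milestones=[30, 60, 90], gamma=0.1)

# ViT-S/16 recipe as quoted: AdamW, lr 0.0016, batch size 4096, 300 epochs,
# cosine annealing after a 20-epoch linear warm-up. The warm-up start factor
# is an assumption; betas and weight decay are left at AdamW defaults.
vit_opt = torch.optim.AdamW(vit.parameters(), lr=0.0016)
vit_sched = SequentialLR(
    vit_opt,
    schedulers=[
        LinearLR(vit_opt, start_factor=0.001, total_iters=20),  # warm-up
        CosineAnnealingLR(vit_opt, T_max=280),  # remaining 300 - 20 epochs
    ],
    milestones=[20],
)

# Both schedulers step once per epoch, e.g.:
# for epoch in range(100): train_one_epoch(...); resnet_sched.step()
```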