Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation
Authors: Yibo Yang, Xiaojie Li, Motasem Alfarra, Hasan Abed Al Kader Hammoud, Adel Bibi, Philip Torr, Bernard Ghanem
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we achieve significant performance improvements compared to previous methods. Particularly, our method for CNN and Transformer architectures on ImageNet is able to attain a competitive performance with global BP, saving more than 40% memory consumption. |
| Researcher Affiliation | Academia | 1King Abdullah University of Science and Technology 2Harbin Institute of Technology (Shenzhen) 3Peng Cheng Laboratory 4University of Oxford. |
| Pseudocode | Yes | Algorithm 1: A PyTorch-like pseudocode for local training with SGR (see the illustrative sketch after the table). |
| Open Source Code | No | The paper provides pseudocode (Algorithm 1) and implementation details in Appendix E, but it does not include an explicit statement about releasing the source code or a link to a code repository. |
| Open Datasets | Yes | We conduct extensive experiments on CIFAR-10, CIFAR-100, and ImageNet to verify the effectiveness of our method |
| Dataset Splits | No | The paper reports training details such as epochs and batch sizes and mentions evaluating on a 'test set', but it does not specify training/validation/test split percentages or sample counts for the main experiments. The 'ImageNet validation set' is mentioned only for the linear-classification evaluation of self-supervised learning, not for the primary training splits. |
| Hardware Specification | Yes | Experiments are conducted on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'PyTorch', 'SGD optimizer', 'AdamW optimizer', 'LARS', and data augmentation techniques like 'RandAugment' and 'Random Erasing', but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | To train ResNet models, we use the SGD optimizer with a learning rate of 0.1, a momentum of 0.9, and weight decay of 0.0001. We train these networks for 100 epochs with a batch size of 1024. The initial learning rate is set to 0.1 and decreases by a factor of 0.1 at epochs 30, 60, and 90. ... For ViT-S/16 training on ImageNet, we use a batch size of 4096. We use the AdamW optimizer with a learning rate of 0.0016, and we apply a cosine learning rate annealing schedule after a linear warm-up for the first 20 epochs. The training process lasts for 300 epochs... (transcribed into a configuration sketch after the table). |
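
The Pseudocode row notes that Algorithm 1 is a PyTorch-like procedure for local training with SGR, but the code itself is not released. The snippet below is a minimal illustrative sketch of generic block-wise local training in PyTorch, assuming a simple two-block partition with auxiliary linear heads and detached block boundaries; the module names, the auxiliary loss, and the placeholder comment marking where a gradient-reconciliation term would enter are assumptions for illustration, not the paper's Algorithm 1.

```python
import torch
import torch.nn as nn

class LocalBlock(nn.Module):
    """One locally trained module: a feature sub-network plus an auxiliary head."""
    def __init__(self, in_dim, out_dim, num_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.head = nn.Linear(out_dim, num_classes)  # local auxiliary classifier

    def forward(self, x):
        feat = self.body(x)
        return feat, self.head(feat)

# Hypothetical two-block partition; sizes are arbitrary for illustration.
blocks = nn.ModuleList([LocalBlock(784, 256, 10), LocalBlock(256, 256, 10)])
optimizers = [torch.optim.SGD(b.parameters(), lr=0.1, momentum=0.9) for b in blocks]
criterion = nn.CrossEntropyLoss()

def local_step(x, y):
    # Each block is updated with its own local loss only; detaching the
    # features at block boundaries prevents global backpropagation.
    for block, opt in zip(blocks, optimizers):
        feat, logits = block(x)
        loss = criterion(logits, y)
        # An SGR-style method would add a term here reconciling this block's
        # gradient with the preceding block's; the exact form follows the paper.
        opt.zero_grad()
        loss.backward()
        opt.step()
        x = feat.detach()  # stop gradients at the block boundary

# Example usage with random data.
local_step(torch.randn(32, 784), torch.randint(0, 10, (32,)))
```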
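
The Experiment Setup row quotes the optimizer and schedule hyperparameters. The following is a hedged transcription of those settings into PyTorch optimizer and scheduler objects; the model instances used as stand-ins, the AdamW weight decay, and the warm-up start factor are assumptions not stated in the quoted excerpt.

```python
import torch
from torchvision import models
from torch.optim.lr_scheduler import MultiStepLR, LinearLR, CosineAnnealingLR, SequentialLR

# ResNet on ImageNet: SGD, 100 epochs, batch size 1024 (per the quoted setup).
resnet = models.resnet50(weights=None)  # stand-in; the exact ResNet depth is not specified in the excerpt
sgd = torch.optim.SGD(resnet.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
resnet_sched = MultiStepLR(sgd, milestones=[30, 60, 90], gamma=0.1)  # decay by 0.1 at epochs 30/60/90

# ViT-S/16 on ImageNet: AdamW, 300 epochs, batch size 4096 (per the quoted setup).
vit = torch.nn.Linear(384, 1000)  # placeholder for a ViT-S/16 backbone and head
adamw = torch.optim.AdamW(vit.parameters(), lr=0.0016)  # weight decay not given in the excerpt
warmup = LinearLR(adamw, start_factor=0.01, total_iters=20)  # 20-epoch linear warm-up (start factor assumed)
cosine = CosineAnnealingLR(adamw, T_max=300 - 20)            # cosine annealing over the remaining 280 epochs
vit_sched = SequentialLR(adamw, schedulers=[warmup, cosine], milestones=[20])

# A training loop would call resnet_sched.step() / vit_sched.step() once per epoch.
```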