Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Rotate the ReLU to Sparsify Deep Networks Implicitly
Authors: Nancy Nayak, Sheetal Kalyani
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the results with popular datasets such as MNIST, CIFAR-10, CIFAR-100, SVHN, and ImageNet with different architectures, including Vision Transformers and EfficientNet. We validate the results with extensive simulation with various architectures such as fully connected neural network (FCNN), ResNet, Wide ResNet, and Transformer using a wide variety of datasets such as MNIST, CIFAR-10, CIFAR-100, SVHN, and large-scale image datasets like ILSVRC-2012 (ImageNet-1k). |
| Researcher Affiliation | Academia | Nancy Nayak EMAIL Department of Electrical Engineering Indian Institute of Technology Madras, India. Sheetal Kalyani EMAIL Department of Electrical Engineering Indian Institute of Technology Madras, India. |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and textual descriptions (e.g., equations 1-6 and descriptive paragraphs), but it does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/nancy-nayak/RotatedReLU_TMLR |
| Open Datasets | Yes | We demonstrate the results with popular datasets such as MNIST, CIFAR-10, CIFAR-100, SVHN, and ImageNet with different architectures, including Vision Transformers and EfficientNet. We tested our method on various standard datasets such as MNIST, SVHN, CIFAR-10, and CIFAR-100. To evaluate our method, we compared its performance to the baseline performances of ResNet and Wide ResNets. Towards the end, to show the efficacy of the proposed method and to strengthen our claims, we have demonstrated the results on the ImageNet dataset with both Wide ResNet and Transformer architectures (Vaswani et al., 2017). The experiments with the Transformer architecture on the ImageNet dataset are conducted using an NVIDIA A100 GPU, while the rest are conducted on a single NVIDIA GeForce 2080 Ti GPU. |
| Dataset Splits | No | The paper mentions training durations (e.g., 'trained for 200 epochs', '1200 epochs', '400 epochs') and references 'validation accuracy' and 'test accuracy' for various datasets like MNIST, CIFAR-10, CIFAR-100, SVHN, and ImageNet. It also refers to 'standard PyTorch models' for the ImageNet dataset. However, it does not explicitly provide specific percentages, sample counts, or detailed methodologies for the training/test/validation dataset splits within the main text. |
| Hardware Specification | Yes | The experiments with the Transformer architecture on the ImageNet dataset are conducted using an NVIDIA A100 GPU, while the rest are conducted on a single NVIDIA GeForce 2080 Ti GPU. In Table 10 and Table 11, we report the average inference time of VGG and ResNet-164-pre, respectively, on a Quadro P2200 5GB GPU, averaged over 200 forward passes. |
| Software Dependencies | No | The paper mentions using 'standard PyTorch models' for image classification with the ImageNet dataset but does not specify version numbers for PyTorch, Python, or any other software libraries or solvers used, which are necessary for reproducible replication. |
| Experiment Setup | Yes | We found that the right initialization of the RReLU slopes (b_l) is crucial for effectively training the proposed method. ... Therefore, the slopes (b_l for all l ∈ {1, …, L}) are initialized with a truncated Gaussian Mixture Model (GMM) with means {+1, −1} and a variance of 3. ... All the architectures are trained using a Cosine Annealing (CA) learning rate (lr) scheduler. ... For a fair comparison, ReLU models are also trained for 1200 epochs, and the corresponding accuracy values are listed in the 2nd row of Table 1. ... The proposed ResNet-164-L1RReLU achieves even higher accuracy with a great saving of 64.67% in memory and 52.96% in FLOPs. ... For example, for VGG, a batch size of 2048 is used, whereas for ResNet-164-pre, a batch size of 3000 is used. ... FGSM samples considered in our experiments have a perturbation of ϵ = 0.031. The input variation parameters ϵ_v for the PGD attack are considered to be 0.031 and 0.1 for ResNet20 and ResNet56, respectively. The step size for each attack iteration is ϵ_v/20, and the number of attack iterations is 10. |
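The slope initialization quoted in the Experiment Setup row can be sketched as follows. This is a minimal NumPy illustration, assuming an equal-weight two-component mixture with means {+1, −1} and variance 3; the paper's exact truncation bounds are not quoted above, so the `clip` parameter here is a hypothetical stand-in.

```python
import numpy as np

def init_rrelu_slopes(n, mean_mag=1.0, var=3.0, clip=None, rng=None):
    """Sample n RReLU slopes b_l from a two-component Gaussian mixture
    with component means {+mean_mag, -mean_mag} and the given variance.
    `clip` crudely truncates the samples (hypothetical bound)."""
    rng = np.random.default_rng() if rng is None else rng
    # Pick a component mean (+1 or -1) for each slope with equal probability.
    means = rng.choice([-mean_mag, mean_mag], size=n)
    # Sample each slope around its chosen mean with std = sqrt(var).
    slopes = rng.normal(loc=means, scale=np.sqrt(var))
    if clip is not None:
        slopes = np.clip(slopes, -clip, clip)
    return slopes
```

Because the mixture is symmetric about zero, the sampled slopes are centered near 0 with overall standard deviation sqrt(var + mean_mag²) ≈ 2 before truncation.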