Efficient Equivariant Network
Authors: Lingshen He, Yuxuan Chen, Zhengyang Shen, Yiming Dong, Yisen Wang, Zhouchen Lin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments verify that our model can significantly improve previous works with smaller model size. Especially, under the setting of training on 1/5 data of CIFAR10, our model improves G-CNNs by 5%+ accuracy, while using only 56% parameters and 68% FLOPs. (Abstract) |
| Researcher Affiliation | Collaboration | ¹Key Laboratory of Machine Perception (MOE), School of Artificial Intelligence, Peking University; ²Institute for Artificial Intelligence, Peking University; ³School of Mathematical Sciences and LMAM, Peking University; ⁴Pazhou Lab, Guangzhou 510330, China |
| Pseudocode | No | The paper describes the layer implementation in text and includes Figure 1, which is a diagram illustrating the E4-layer, but it does not contain structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. The 'Limitation and Future Work' section mentions plans for future implementation: 'In the future, we will try to implement a customized CUDA kernel for GPU acceleration to reduce training and inference time of our model,' implying the code is not currently publicly available. |
| Open Datasets | Yes | The MNIST-rot dataset [33] is the most widely used benchmark to test the equivariant models. It contains 62k 28×28 randomly rotated gray-scale handwritten digits. Images in the dataset are split into 10k for training, 2k for validation and 50k for testing. (Section 5.1) The CIFAR-10 and the CIFAR100 datasets consist of 32×32 images... Both of the datasets contain 50k training data and 10k testing data. (Section 5.2) |
| Dataset Splits | Yes | Images in the dataset are split into 10k for training, 2k for validation and 50k for testing. (Section 5.1) |
| Hardware Specification | Yes | All the experiments are done on the GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions optimizers like 'Adam optimizer' and 'stochastic gradient descent' but does not specify versions for any software libraries, frameworks, or programming languages used (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | Our model is trained using the Adam optimizer [27] for 200 epochs with a batch size of 128. The learning rate is initialized as 0.02 and is reduced by 10 at the 60th, 120th and 160th epochs. The weight decay is set as 0.0001 and no data augmentation is used during training. (Section 5.1) We use the stochastic gradient descent with an initial learning rate of 0.1, a Nesterov momentum of 0.9 and a weight decay of 0.0005. The learning rate is reduced by 5 at 60th, 120th, and 160th epochs. Models are trained for 200 epochs using 128 batch size. (Section 5.2) |
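The MNIST-rot split quoted in the Dataset Splits row (10k train / 2k validation / 50k test) can be materialized with a short loader. The sketch below is an assumption: it presumes the common `.amat` distribution of MNIST-rot, in which the 12k train/validation file is ordered with the training images first; the file names are not taken from the paper.

```python
import numpy as np

# Assumed file names from the common MNIST-rot distribution;
# each row holds 784 pixel values followed by a class label.
train_valid = np.loadtxt("mnist_all_rotation_normalized_float_train_valid.amat")
test = np.loadtxt("mnist_all_rotation_normalized_float_test.amat")

# First 10k rows as training data, remaining 2k as validation (assumed ordering).
x_train, y_train = train_valid[:10000, :-1].reshape(-1, 28, 28), train_valid[:10000, -1]
x_valid, y_valid = train_valid[10000:, :-1].reshape(-1, 28, 28), train_valid[10000:, -1]
x_test, y_test = test[:, :-1].reshape(-1, 28, 28), test[:, -1]

# Sanity check against the split sizes reported in Section 5.1.
assert x_train.shape[0] == 10000 and x_valid.shape[0] == 2000 and x_test.shape[0] == 50000
```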
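The hyperparameters quoted in the Experiment Setup row map onto a standard PyTorch optimizer/scheduler configuration. The sketch below is a reconstruction under stated assumptions, not the authors' code: `model` is a placeholder for the paper's E4-layer network, "reduced by 10" is read as a 10× learning-rate decay (gamma = 0.1) for the MNIST-rot/Adam setting, and "reduced by 5" as a 5× decay (gamma = 0.2) for the CIFAR/SGD setting.

```python
import torch

# Placeholder for the paper's E4-layer network; any nn.Module works for this sketch.
model = torch.nn.Linear(28 * 28, 10)

# MNIST-rot setting (Section 5.1): Adam, lr 0.02, weight decay 1e-4,
# 200 epochs, batch size 128, lr reduced at epochs 60/120/160.
# Assumption: "reduced by 10" means a 10x decay (gamma = 0.1).
optimizer = torch.optim.Adam(model.parameters(), lr=0.02, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.1)

# CIFAR-10/100 setting (Section 5.2): SGD, lr 0.1, Nesterov momentum 0.9,
# weight decay 5e-4, same milestones over 200 epochs with batch size 128.
# Assumption: "reduced by 5" means a 5x decay (gamma = 0.2).
optimizer_cifar = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, nesterov=True, weight_decay=5e-4)
scheduler_cifar = torch.optim.lr_scheduler.MultiStepLR(
    optimizer_cifar, milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):
    # ... one training pass over 128-sample mini-batches goes here ...
    scheduler.step()
```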