Dynamic Neural Response Tuning
Authors: Tian Qiu, Wenxiang Xu, Lin Chen, Linyun Zhou, Zunlei Feng, Mingli Song
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental studies indicate that the proposed DNRT is highly interpretable, applicable to various mainstream network architectures, and can achieve remarkable performance compared with existing neural response mechanisms in multiple tasks and domains. (Section 5: Experimental Study) |
| Researcher Affiliation | Academia | Tian Qiu, Wenxiang Xu, Lin Chen, Linyun Zhou, Zunlei Feng, Mingli Song, Zhejiang University {tqiu,xuwx1996,lin_chen,zhoulyaxx,zunleifeng,songml}@zju.edu.cn |
| Pseudocode | No | The paper describes the mechanisms mathematically and in prose but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/horrible-dong/DNRT. |
| Open Datasets | Yes | In the main experiments, we adopt five datasets, including MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), ImageNet-100 (Deng et al., 2009), and ImageNet-1K (Deng et al., 2009), to verify the effectiveness of the proposed DNRT. |
| Dataset Splits | Yes | A Long-Tailed CIFAR-10 dataset is generated with reduced training examples per class while the validation set remains unchanged (see the construction sketch after this table). Datasets. In the main experiments, we adopt five datasets, including MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), ImageNet-100 (Deng et al., 2009), and ImageNet-1K (Deng et al., 2009) |
| Hardware Specification | Yes | The experiment is conducted on an NVIDIA A100 (80G). The 'Latency' is obtained, on average, from the model inferring 224×224-pixel images on an NVIDIA 3090. |
| Software Dependencies | No | All experiments use the same data augmentations provided by timm (Wightman, 2019), AdamW optimizer with weight decay of 0.05, drop-path rate of 0.1, gradient clipping norm of 1.0, and cosine annealing learning rate scheduler with linear warm-up. (Specific version numbers for software libraries or frameworks are not provided.) |
| Experiment Setup | Yes | In the proposed ARR, the momentum m for updating the moving mean is empirically set to 0.1, and the balanced parameter λ varies depending on networks and datasets (see Appendix A.2). All experiments use the same data augmentations provided by timm (Wightman, 2019), AdamW optimizer with weight decay of 0.05, drop-path rate of 0.1, gradient clipping norm of 1.0, and cosine annealing learning rate scheduler with linear warm-up. Except for simple MLPs, which are trained for only 50 epochs from scratch, other networks are trained for 300 epochs from scratch (a configuration sketch follows the table). |
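
To make the quoted long-tailed split concrete, below is a minimal sketch of building a CIFAR-10 training set with reduced examples per class while leaving the validation set unchanged, assuming PyTorch/torchvision. The exponential per-class decay and the `imbalance_factor` value are illustrative assumptions; the paper's exact reduction scheme is not quoted above.

```python
# Minimal long-tailed CIFAR-10 sketch (assumptions: torchvision available,
# exponential imbalance profile; not the authors' exact construction).
import numpy as np
import torchvision
from torch.utils.data import Subset

def long_tailed_cifar10(root="./data", imbalance_factor=0.01, seed=0):
    train = torchvision.datasets.CIFAR10(root=root, train=True, download=True)
    val = torchvision.datasets.CIFAR10(root=root, train=False, download=True)  # validation set unchanged

    targets = np.array(train.targets)
    num_classes = 10
    n_max = len(targets) // num_classes  # 5000 images per class in balanced CIFAR-10

    rng = np.random.default_rng(seed)
    keep_indices = []
    for c in range(num_classes):
        # Exponentially decaying sample count per class (illustrative assumption).
        n_c = int(n_max * imbalance_factor ** (c / (num_classes - 1)))
        cls_idx = np.where(targets == c)[0]
        keep_indices.extend(rng.choice(cls_idx, size=n_c, replace=False))

    return Subset(train, keep_indices), val
```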
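
The optimization recipe quoted under Software Dependencies and Experiment Setup (AdamW with weight decay 0.05, drop-path rate 0.1, gradient clipping norm 1.0, cosine annealing with linear warm-up, timm augmentations, 300 epochs from scratch) could be wired up roughly as follows. This is a sketch, not the authors' released code: the model name, learning rate, warm-up length, and class count are placeholders, and the DNRT/ARR mechanism itself (including the moving-mean momentum m = 0.1) is not implemented here.

```python
# Sketch of the quoted training configuration, assuming PyTorch + timm.
import timm
import torch
from timm.data import create_transform
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

model = timm.create_model("vit_tiny_patch16_224", num_classes=100,  # model/classes are placeholders
                          drop_path_rate=0.1)                        # drop-path rate 0.1
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,           # learning rate assumed
                              weight_decay=0.05)                     # weight decay 0.05

epochs, warmup_epochs = 300, 20  # 300 epochs from scratch; warm-up length assumed
scheduler = SequentialLR(        # call scheduler.step() once per epoch
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=1e-3, total_iters=warmup_epochs),  # linear warm-up
        CosineAnnealingLR(optimizer, T_max=epochs - warmup_epochs),         # cosine annealing
    ],
    milestones=[warmup_epochs],
)

train_transform = create_transform(input_size=224, is_training=True)  # timm's data augmentations

def train_step(images, labels, criterion=torch.nn.CrossEntropyLoss()):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping norm 1.0
    optimizer.step()
    return loss.item()
```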