Differentiable Dynamic Normalization for Learning Deep Representation
Authors: Ping Luo, Peng Zhanglin, Shao Wenqi, Zhang Ruimao, Ren Jiamin, Wu Lingyun
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive studies show that DN outperforms its counterparts in CIFAR10 and ImageNet. Sec. 4.1 evaluates DN in CIFAR10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015), where it is compared with previous normalization techniques. Ablation studies are presented in Sec. 4.3. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science, The University of Hong Kong; (2) Department of Electronic Engineering, The Chinese University of Hong Kong; (3) SenseTime Group Ltd. |
| Pseudocode | Yes | Algorithm 1 Computations of DN (a hedged sketch of a layer in this spirit appears after the table). |
| Open Source Code | No | The paper does not provide any specific repository link or explicit statement about the availability of the source code for the methodology described. |
| Open Datasets | Yes | Extensive studies in CIFAR10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) demonstrate that DN is able to outperform its counterparts. |
| Dataset Splits | No | The paper mentions evaluating on CIFAR10 and ImageNet and reports results on the ImageNet validation set, but it does not provide the percentages, sample counts, or explicit training/validation/test split definitions needed to reproduce the experiments. |
| Hardware Specification | No | The paper mentions that gradients are 'aggregated across GPUs' but does not specify any particular GPU model, CPU, or other hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions TensorFlow and PyTorch as implementation platforms but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | All models are trained on CIFAR10 with different batch sizes, where the gradients are aggregated across GPUs while the statistics are estimated within each GPU. All models are trained five times... The batch size is (8, 32). (A sketch of this multi-GPU setting also appears after the table.) |
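
The Pseudocode row refers to Algorithm 1 of the paper but does not reproduce it. As a rough illustration of the kind of differentiable normalization layer involved, the PyTorch sketch below mixes batch-wise, instance-wise, and layer-wise statistics with learned softmax weights. The class name `MixedNorm`, the three-way mix of statistics, and all hyperparameters are assumptions for illustration only; the exact DN computation is the one given in Algorithm 1 of the paper.

```python
# A minimal, hypothetical sketch of a "learning-to-normalize" style layer.
# It is NOT the paper's DN; it only illustrates the general idea of mixing
# several normalization statistics with learnable, differentiable weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedNorm(nn.Module):
    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Per-channel affine parameters, as in standard normalization layers.
        self.weight = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        # Learnable mixing logits over {batch, instance, layer} statistics.
        self.mean_logits = nn.Parameter(torch.zeros(3))
        self.var_logits = nn.Parameter(torch.zeros(3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (N, C, H, W). Running statistics for inference are
        # omitted for brevity.
        mean_in = x.mean(dim=(2, 3), keepdim=True)                 # instance
        var_in = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        mean_ln = x.mean(dim=(1, 2, 3), keepdim=True)              # layer
        var_ln = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        mean_bn = x.mean(dim=(0, 2, 3), keepdim=True)              # batch
        var_bn = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)

        w_mean = F.softmax(self.mean_logits, dim=0)
        w_var = F.softmax(self.var_logits, dim=0)
        mean = w_mean[0] * mean_bn + w_mean[1] * mean_in + w_mean[2] * mean_ln
        var = w_var[0] * var_bn + w_var[1] * var_in + w_var[2] * var_ln

        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight + self.bias


if __name__ == "__main__":
    layer = MixedNorm(num_channels=16)
    y = layer(torch.randn(8, 16, 32, 32))
    print(y.shape)  # torch.Size([8, 16, 32, 32])
```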
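
The Experiment Setup row states that gradients are aggregated across GPUs while normalization statistics are estimated within each GPU. The snippet below is a minimal sketch of how that setting typically looks in PyTorch, assuming a standard `nn.DataParallel` loop with unsynchronized `BatchNorm2d`; the model, optimizer, and batch shape are placeholders rather than the paper's actual configuration.

```python
# A minimal sketch of the multi-GPU setting described above, under the
# assumption of a plain data-parallel setup: gradients are aggregated across
# GPUs, while normalization statistics are computed independently on each GPU
# (i.e., no synchronized batch normalization).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),        # statistics estimated within each replica/GPU
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    # DataParallel splits each mini-batch across GPUs; the backward pass
    # aggregates the per-GPU gradients onto the default device.
    model = nn.DataParallel(model).cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One illustrative step with a CIFAR10-shaped batch (32 RGB images of 32x32).
images = torch.randn(32, 3, 32, 32)
labels = torch.randint(0, 10, (32,))
if torch.cuda.is_available():
    images, labels = images.cuda(), labels.cuda()

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()   # gradients from all GPUs are aggregated here
optimizer.step()
```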