Fully Nested Neural Network for Adaptive Compression and Quantization

Authors: Yufei Cui, Ziquan Liu, Wuguannan Yao, Qiao Li, Antoni B. Chan, Tei-wei Kuo, Chun Jason Xue

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results validate the strong practical performance of the proposed approach. The experiments are conducted on the MNIST, Cifar10, Cifar100 and ImageNet datasets. The FN3-neuron experiment is conducted on MNIST. On Cifar10/100, we run the ablation study of FN3-layer/-path/-channel/-bit, as well as the comparison of FN3-bit with AdaBits. On ImageNet, we run the FN3-channel experiments to compare with US-Net.
Researcher Affiliation | Academia | Yufei Cui (1), Ziquan Liu (1), Wuguannan Yao (2), Qiao Li (1), Antoni B. Chan (1), Tei-wei Kuo (1,3), Chun Jason Xue (1). (1) Department of Computer Science, City University of Hong Kong; (2) Department of Mathematics, City University of Hong Kong; (3) Department of Computer Science & Information Engineering, National Taiwan University.
Pseudocode | Yes | Algorithm 1: Heuristic search algorithm (a Python sketch of this procedure is given after the table).
Input: a trained FN3; candidate number C; block number per layer {M_i}_i; step ΔP; P = 100% − ΔP; T = {t_k^j}_{k=1}^{K} with each t_k^j = ∅; pool = ∅; W = {(i', j')}.
1: while W ≠ ∅ do
2:   for k = 1 to K do
3:     for c = 1 to C do
4:       sample a new candidate (i', j') from W.
5:       pool = pool ∪ (t_k^j ∪ (i', j')).
6:     end for
7:   end for
8:   Evaluate acc. of each t^j ∈ pool and select the top-K {t_k^j}_{k=1}^{K}.
9:   T = {t_k^j}_{k=1}^{K}.
10:  W = W \ (W ∩ T).
11:  while Σ_j 1[(i, j) ∈ W] < (P + ΔP)M_i and P ≠ 0 do
12:    P = P − ΔP; move W by ΔP.
13:  end while
14:  pool = ∅.
15: end while
Open Source Code | No | The paper mentions a link for supplementary proofs: 'The proofs can be accessed via http://visal.cs.cityu.edu.hk/static/pubs/conf/ijcai20-fn3-sup.pdf.' However, there is no explicit statement or link providing access to the source code for the methodology described in the paper.
Open Datasets | Yes | The experiments are conducted on the MNIST, Cifar10, Cifar100 and ImageNet datasets.
Dataset Splits | No | The paper states that experiments are conducted on the MNIST, Cifar10, Cifar100 and ImageNet datasets. While these are standard datasets with predefined splits, the paper does not explicitly provide the training, validation, and test split percentages or sample counts needed for reproduction, nor does it cite a source that defines these splits.
Hardware Specification | Yes | The model is optimized with the Adam optimizer with an initial learning rate of 0.1 and a cosine annealing scheduler for 400 epochs, on 6 GTX 2080ti GPUs (an optimizer/scheduler sketch in PyTorch follows the table).
Software Dependencies | No | The paper mentions optimizers like SGD and Adam, but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or library versions) that would be needed to replicate the experiments.
Experiment Setup | Yes | A Detailed Setup of Experiments: FN3-layer: We use SGD with an initial learning rate of 0.1, momentum factor 0.9 and batch size 128. The learning rate is scaled by 0.1 at epochs 150 and 225. The network is trained for 300 epochs. For the heuristic search, as the ordered dropout (ODO) is performed on the bottleneck modules (groups of layers), the height of the sliding window is one. Because these blocks are well ordered, we set the window width to 1 (ΔP = 1/9). (A training-loop sketch of this setup is given after the table.)
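
A minimal Python sketch of how the heuristic search in Algorithm 1 could be organized is given below. It is an interpretation, not the authors' code: the names heuristic_search, evaluate_acc and advance_window are assumptions, trajectories are represented as plain frozensets of blocks, and the window-advance condition is simplified to a size check rather than the paper's (P + ΔP)M_i criterion.

import random

def heuristic_search(window, K, C, evaluate_acc, advance_window, max_rounds=100):
    # window         -- set of candidate blocks (i, j) currently visible to the search (W)
    # K              -- number of trajectories kept after each round
    # C              -- number of candidates sampled per trajectory per round
    # evaluate_acc   -- placeholder callable: frozenset of blocks -> validation accuracy
    # advance_window -- placeholder callable: current window -> shifted window, or None when exhausted
    trajectories = [frozenset() for _ in range(K)]      # T = {t_k}, all empty at the start
    rounds = 0
    while window and rounds < max_rounds:
        pool = set()
        for t in trajectories:                          # extend every kept trajectory
            for _ in range(C):
                cand = random.choice(tuple(window))     # sample a new candidate (i', j') from W
                pool.add(t | {cand})                    # pool = pool ∪ (t ∪ {(i', j')})
        ranked = sorted(pool, key=evaluate_acc, reverse=True)
        trajectories = ranked[:K]                       # keep the top-K extended trajectories
        chosen = set().union(*trajectories)             # blocks already committed to a trajectory
        window = window - chosen                        # W = W \ (W ∩ T)
        if len(window) < C:                             # simplified stand-in for the (P + ΔP)M_i check
            new_window = advance_window(window)         # move the sliding window by ΔP
            if new_window is None:
                break
            window = new_window
        rounds += 1
    return trajectories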
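
The reported Adam plus cosine-annealing schedule could be set up in PyTorch roughly as follows. This is a hedged sketch: the model and training step are placeholders, and the multi-GPU configuration is only indicated via DataParallel, which is an assumption rather than the authors' actual setup.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))  # placeholder network, not FN3
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model.cuda())                            # the paper reports 6 GPUs; parallelism style is an assumption

optimizer = torch.optim.Adam(model.parameters(), lr=0.1)             # initial learning rate 0.1, as reported
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=400)  # cosine annealing over 400 epochs

for epoch in range(400):
    # ... one pass over the training loader goes here (omitted) ...
    scheduler.step()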
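
Similarly, the FN3-layer training setup (SGD, learning rate 0.1, momentum 0.9, batch size 128, decay by 0.1 at epochs 150 and 225, 300 epochs) maps onto a standard PyTorch loop. The model and data below are placeholders; the real experiment uses an FN3 network on Cifar10/100.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(3 * 32 * 32, 10)                                   # placeholder for the FN3 network
data = TensorDataset(torch.randn(512, 3 * 32 * 32), torch.randint(0, 10, (512,)))
loader = DataLoader(data, batch_size=128, shuffle=True)              # batch size 128, as reported

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # initial lr 0.1, momentum 0.9
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 225], gamma=0.1)  # scale lr by 0.1 at epochs 150 and 225
criterion = nn.CrossEntropyLoss()

for epoch in range(300):                                             # 300 training epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()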