FleXOR: Trainable Fractional Quantization
Authors: Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments using MNIST, CIFAR-10, and ImageNet to show that inserting XOR gates allows quantization/encrypted bit decisions to be learned through training and obtains high accuracy even for fractional sub-1-bit weights. |
| Researcher Affiliation | Industry | Samsung Research, Seoul, Republic of Korea {dongsoo3.lee, sejung0.kwon, byeonguk.kim, dragwon.jeon, bpbs.park, ji6373.yun}@samsung.com |
| Pseudocode | Yes | Algorithm 1: Pseudo code of a Conv layer with FleXOR when the kernel size is k×k and the numbers of input and output channels are Cin and Cout, respectively. (A hedged sketch of the XOR-based decompression this algorithm relies on is given after the table.) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We perform experiments using MNIST, CIFAR-10, and Image Net |
| Dataset Splits | No | The paper mentions training loss and test accuracy, and uses well-known datasets (MNIST, CIFAR-10, ImageNet) that have standard splits. However, it does not explicitly provide the specific percentages or counts for training, validation, and test dataset splits, nor does it explicitly refer to a separate validation set for hyperparameter tuning. |
| Hardware Specification | No | The paper does not specify any hardware used for running the experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions optimizers like "Adam optimizer" and "SGD optimizer" but does not specify their version numbers or any other software dependencies with version information. |
| Experiment Setup | Yes | Using the Adam optimizer with an initial learning rate of 10⁻⁴ and a batch size of 50 without dropout, Figure 4 shows training loss and test accuracy when S_tanh = 100. The SGD optimizer is used with a momentum of 0.9 and a weight decay factor of 10⁻⁵. The initial learning rate is 0.1, decayed by 0.5 at the 150th and 175th epochs. The batch size is 128 and the initial scaling factors α are 0.2. S_tanh starts from 5 and increases linearly to 10, following the same warmup schedule as the learning rate. (A hedged training-schedule sketch appears after the table.) |
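
The Algorithm 1 row above refers to FleXOR's decompression of a short vector of encrypted bits into a longer vector of quantized weight bits through XOR gates. Below is a minimal, hedged sketch of that idea: with bits represented in {-1, +1}, an XOR chain reduces to a product of the participating bits, and training relaxes the hard sign with tanh(S_tanh · x) so gradients can flow. The function name `soft_xor_decompress`, the fixed random wiring matrix `M`, and the 4-to-8 bit shapes are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of XOR-based bit decompression.
import torch

def soft_xor_decompress(encrypted, M, s_tanh):
    """Map encrypted bits (real values near +/-1) to weight bits.

    encrypted : (n_in,) trainable real-valued tensor
    M         : (n_out, n_in) fixed 0/1 matrix selecting which encrypted
                bits participate in each XOR chain
    s_tanh    : scaling factor controlling how closely tanh approximates sign
    """
    soft_bits = torch.tanh(s_tanh * encrypted)   # differentiable relaxation of sign(x)
    # XOR over {-1, +1} is a product; where M == 0 the factor is 1 (bit not used).
    factors = torch.where(M.bool(),
                          soft_bits.unsqueeze(0),
                          torch.ones_like(M, dtype=soft_bits.dtype))
    return factors.prod(dim=1)                   # (n_out,) values in (-1, 1)

# Toy usage: expand 4 encrypted bits into 8 weight bits (0.5 encrypted bits per weight bit).
torch.manual_seed(0)
M = (torch.rand(8, 4) < 0.5).float()             # fixed random XOR wiring
encrypted = torch.randn(4, requires_grad=True)   # learned through backpropagation
weight_bits = soft_xor_decompress(encrypted, M, s_tanh=5.0)
weight_bits.sum().backward()                     # gradients reach `encrypted`
print(weight_bits)
```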
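
The Experiment Setup row can likewise be read as a concrete training schedule. The sketch below wires the stated CIFAR-10 hyperparameters (SGD, momentum 0.9, weight decay 10⁻⁵, initial learning rate 0.1 halved at epochs 150 and 175, batch size 128, S_tanh warmed up from 5 to 10) into a PyTorch skeleton. The total epoch count, the warmup length, and the placeholder `model` are assumptions; the paper does not release training code.

```python
import torch

# Stand-in for the quantized CIFAR-10 network; the real model would use
# FleXOR-wrapped convolution layers.
model = torch.nn.Linear(3 * 32 * 32, 10)

# SGD with momentum 0.9, weight decay 10^-5, initial lr 0.1 (from the row above).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-5)
# Halve the learning rate at epochs 150 and 175.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[150, 175], gamma=0.5)

TOTAL_EPOCHS = 200     # assumption: total epoch count is not stated in the row
WARMUP_EPOCHS = 5      # assumption: warmup length shared by the lr and S_tanh schedules

def s_tanh_at(epoch):
    """Linearly increase S_tanh from 5 to 10 over the warmup epochs, then hold."""
    if epoch >= WARMUP_EPOCHS:
        return 10.0
    return 5.0 + (10.0 - 5.0) * epoch / WARMUP_EPOCHS

for epoch in range(TOTAL_EPOCHS):
    s_tanh = s_tanh_at(epoch)
    # ... one training epoch with batch size 128 goes here, passing `s_tanh`
    # to the soft XOR decompression inside the model ...
    scheduler.step()
```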