FleXOR: Trainable Fractional Quantization
Authors: Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments using MNIST, CIFAR-10, and ImageNet to show that inserting XOR gates allows quantization/encrypted bit decisions to be learned through training and obtains high accuracy even for fractional sub-1-bit weights. |
| Researcher Affiliation | Industry | Samsung Research, Seoul, Republic of Korea {dongsoo3.lee, sejung0.kwon, byeonguk.kim, dragwon.jeon, bpbs.park, ji6373.yun}@samsung.com |
| Pseudocode | Yes | Algorithm 1: Pseudo code of a Conv layer with FleXOR when the kernel size is k×k and the numbers of input and output channels are Cin and Cout, respectively. (A hedged sketch of the XOR-based decompression this algorithm relies on is given after the table.) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We perform experiments using MNIST, CIFAR-10, and Image Net |
| Dataset Splits | No | The paper mentions training loss and test accuracy, and uses well-known datasets (MNIST, CIFAR-10, ImageNet) that have standard splits. However, it does not explicitly provide the specific percentages or counts for training, validation, and test dataset splits, nor does it explicitly refer to a separate validation set for hyperparameter tuning. |
| Hardware Specification | No | The paper does not specify any hardware used for running the experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions optimizers like "Adam optimizer" and "SGD optimizer" but does not specify their version numbers or any other software dependencies with version information. |
| Experiment Setup | Yes | Using the Adam optimizer with an initial learning rate of 10⁻⁴ and a batch size of 50 without dropout, Figure 4 shows training loss and test accuracy when S_tanh = 100. The SGD optimizer is used with a momentum of 0.9 and a weight decay factor of 10⁻⁵. The initial learning rate is 0.1, decayed by 0.5 at the 150th and 175th epochs. The batch size is 128 and the initial scaling factors α are 0.2. S_tanh starts from 5 and increases linearly to 10, following the same warmup schedule as the learning rate. (A hedged training-schedule sketch appears after the table.) |
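
The Algorithm 1 row above refers to FleXOR's decompression of a short vector of encrypted bits into a longer vector of quantized weight bits through XOR gates. Below is a minimal, hedged sketch of that idea: with bits represented in {-1, +1}, an XOR chain reduces to a product of the participating bits, and training relaxes the hard sign with tanh(S_tanh · x) so gradients can flow. The function name `soft_xor_decompress`, the fixed random wiring matrix `M`, and the 4-to-8 bit shapes are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of XOR-based bit decompression.
import torch

def soft_xor_decompress(encrypted, M, s_tanh):
    """Map encrypted bits (real values near +/-1) to weight bits.

    encrypted : (n_in,) trainable real-valued tensor
    M         : (n_out, n_in) fixed 0/1 matrix selecting which encrypted
                bits participate in each XOR chain
    s_tanh    : scaling factor controlling how closely tanh approximates sign
    """
    soft_bits = torch.tanh(s_tanh * encrypted)   # differentiable relaxation of sign(x)
    # XOR over {-1, +1} is a product; where M == 0 the factor is 1 (bit not used).
    factors = torch.where(M.bool(),
                          soft_bits.unsqueeze(0),
                          torch.ones_like(M, dtype=soft_bits.dtype))
    return factors.prod(dim=1)                   # (n_out,) values in (-1, 1)

# Toy usage: expand 4 encrypted bits into 8 weight bits (0.5 encrypted bits per weight bit).
torch.manual_seed(0)
M = (torch.rand(8, 4) < 0.5).float()             # fixed random XOR wiring
encrypted = torch.randn(4, requires_grad=True)   # learned through backpropagation
weight_bits = soft_xor_decompress(encrypted, M, s_tanh=5.0)
weight_bits.sum().backward()                     # gradients reach `encrypted`
print(weight_bits)
```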
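
The Experiment Setup row can likewise be read as a concrete training schedule. The sketch below wires the stated CIFAR-10 hyperparameters (SGD, momentum 0.9, weight decay 10⁻⁵, initial learning rate 0.1 halved at epochs 150 and 175, batch size 128, S_tanh warmed up from 5 to 10) into a PyTorch skeleton. The total epoch count, the warmup length, and the placeholder `model` are assumptions; the paper does not release training code.

```python
import torch

# Stand-in for the quantized CIFAR-10 network; the real model would use
# FleXOR-wrapped convolution layers.
model = torch.nn.Linear(3 * 32 * 32, 10)

# SGD with momentum 0.9, weight decay 10^-5, initial lr 0.1 (from the row above).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-5)
# Halve the learning rate at epochs 150 and 175.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[150, 175], gamma=0.5)

TOTAL_EPOCHS = 200     # assumption: total epoch count is not stated in the row
WARMUP_EPOCHS = 5      # assumption: warmup length shared by the lr and S_tanh schedules

def s_tanh_at(epoch):
    """Linearly increase S_tanh from 5 to 10 over the warmup epochs, then hold."""
    if epoch >= WARMUP_EPOCHS:
        return 10.0
    return 5.0 + (10.0 - 5.0) * epoch / WARMUP_EPOCHS

for epoch in range(TOTAL_EPOCHS):
    s_tanh = s_tanh_at(epoch)
    # ... one training epoch with batch size 128 goes here, passing `s_tanh`
    # to the soft XOR decompression inside the model ...
    scheduler.step()
```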