Long Short-Term Sample Distillation
Authors: Liang Jiang, Zujie Wen, Zhongping Liang, Yafang Wang, Gerard de Melo, Zhe Li, Liangzhuang Ma, Jiaxing Zhang, Xiaolong Li, Yuan Qi
AAAI 2020, pp. 4345-4352 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method. |
| Researcher Affiliation | Collaboration | Liang Jiang¹, Zujie Wen¹, Zhongping Liang¹, Yafang Wang¹, Gerard de Melo², Zhe Li¹, Liangzhuang Ma¹, Jiaxing Zhang¹, Xiaolong Li¹, Yuan Qi¹; ¹AI Department, Ant Financial Services Group; ²Rutgers University; {tianxuan.jl, zujie.wzj, zhongping.lzp, yafang.wyf}@antfin.com, gdm@demelo.org |
| Pseudocode | Yes | Algorithm 1 Long Short-Term Sample Distillation (a hedged training-loop sketch follows the table) |
| Open Source Code | No | The paper does not provide a statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | For vision, we evaluate LSTSD on the CIFAR100 dataset, which contains 60,000 RGB images of size 32×32, split into a training set of 50,000 images and a testing set of 10,000 images. ... For NLP, we used the well-known GLUE benchmark data (Wang et al. 2019)... |
| Dataset Splits | No | The paper specifies a training set and a testing set for CIFAR100 ('split into a training set of 50,000 images and a testing set of 10,000 images'), but it does not explicitly detail a validation split or its size for any of the datasets used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers like SGD and Adam but does not specify version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | ResNets are trained for 164 epochs with a batch size of 128, while DenseNets are trained for 300 epochs with a batch size of 64. We trained both ResNets and DenseNets using SGD with a weight decay of 0.0001, a Nesterov momentum of 0.9, and a base learning rate of 0.1, which was divided by 10 at 25%, 50%, and 75% of the training process. ... We found the best settings to be λS = 4.0, λL = 2.4, and a mini-generation length of 6 epochs for LSTSD... (a configuration sketch follows the table) |
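
The Pseudocode row refers to Algorithm 1 in the paper. To make the idea concrete, below is a minimal PyTorch sketch, assuming the objective combines cross-entropy with two KL-divergence distillation terms weighted by λS and λL (the best values quoted in the Experiment Setup row): one against soft targets from a recent "short-term" snapshot and one against soft targets from an older "long-term" snapshot. All names (`lstsd_loss`, `short_term`, `long_term`) are hypothetical, the toy data is a stand-in, and the snapshot schedule only approximates the paper's Algorithm 1.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def lstsd_loss(logits, labels, short_targets, long_targets,
               lambda_s=4.0, lambda_l=2.4):
    """Hypothetical LSTSD objective: cross-entropy plus short-term and
    long-term distillation terms, weighted as in the paper's best setting."""
    ce = F.cross_entropy(logits, labels)
    log_p = F.log_softmax(logits, dim=1)
    # kl_div expects log-probabilities as input and probabilities as target.
    kl_short = F.kl_div(log_p, short_targets, reduction="batchmean")
    kl_long = F.kl_div(log_p, long_targets, reduction="batchmean")
    return ce + lambda_s * kl_short + lambda_l * kl_long

# Toy stand-ins so the sketch runs end to end; swap in CIFAR100 and a
# ResNet/DenseNet backbone for the paper's actual setting.
num_train, num_classes, num_epochs, mini_gen = 512, 10, 12, 6
X = torch.randn(num_train, 32)
y = torch.randint(0, num_classes, (num_train,))
idx_all = torch.arange(num_train)  # yield indices so per-sample targets can be looked up
loader = DataLoader(TensorDataset(X, y, idx_all), batch_size=64, shuffle=True)
model = torch.nn.Linear(32, num_classes)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=1e-4)

# Per-sample soft targets for both teachers, initialized uniformly.
short_term = torch.full((num_train, num_classes), 1.0 / num_classes)
long_term = short_term.clone()

for epoch in range(num_epochs):
    for xb, yb, idx in loader:
        loss = lstsd_loss(model(xb), yb, short_term[idx], long_term[idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if (epoch + 1) % mini_gen == 0:  # end of a 6-epoch mini-generation
        with torch.no_grad():
            for xb, _, idx in loader:
                long_term[idx] = short_term[idx]               # age old snapshot into long-term
                short_term[idx] = F.softmax(model(xb), dim=1)  # record fresh soft targets
```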
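
Similarly, the quoted vision training configuration maps onto standard optimizer components. A minimal sketch, assuming PyTorch (the paper names SGD and its hyperparameters but not the framework); the `resnet18` backbone is a stand-in, since the paper's CIFAR-style ResNets and DenseNets differ from the torchvision variants:

```python
import torch
import torchvision

# Stand-in backbone; the paper trains CIFAR-style ResNets and DenseNets on CIFAR100.
model = torchvision.models.resnet18(num_classes=100)

num_epochs = 164  # ResNets: 164 epochs, batch size 128; DenseNets: 300 epochs, batch size 64

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # base learning rate from the paper
    momentum=0.9,
    nesterov=True,
    weight_decay=1e-4,  # weight decay of 0.0001
)

# Divide the learning rate by 10 at 25%, 50%, and 75% of training.
milestones = [int(num_epochs * f) for f in (0.25, 0.50, 0.75)]
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=milestones, gamma=0.1
)

for epoch in range(num_epochs):
    # ... one training pass over CIFAR100 here ...
    scheduler.step()
```

Calling `scheduler.step()` once per epoch reproduces the quoted decay points (epochs 41, 82, and 123 for the 164-epoch ResNet run).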