Tri-net for Semi-Supervised Deep Learning
Authors: Dong-Dong Chen, Wei Wang, Wei Gao, Zhi-Hua Zhou
IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method achieves the best performance in comparison with state-of-the-art semi-supervised deep learning methods. |
| Researcher Affiliation | Academia | Dong-Dong Chen, Wei Wang, Wei Gao, Zhi-Hua Zhou National Key Laboratory for Novel Software Technology Nanjing University, Nanjing 210023, China {chendd, wangw, gaow, zhouzh}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Tri-net. Input: Labeled set L and unlabeled set U (a hedged sketch of this loop appears after the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We run experiments on three widely used benchmark datasets, i.e., MNIST, SVHN, and CIFAR-10. |
| Dataset Splits | No | The paper mentions using a 'standard data split for testing' but does not explicitly specify a distinct validation dataset split with percentages or counts, or refer to a standard validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'Python 3.8', 'TensorFlow 2.0') needed to replicate the experiment. |
| Experiment Setup | Yes | Parameters. In order to prevent the network from overfitting, we gradually increase the pool size N = 1000 × 2^t up to the size of the unlabeled data U [Saito et al., 2017], where t denotes the learning round. The maximal learning round T is set to 30 in all experiments. We gradually decrease the confidence threshold σ after N equals the size of U so that more unlabeled data can be labeled (line 11, Algorithm 1)... We set σ_0 = 0.999 and σ_os = 0.01 in MNIST; σ_0 = 0.95 and σ_os = 0.25 in SVHN and CIFAR-10. We use dropout (p = 0.5) after each max-pooling layer, use Leaky-ReLU (α = 0.1) as the activation function for all layers except the FC layer, and use softmax for the FC layer. We also use Batch Normalization [Ioffe and Szegedy, 2015] for all layers except the FC layer. We use SGD with a mini-batch size of 16. The learning rate starts from 0.1 in initialization (from 0.02 in training) and is divided by 10 when the error plateaus. In initialization, the three modules M1, M2 and M3 are trained for up to 300 epochs in SVHN and CIFAR-10 (100 in MNIST). In training, the three modules are trained for up to 90 epochs in SVHN and CIFAR-10 (60 in MNIST). We set std = 0.05 in SVHN and CIFAR-10 (0.001 in MNIST). We use a weight decay of 0.0001 and a momentum of 0.9. Following the setting in Laine and Aila [2016], we use ZCA, random crop and horizontal flipping for CIFAR-10, and zero-mean normalization and random crop for SVHN. |
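
The Pseudocode row above quotes only the header of Algorithm 1 (Tri-net). The sketch below is a minimal, non-authoritative reconstruction of one tri-training-style labeling round based on the paper's description: three modules, an unlabeled candidate pool that grows as N = 1000 × 2^t, and pseudo-labels accepted for one module only when the other two agree with confidence above σ. The module interface (`predict_proba`, `fine_tune`), the data layout, and the exact acceptance rule are hypothetical stand-ins, not the authors' code.

```python
import random

def tri_net_round(modules, labeled, unlabeled, t, sigma):
    """One sketched learning round: each module is retrained on the labeled set
    plus pseudo-labels proposed by the other two modules.

    `labeled` is assumed to be a list of (x, y) pairs and `unlabeled` a list of
    inputs; each module is assumed to expose predict_proba() returning a
    class-probability vector (e.g., a NumPy array) and fine_tune().
    """
    # Candidate pool grows as N = 1000 * 2^t, capped at the size of U.
    pool_size = min(1000 * 2 ** t, len(unlabeled))
    pool = random.sample(unlabeled, pool_size)
    for i, target in enumerate(modules):
        peers = [m for j, m in enumerate(modules) if j != i]
        pseudo = []
        for x in pool:
            p1, p2 = (m.predict_proba(x) for m in peers)
            y1, y2 = int(p1.argmax()), int(p2.argmax())
            # Assumed acceptance rule: the two peer modules agree on the label
            # and both are confident above the threshold sigma.
            if y1 == y2 and min(p1.max(), p2.max()) >= sigma:
                pseudo.append((x, y1))
        # Retrain the target module on labeled plus pseudo-labeled examples.
        target.fine_tune(labeled + pseudo)
    return modules
```

In the paper the outer loop runs for up to T = 30 rounds, and σ is relaxed once the pool covers the whole unlabeled set; those schedules are sketched in the next block.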
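
The schedules quoted in the Experiment Setup row can be read as follows. This is a hedged sketch using the SVHN/CIFAR-10 values (σ_0 = 0.95, σ_os = 0.25; MNIST uses 0.999 and 0.01), and the single-step threshold drop is a simplification of the paper's gradual decrease. The function names and the config dictionary are illustrative, not taken from the authors' code.

```python
def pool_size(t, num_unlabeled):
    """Candidate pool grows as N = 1000 * 2^t, capped at the size of the unlabeled set U."""
    return min(1000 * 2 ** t, num_unlabeled)

def confidence_threshold(t, num_unlabeled, sigma0=0.95, sigma_os=0.25):
    """Simplified reading of the schedule: sigma stays at sigma0 until the pool
    covers U, then is lowered by sigma_os (the paper decreases it gradually)."""
    return sigma0 if pool_size(t, num_unlabeled) < num_unlabeled else sigma0 - sigma_os

# Optimizer and regularization settings as quoted from the paper (SVHN / CIFAR-10):
TRAIN_CONFIG = {
    "optimizer": "SGD",
    "batch_size": 16,
    "lr_init_phase": 0.1,     # initialization; 0.02 during the training rounds
    "lr_decay_factor": 0.1,   # divided by 10 when the error plateaus
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "dropout": 0.5,           # after each max-pooling layer
    "leaky_relu_alpha": 0.1,  # all layers except the FC (softmax) layer
    "max_rounds": 30,         # T
}
```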