Scalable Semi-Supervised SVM via Triply Stochastic Gradients
Authors: Xiang Geng, Bin Gu, Xiang Li, Wanli Shi, Guansheng Zheng, Heng Huang
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on a variety of datasets demonstrate that TSGS3VM is much more efficient and scalable than existing S3VM algorithms. |
| Researcher Affiliation | Collaboration | 1School of Computer & Software, Nanjing University of Information Science & Technology, P.R.China 2JD Finance America Corporation 3Department of Electrical & Computer Engineering, University of Pittsburgh, USA 4Computer Science Department, University of Western Ontario, Canada |
| Pseudocode | Yes | Algorithm 1 TSGS3VM Train [...] Algorithm 2 TSGS3VM Predict |
| Open Source Code | No | The paper states, "We implemented the TSGS3VM algorithm in MATLAB." However, it does not provide a link to the code or explicitly state that the code is publicly available. |
| Open Datasets | Yes | Table 3 summarizes the 8 datasets used in our experiments. They are from LIBSVM3 and UCI4 repositories. [Footnote 3: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/] [Footnote 4: http://archive.ics.uci.edu/ml/datasets.html] |
| Dataset Splits | Yes | 5-fold cross-validation was used to determine the optimal settings (by test error) of the model parameters (the regularization factor C and the Gaussian kernel parameter σ); the parameter C* was set to C·nl/nu. Specifically, the unlabeled dataset was divided evenly into 5 subsets, where one of the subsets and all the labeled data are used for training, while the other 4 subsets are used for testing. |
| Hardware Specification | Yes | We perform experiments on an Intel Xeon E5-2696 machine with 48GB RAM. |
| Software Dependencies | No | The paper states, "We implemented the TSGS3VM algorithm in MATLAB." However, it does not specify a version number for MATLAB or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | The Gaussian RBF kernel k(x, x') = exp(-σ‖x - x'‖²) and the loss function ℓ(r) = max{0, 1 - \|r\|} were used for all algorithms. 5-fold cross-validation was used to determine the optimal settings (by test error) of the model parameters (the regularization factor C and the Gaussian kernel parameter σ); the parameter C* was set to C·nl/nu. Parameter search was done on a 7 × 7 coarse grid linearly spaced in the region {(log10 C, log10 σ) \| -3 ≤ log10 C ≤ 3, -3 ≤ log10 σ ≤ 3} for all methods. For TSGS3VM, the step size γ equals 1/η, where 0 ≤ log10 η ≤ 3 is searched after C and σ. Besides, the number of random features is set to n and the batch size is set to 256. The test error was obtained by using these optimal model parameters for all the methods. To achieve a comparable accuracy to our TSGS3VM, we set the minimum budget sizes Bl and Bu as 100 and 0.2·nu respectively for BGS3VM. We stop TSGS3VM and BGS3VM after one pass over the entire dataset. We stop FRS3VM after 10 passes over the entire dataset to achieve a comparable accuracy. |
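
The experiment setup pairs a Gaussian RBF kernel with a fixed number of random features, which points to the random Fourier feature approximation that triply stochastic gradient methods build on. Below is a minimal sketch of that approximation for the paper's kernel k(x, x') = exp(-σ‖x - x'‖²); the function name `make_rff` and the feature count `D` are illustrative choices, not identifiers from the paper.

```python
import numpy as np

def make_rff(d, D, sigma, rng):
    """Draw D random Fourier features approximating exp(-sigma * ||x - x'||^2)."""
    # For k(x, y) = exp(-sigma * ||x - y||^2), the spectral density is
    # Gaussian with covariance 2 * sigma * I, so frequencies are drawn as:
    W = rng.normal(0.0, np.sqrt(2.0 * sigma), size=(D, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)

    def z(X):
        # Feature map z(x) = sqrt(2/D) * cos(Wx + b), so z(x)^T z(y) ~ k(x, y).
        return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

    return z

rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = rng.normal(size=3)
sigma = 0.5

exact = np.exp(-sigma * np.sum((x - y) ** 2))          # true kernel value
z = make_rff(d=3, D=20000, sigma=sigma, rng=rng)
approx = (z(x[None]) @ z(y[None]).T).item()            # random-feature estimate
print(exact, approx)  # the two values converge as D grows
```

Replacing the kernel with this explicit feature map is what lets such methods take stochastic gradients over random features (in addition to labeled and unlabeled samples), avoiding the kernel matrix entirely.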