Semi-Supervised Multi-Modal Learning with Incomplete Modalities
Authors: Yang Yang, De-Chuan Zhan, Xiang-Rong Sheng, Yuan Jiang
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 15 real-world multi-modal datasets validate the effectiveness of our method. |
| Researcher Affiliation | Academia | Yang Yang, De-Chuan Zhan, Xiang-Rong Sheng, Yuan Jiang National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China {yangy, zhandc, shengxr, jiangy}@lamda.nju.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about the availability of open-source code for the described methodology, nor does it include any links to code repositories. |
| Open Datasets | Yes | Data Sets: In this paper, we conduct experiments on 7 two-modality datasets and 8 multi-modality datasets. The two-modality datasets include: the Movie dataset, extracted from IMDb, which has 617 movies of 17 genres described by two data matrices, i.e., a keywords matrix and an actors matrix; the task is to predict the genre of each movie. The Citeseer dataset [Sen et al., 2008] originally consists of 4 modalities on the same documents, i.e., content, inbound, outbound and cites; following [Bisson and Grimal, 2012], we use the content and cites modalities in our experiment. The Web KB dataset [Sen et al., 2008] contains webpages collected from 4 universities (Cornell, Texas, Wisconsin and Washington) in 5 categories, i.e., student, project, course, staff and faculty. The multi-modality datasets include: the News Group dataset [Bisson and Grimal, 2012], which consists of 6 groups extracted from the 20 Newsgroups dataset, i.e., M2, M5, M10, NG1, NG2 and NG3. Every group contains 10 subsets, and we use the first subset of each group in our experiment, i.e., News-M2, News-M5, News-M10, News-NG1, News-NG2 and News-NG3, respectively. The 3-Source Text data (3Sources, http://mlg.ucd.ie/datasets/3sources.html) is collected from three online news sources: BBC, Reuters and Guardian. |
| Dataset Splits | Yes | For all datasets, we randomly select 70% of the examples for training and use the remainder for testing. For both the training set and the test set, as in [Li et al., 2014], in each split we randomly select 10% to 90% of the examples (in steps of 20%) as homogeneous examples with complete modalities. The remaining examples are incomplete instances; for example, in the Web KB datasets they are described by either the content or the citation modality, but not both. For all the examples, we randomly choose 30% as labeled data and the remaining 70% as unlabeled data. In the training phase, the parameters λ1 and λ2 are selected by 5-fold cross validation from {10⁻⁵, 10⁻⁴, …, 10⁴, 10⁵} with further splits on the training datasets only, i.e., there is no overlap between the test set and the validation set used for parameter selection. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | In the training phase, the parameters λ1 and λ2 are selected by 5-fold cross validation from {10⁻⁵, 10⁻⁴, …, 10⁴, 10⁵} with further splits on the training datasets only, i.e., there is no overlap between the test set and the validation set used for parameter selection. Empirically, we treat SLIM as converged when the change in the objective value of Eq. 9 between consecutive iterations is less than 10⁻⁶. |
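The split protocol described in the Dataset Splits row can be sketched as follows. This is a minimal, non-authoritative sketch under several assumptions. It covers a two-modality dataset such as Web KB. Each incomplete instance drops one of its two modalities uniformly at random. The 30% labeled subset is drawn from the training examples. `make_split`, `COMPLETE_RATIOS` and their parameters are hypothetical names, not code from the paper.

```python
import numpy as np

# Complete-modality ratios from the protocol: 10% to 90% in steps of 20%.
COMPLETE_RATIOS = [0.1, 0.3, 0.5, 0.7, 0.9]


def make_split(n, complete_ratio, train_frac=0.7, labeled_frac=0.3, seed=0):
    """Hypothetical helper sketching the split for a two-modality dataset.

    Returns (train_idx, test_idx, mask, labeled_idx), where mask[i, m] is True
    when example i observes modality m.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(train_frac * n))
    train_idx, test_idx = perm[:n_train], perm[n_train:]

    mask = np.ones((n, 2), dtype=bool)
    for idx in (train_idx, test_idx):
        # In both the training and test sets, keep `complete_ratio` of the
        # examples with both modalities; the rest become incomplete.
        shuffled = rng.permutation(idx)
        n_complete = int(round(complete_ratio * len(idx)))
        incomplete = shuffled[n_complete:]
        # Assumption: each incomplete example drops one modality chosen
        # uniformly at random, so it keeps exactly one (never both, never none).
        dropped = rng.integers(0, 2, size=len(incomplete))
        mask[incomplete, dropped] = False

    # Assumption: the 30% labeled subset is drawn from the training examples;
    # the remaining training examples are treated as unlabeled.
    labeled_idx = rng.permutation(train_idx)[: int(round(labeled_frac * n_train))]
    return train_idx, test_idx, mask, labeled_idx
```

Looping `make_split` over `COMPLETE_RATIOS` gives one setting per complete-modality ratio. For the multi-modality datasets, the mask would need one column per modality. It would also need a rule for how many modalities an incomplete example keeps, which the summary does not specify.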
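The parameter selection described in the Dataset Splits and Experiment Setup rows amounts to a grid search over 11 × 11 = 121 (λ1, λ2) pairs. Each pair is scored by 5-fold cross validation on the training data only. This minimal sketch assumes that mean validation score is the selection criterion, with higher being better. `select_lambdas`, `GRID` and the `fit_and_score` callback are hypothetical stand-ins, because training SLIM itself (Eq. 9) is not reproduced here.

```python
import itertools

import numpy as np

# Candidate values {1e-5, 1e-4, ..., 1e4, 1e5} for both lambda1 and lambda2.
GRID = [10.0 ** k for k in range(-5, 6)]


def select_lambdas(train_idx, fit_and_score, n_folds=5, seed=0):
    """Pick (lambda1, lambda2) by k-fold CV on the training indices only.

    fit_and_score(fit_idx, val_idx, lam1, lam2) -> float is a hypothetical
    callback (higher is better) standing in for training and scoring SLIM.
    """
    rng = np.random.default_rng(seed)
    # Folds are built from training indices only, so test data never
    # reaches the validation folds.
    folds = np.array_split(rng.permutation(train_idx), n_folds)
    best, best_score = None, -np.inf
    for lam1, lam2 in itertools.product(GRID, GRID):
        scores = []
        for k in range(n_folds):
            val_idx = folds[k]
            fit_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            scores.append(fit_and_score(fit_idx, val_idx, lam1, lam2))
        mean = float(np.mean(scores))
        if mean > best_score:
            best, best_score = (lam1, lam2), mean
    return best
```

In practice `fit_and_score` would train on `fit_idx` with the given λ values and report a validation metric on `val_idx`. The chosen pair would then be used to retrain on the full training set before evaluating on the test set.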
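The stopping rule in the Experiment Setup row stops when the Eq. 9 objective changes by less than 10⁻⁶ between consecutive iterations. It can be sketched as a generic loop. The `step` function, the `max_iter` cap and the toy quadratic below are assumptions for illustration only. They are not SLIM's update rules, which this summary does not give.

```python
def run_until_converged(step, state, tol=1e-6, max_iter=1000):
    """Repeat `step` until the objective changes by less than `tol`.

    step(state) -> (new_state, objective) is a hypothetical stand-in for one
    SLIM iteration; `max_iter` is an added safeguard, not from the paper.
    """
    prev = None
    for it in range(1, max_iter + 1):
        state, obj = step(state)
        # Converged once the objective moves less than `tol` in one iteration.
        if prev is not None and abs(prev - obj) < tol:
            return state, obj, it
        prev = obj
    return state, obj, max_iter


# Toy usage (hypothetical): gradient descent on f(x) = (x - 3)^2, step 0.25.
def toy_step(x):
    x = x - 0.25 * 2.0 * (x - 3.0)
    return x, (x - 3.0) ** 2


x, obj, iters = run_until_converged(toy_step, 0.0)
```

This criterion is absolute rather than relative. With objectives of very different magnitudes across datasets, a relative threshold such as |f_t − f_{t−1}| / |f_{t−1}| might be preferable. The row states only the absolute 10⁻⁶ rule.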