Learning with Marginalized Corrupted Features and Labels Together

Authors: Yingming Li, Ming Yang, Zenglin Xu, Zhongfei Zhang

AAAI 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive evaluations on three benchmark data sets demonstrate that RMCV outstands with a superior performance in comparison with state-of-the-art methods." |
| Researcher Affiliation | Academia | Yingming Li, Ming Yang, Zenglin Xu, and Zhongfei (Mark) Zhang — School of Computer Science and Engineering, Big Data Research Center, University of Electronic Science and Technology of China; Department of Computer Science, State University of New York at Binghamton, NY, USA. |
| Pseudocode | Yes | Algorithm 1: RMCV Algorithm. |
| Open Source Code | No | The paper provides no concrete access information (e.g., a repository link, an explicit code-release statement, or code in supplementary materials) for the described methodology. |
| Open Datasets | Yes | "All data sets are obtained from http://mulan.sourceforge.net/datasets-mlc.html." Three multi-label data sets are used: Bibtex, Bookmarks, and Enron; their statistics are described in Table 2. |
| Dataset Splits | Yes | "To find the optimal number of the stacked layers, we perform model selection on a hold-out validation set, adding layers until the F1 score cannot be improved." Since the Bookmarks data set in Mulan has no fixed split, a fixed 80% of the data is used for training and performance is evaluated on the remaining fixed 20% test set. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, or cluster specifications) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiments. |
| Experiment Setup | Yes | "We follow the setup of (Chen, Zheng, and Weinberger 2013)" and weigh each example in a tf-idf-like fashion to put more weight on the losses from rare tags during training. The best performance tends to be achieved by RMCV with blankout corruption at high corruption levels, i.e., when q is at about 0.8. |
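The blankout corruption mentioned in the experiment setup can be sketched as follows. This is a minimal illustration, not the paper's implementation: RMCV marginalizes over the corrupting distribution analytically rather than drawing samples, and the function name and rescaling convention below are assumptions. With blankout at level q, each feature is zeroed with probability q and otherwise rescaled by 1/(1-q) so the corrupted feature matches the original in expectation.

```python
import random


def blankout_corrupt(x, q, rng=None):
    """Apply blankout corruption at level q to a feature vector.

    Each feature is set to 0 with probability q, and otherwise
    rescaled by 1/(1-q), so E[x_tilde] = x. (Illustrative sketch;
    RMCV computes this expectation in closed form instead of sampling.)
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("corruption level q must be in [0, 1)")
    rng = rng or random.Random(0)
    keep_scale = 1.0 / (1.0 - q)
    return [0.0 if rng.random() < q else v * keep_scale for v in x]


x = [1.0, 2.0, 3.0, 4.0]
# High corruption level, as in the paper's best-performing setting.
x_tilde = blankout_corrupt(x, q=0.8)
```

Each surviving feature is inflated fivefold here (1/(1-0.8) = 5), which is why high corruption levels still leave the expected feature values unchanged.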