Semi-Supervised Matrix Completion for Cross-Lingual Text Classification

Authors: Min Xiao, Yuhong Guo

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the proposed learning technique, we conduct extensive experiments on eighteen cross-language sentiment classification tasks with four different languages. The empirical results demonstrate the efficacy of the proposed approach and show that it outperforms a number of related cross-lingual learning methods.
Researcher Affiliation | Academia | Min Xiao and Yuhong Guo, Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA; {minxiao, yuhong}@temple.edu
Pseudocode | Yes | Algorithm 1 (a runnable sketch of this loop follows the table):
Input: M^0, γ > 0, β ≥ 1, step size τ with 0 < τ < min(2, 2/β), µ.
Initialize M as the nonnegative projection of the rank-1 approximation of M^0; initialize z as zeros.
while not converged do
1. Gradient descent: [M, z] = [M, z] − τ∇g(M, z).
2. Shrinkage operation: [M, z] = S_{τγ}([M, z]).
3. Project M onto the feasible set: M = max(M, 0).
end while
Open Source Code | No | The paper does not provide access to source code for the described methodology.
Open Datasets | No | We used the multilingual Amazon product review dataset in our experiments for cross-lingual sentiment classification, which contains reviews in three different categories (Books (B), DVD (D), and Music (M)), written in four different languages (English (E), French (F), German (G), and Japanese (J)). The paper does not provide a specific link, DOI, or a formal author/year citation for accessing this dataset. (The resulting task enumeration is sketched after the table.)
Dataset Splits | Yes | For each of the eighteen cross-language sentiment classification tasks, in addition to the 2000 unlabeled parallel reviews, which we used only for representation learning, we used all documents in the source language as labeled data (4000 English reviews or 2000 non-English reviews), randomly chose 100 reviews in the target language as labeled data, and kept the remaining reviews in the target language as unlabeled data. We conducted parameter selection based on three runs over the first task, EFB, with different random selections of the 100 labeled training reviews in the target language. (A sketch of one such split follows the table.)
Hardware Specification | No | No specific hardware details (such as CPU/GPU models or machine configurations) used to run the experiments are mentioned in the paper.
Software Dependencies | No | We used the LIBSVM package (Chang and Lin 2011) with linear kernels and the default parameter setting. However, no specific version number for LIBSVM or for any other software dependency is provided. (A minimal linear-kernel usage sketch follows the table.)
Experiment Setup | Yes | For SSMC, we chose γ from {0.01, 0.1, 1, 10, 100}, β from {1, 2, 5, 10, 100}, µ from {10^-6, 10^-5, 10^-4, 10^-3, 10^-2, 10^-1}, and the reduced dimension size k from {20, 50, 100, 200, 500}. This leads to the following setting: γ = 10, β = 1, µ = 10^-4, k = 50. We used τ = 1. For TSL, we set µ = 10^-6, τ = 1, and chose γ from {0.01, 0.1, 1, 10, 100}, λ from {10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 1}, and the reduced dimension size k from {20, 50, 100, 200, 500}. This leads to the setting γ = 0.1, λ = 10^-4, and k = 50. (The SSMC grid is enumerated in a sketch after the table.)
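
The pseudocode above is a proximal-gradient (fixed-point continuation) loop: a gradient step on the smooth loss g, a shrinkage step for the nuclear-norm regularizer, and a projection onto the nonnegative orthant. Below is a minimal numpy sketch under stated assumptions: grad_g stands in for the gradient of the paper's smooth loss, the shrinkage soft-thresholds the singular values of M, and z is treated as an unshrunk bias vector; none of these names come from the authors' code.

    import numpy as np

    def shrink(M, nu):
        # Singular-value soft-thresholding: the proximal operator of nu * ||M||_*.
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - nu, 0.0)) @ Vt

    def ssmc(M0, grad_g, gamma, tau, max_iter=500, tol=1e-6):
        # Initialize M as the nonnegative projection of the rank-1
        # approximation of M0; initialize z as zeros.
        U, s, Vt = np.linalg.svd(M0, full_matrices=False)
        M = np.maximum(s[0] * np.outer(U[:, 0], Vt[0]), 0.0)
        z = np.zeros(M0.shape[1])  # assumed: one bias entry per column
        for _ in range(max_iter):
            # 1. Gradient descent on the smooth part g (grad_g is assumed to
            #    return the gradients with respect to M and z).
            gM, gz = grad_g(M, z)
            M_new, z_new = M - tau * gM, z - tau * gz
            # 2. Shrinkage (nuclear-norm proximal step) on M.
            M_new = shrink(M_new, tau * gamma)
            # 3. Project M onto the feasible (nonnegative) set.
            M_new = np.maximum(M_new, 0.0)
            done = np.linalg.norm(M_new - M) < tol * max(1.0, np.linalg.norm(M))
            M, z = M_new, z_new
            if done:
                break
        return M, z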
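
The eighteen tasks arise from crossing the three categories with language pairs. Assuming, as the task name EFB (English source, French target, Books) suggests, that each task pairs English with one of the other three languages in either direction, the tasks can be enumerated as follows; this is a hypothetical sketch, not an enumeration given in the paper.

    # Hypothetical enumeration of the 18 cross-language tasks, assuming each
    # task pairs English (E) with one of F, G, J in either direction, per the
    # "EFB" (source, target, category) naming used in the paper.
    categories = ["B", "D", "M"]   # Books, DVD, Music
    others = ["F", "G", "J"]       # French, German, Japanese
    pairs = [("E", t) for t in others] + [(s, "E") for s in others]
    tasks = [src + tgt + cat for (src, tgt) in pairs for cat in categories]
    print(len(tasks), tasks[0])    # 18 EFB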
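
A sketch of one random split for a single task, following the quoted protocol; the array names and the seed are illustrative.

    import numpy as np

    rng = np.random.default_rng(seed=0)  # one of the three random runs

    n_target = 2000                      # reviews in the target language
    perm = rng.permutation(n_target)
    labeled_target = perm[:100]          # 100 randomly chosen labeled reviews
    unlabeled_target = perm[100:]        # the remaining reviews stay unlabeled
    # All source-language documents (4000 English or 2000 non-English reviews)
    # serve as labeled data; the 2000 unlabeled parallel reviews are used only
    # for representation learning.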
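
For reference, LIBSVM's Python bindings (distributed today as the libsvm-official package; the paper predates this packaging) select a linear kernel with '-t 0' while leaving every other option at its default, matching the quoted setting. The toy labels and features below are illustrative only.

    # pip install libsvm-official
    from libsvm.svmutil import svm_train, svm_predict

    # Toy data: y holds +1/-1 sentiment labels, X holds sparse feature dicts.
    y = [1, -1, 1, -1]
    X = [{1: 0.3, 4: 1.0}, {2: 0.7}, {1: 0.1, 3: 0.5}, {2: 0.9, 4: 0.2}]

    # '-t 0' selects the linear kernel; all other LIBSVM parameters keep
    # their defaults, as in the paper.
    model = svm_train(y, X, "-t 0")
    predicted_labels, accuracy, values = svm_predict(y, X, model)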
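
The quoted SSMC grids amount to 750 candidate configurations; a small sketch of the enumeration (the evaluation of each setting is left as a placeholder).

    from itertools import product

    # SSMC candidate grids as quoted; the step size tau is fixed at 1.
    grid = {
        "gamma": [0.01, 0.1, 1, 10, 100],
        "beta":  [1, 2, 5, 10, 100],
        "mu":    [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
        "k":     [20, 50, 100, 200, 500],
    }
    settings = [dict(zip(grid, values)) for values in product(*grid.values())]
    print(len(settings))  # 750 configurations; the paper selects
                          # gamma=10, beta=1, mu=1e-4, k=50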