Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Information-Theoretic Multi-view Domain Adaptation: A Theoretical and Empirical Study
Authors: P. Yang, W. Gao
JAIR 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically evaluate the IMAM algorithm for the cross-domain document classification tasks in comparison with the state-of-the-art baselines. |
| Researcher Affiliation | Academia | Pei Yang EMAIL South China University of Technology Guangzhou, China Wei Gao EMAIL Qatar Computing Research Institute Qatar Foundation, Doha, Qatar |
| Pseudocode | Yes | Algorithm 1: Algorithm for IMAM. Input: Document-term matrices D_S^W and D_T^W; document-link matrices D_S^L and D_T^L; class label c ∈ C assigned to each doc d ∈ D_S; # of document clusters (i.e., # of classes). Output: Class label assigned to each document d ∈ D_T. 1: Set t = 0. Initialize document clustering C_D^(0) using NBC. Initialize word clustering C_W^(0) and link clustering C_L^(0) randomly; 2: Initialize distributions q^(0)(w\|d̂), q^(0)(l\|d̂), q^(0)(d\|ŵ), q^(0)(d\|l̂), q^(0)(c\|ŵ), q^(0)(c\|l̂); 3: repeat 4: Document clustering: for each d, find its new cluster index using Eq. 4; 5: Keep q^(t+1)(c\|ŵ) = q^(t)(c\|ŵ) and q^(t+1)(c\|l̂) = q^(t)(c\|l̂); update q^(t+1)(w\|d̂), q^(t+1)(l\|d̂), q^(t+1)(d\|ŵ), q^(t+1)(d\|l̂); 6: Word clustering: for each word w, find its new cluster index using Eq. 5; link clustering: for each link l, find its new cluster index using Eq. 6; 7: Update q^(t+2)(w\|d̂), q^(t+2)(l\|d̂), q^(t+2)(d\|ŵ), q^(t+2)(d\|l̂), q^(t+2)(c\|ŵ) and q^(t+2)(c\|l̂); 8: t = t + 2; 9: until no document's cluster index needs to be adjusted; 10: for each unlabeled d ∈ D_T do 11: assign d the class label based on Eq. 7; 12: end for |
| Open Source Code | No | The paper references source code for third-party tools used for comparison (TSVM at "http://svmlight.joachims.org/" and CODA at "http://www1.cse.wustl.edu/~mchen/code/coda.tar") but does not provide a statement or link for the authors' own implementation of IMAM. |
| Open Datasets | Yes | Cora (McCallum, Nigam, Rennie, & Seymore, 2000) is an online archive which contains approximately 37,000 computer science research papers and over 1 million links among documents. [...] Reuters-21578 (Lewis, 2004) is widely used for the evaluation of automatic text categorization algorithms. The Reuters-21578 corpus also has a hierarchical structure, which contains 5 top categories. We used the pre-processed version of the corpus that is publicly accessible3. (Footnote 3: http://www.cse.ust.hk/TL/dataset/Reuters.zip.) |
| Dataset Splits | Yes | Based on this dataset, we used a similar way as Dai et al. (2007a) to construct our training and test sets. For each set, we chose two top categories, one as positive class and the other as the negative. Different sub-categories were deemed as different domains. The task is defined as top category classification. For example, the subset denoted as DA-EC consists of source domain: DA 1(+), EC 1(-); and target domain: DA 2(+), EC 2(-). [...] For each algorithm, the parameters were tuned by using five-fold cross-validation on training data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions techniques like TF-IDF and pLSA, and tools like TSVM and CODA, but does not provide specific version numbers for any software libraries or dependencies used in their own implementation. |
| Experiment Setup | Yes | Figure 2 shows the error rate curves varying with different number of word (and link) clusters on the 4 subsets: DA-EC, DA-NT, DA-OS and EC-NT. The X-axis represents the number of word (and link) clusters which is tuned from 32 to 512. According to the performance shown in the figure, we empirically set the number of word (and link) clusters to 128. [...] Figure 3 shows that the performance curves vary with different values of α. [...] in the remaining experiments, we set the value of α to 0.7. [...] We empirically set λ to 0.5 after trying 0, 0.25, 0.5, 1, 2 and 4. |
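The Algorithm 1 pseudocode quoted in the table alternates document, word, and link clustering until no document changes its cluster index. As a rough, hypothetical illustration of that alternating information-theoretic clustering loop, the sketch below runs a single-view analogue (documents only, with a KL-divergence assignment rule standing in for the paper's Eqs. 4-6). It is not the authors' IMAM implementation; function names and the toy data are inventions for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two count/probability vectors, smoothed to avoid log(0).
    p = p + eps
    q = q + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def kl_means(doc_word, n_clusters, max_iter=50, seed=0):
    """Alternate (a) recomputing cluster word distributions and (b) reassigning
    each document to the KL-closest cluster, stopping when no document moves --
    mirroring the 'until no document's cluster index needs to be adjusted'
    stopping rule of Algorithm 1 (single view, no word/link co-clustering)."""
    rng = np.random.default_rng(seed)
    n_docs = doc_word.shape[0]
    labels = rng.integers(0, n_clusters, size=n_docs)
    for _ in range(max_iter):
        # Recompute each cluster's aggregate word distribution; reseed empty clusters.
        centroids = np.vstack([
            doc_word[labels == k].sum(axis=0) if np.any(labels == k)
            else doc_word[rng.integers(0, n_docs)]
            for k in range(n_clusters)
        ])
        # Reassign every document to its KL-nearest cluster.
        new_labels = np.array([
            np.argmin([kl_divergence(row, c) for c in centroids])
            for row in doc_word
        ])
        if np.array_equal(new_labels, labels):  # no document moved: converged
            break
        labels = new_labels
    return labels

# Toy document-term matrix: docs 0-1 use words 0-1, docs 2-3 use words 2-3.
doc_word = np.array([[5, 5, 0, 0],
                     [5, 5, 0, 0],
                     [0, 0, 5, 5],
                     [0, 0, 5, 5]], dtype=float)
labels = kl_means(doc_word, n_clusters=2)
```

On this toy matrix the loop separates the two word-usage blocks into distinct clusters; the full IMAM algorithm additionally co-clusters words and links and carries class information from the source domain.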
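The Experiment Setup row reports that hyperparameters (word/link cluster counts tuned from 32 to 512, α set to 0.7, λ chosen from {0, 0.25, 0.5, 1, 2, 4}) were selected by five-fold cross-validation on training data. A generic grid-search-with-CV skeleton of that procedure might look like the following; the grids mirror the ranges quoted above, but the splitting scheme and the scoring stub are illustrative assumptions, not the authors' code.

```python
from itertools import product
from statistics import mean

# Hypothetical grids mirroring the ranges reported in the paper.
ALPHA_GRID = [0.5, 0.6, 0.7, 0.8, 0.9]     # trade-off weight alpha (illustrative)
LAMBDA_GRID = [0, 0.25, 0.5, 1, 2, 4]      # the lambda values the authors tried
CLUSTER_GRID = [32, 64, 128, 256, 512]     # word/link cluster counts, 32..512

def five_fold_splits(n_items, n_folds=5):
    """Yield (train_indices, valid_indices) pairs for k-fold cross-validation."""
    folds = [list(range(i, n_items, n_folds)) for i in range(n_folds)]
    for k in range(n_folds):
        valid = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, valid

def tune(score_fn, n_items):
    """Return the (alpha, lambda, clusters) combination with the best mean CV score."""
    best, best_score = None, float("-inf")
    for alpha, lam, k in product(ALPHA_GRID, LAMBDA_GRID, CLUSTER_GRID):
        scores = [score_fn(alpha, lam, k, train, valid)
                  for train, valid in five_fold_splits(n_items)]
        if mean(scores) > best_score:
            best, best_score = (alpha, lam, k), mean(scores)
    return best

def toy_score(alpha, lam, k, train, valid):
    # Stand-in for training/evaluating the model on one fold; peaks at the
    # settings the paper reports (alpha=0.7, lambda=0.5, 128 clusters).
    return -(abs(alpha - 0.7) + abs(lam - 0.5) + abs(k - 128) / 512)

best = tune(toy_score, n_items=20)
```

With the toy scoring function, the search recovers (0.7, 0.5, 128), matching the settings the paper reports; in practice `score_fn` would train IMAM on the training folds and evaluate on the held-out fold.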