Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Supervised Domain Adaptation Based on Marginal and Conditional Distributions Alignment

Authors: Ori Katz, Ronen Talmon, Uri Shaham

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimentally, despite its generality, our approach demonstrates on-par or superior results compared with recent state-of-the-art task-specific methods. Our code is available here. 1 Introduction The Empirical Risk Minimization (ERM) principle, which is the theoretical basis of supervised machine learning, is based on the assumption that the training data is sampled from the data distribution that the model will encounter after deployment. [...] 6 Experimental Study Our experimental study begins in Section 6.1 with commonly-used benchmarks for classification. We compare our approach to state-of-the-art (SOTA) methods for classification and show that our approach obtains on-par or superior results. In Section 6.2, we focus on scenarios that highlight the benefits stemming from taking the geometry of the input and output spaces into account. We present new datasets and learning tasks and demonstrate superior results compared to the SOTA baseline methods used in Section 6.1. We complete this section with an evaluation of regression tasks in Section 6.3.
Researcher Affiliation Academia Ori Katz (Viterbi Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology); Ronen Talmon (Viterbi Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology); Uri Shaham (Department of Computer Science, Bar-Ilan University)
Pseudocode Yes Algorithm 1: Cross-domain samples generation via KR.
Input: 1. Mapped features of two mini-batches, one from each domain: ϕ(X^v) ∈ R^{d×n}, v ∈ {S, T}. 2. The labels Y^v̄ for the mini-batch of the cross-domain.
Output: A mini-batch sample Ŷ^v, where each ŷ ∈ Ŷ^v admits the distribution of P̃^v̄_x, where x ∼ Q^v.
1: Compute D_{v,v̄}[i, j] = ||ϕ(X^v_i) − ϕ(X^v̄_j)||_2, for i, j = 1, . . . , n, where X^v_i denotes the i-th row of X^v (corresponding to the i-th sample in the batch).
2: Compute K_{v,v̄}[i, j] = exp(−D²_{v,v̄}[i, j] / ϵ_v), where ϵ_v = max_i min_j D_{v,v̄}[i, j].
3: Approximate Ŷ^v using the kernel-based weighted average Ŷ^v = (diag(K_{v,v̄} 1))^{−1} K_{v,v̄} Y^v̄, where 1 is a column vector of all ones.
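The three steps of Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' released code: the function name and the (n, d) row-major feature layout are assumptions, and `(diag(K1))^{-1} K` is implemented as row normalization of the kernel matrix.

```python
import numpy as np

def kr_cross_domain_labels(phi_xv, phi_xvbar, y_vbar):
    """Kernel-regression (KR) estimate of cross-domain labels, per Algorithm 1.

    phi_xv:    (n, d) mapped features of the mini-batch from domain v
    phi_xvbar: (n, d) mapped features of the mini-batch from the other domain
    y_vbar:    (n, C) labels of the cross-domain mini-batch (e.g. one-hot rows)
    """
    # Step 1: pairwise Euclidean distances D[i, j] = ||phi(x_i^v) - phi(x_j^vbar)||_2
    diff = phi_xv[:, None, :] - phi_xvbar[None, :, :]
    D = np.linalg.norm(diff, axis=-1)
    # Step 2: Gaussian kernel with scale eps_v = max_i min_j D[i, j]
    eps = D.min(axis=1).max()
    K = np.exp(-D**2 / eps)
    # Step 3: (diag(K 1))^{-1} K Y, i.e. a row-normalized weighted average
    return (K / K.sum(axis=1, keepdims=True)) @ y_vbar
```

Each output row is a convex combination of the cross-domain labels, so with one-hot inputs every row of the result still sums to one.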
Open Source Code Yes Experimentally, despite its generality, our approach demonstrates on-par or superior results compared with recent state-of-the-art task-specific methods. Our code is available here. [...] C Technical Details and Additional Results In this section, we provide more details and additional results to the experimental study in Section 6. Our code is publicly available here.
Open Datasets Yes The Digits task consists of two domains (datasets): the MNIST dataset LeCun et al. (2010), denoted by M, and the USPS dataset LeCun et al. (1989), denoted by U. The Office task Saenko et al. (2010) consists of 3 domains: Amazon, Webcam, and DSLR, denoted by A, W, and D, respectively. The VisDA-C task Peng et al. (2018) consists of two domains: a synthetic domain of 3D-rendered objects, denoted by S, and a real domain of images-in-the-wild, denoted by R. An illustration of the datasets is presented in Figure 2. [...] The CityCam dataset is taken from Zhang et al. (2017) and is publicly available (footnote 11). [...] The Gasoline dataset is publicly available here (footnote 14); its parsed version is available here (footnote 15).
Dataset Splits Yes In this setting, each experiment is repeated 10 times. In each experiment, 200 samples per class from the source domain and X samples per class from the target domain are randomly selected as the training set, where X ∈ {1, 3, 5, 7}. [...] In this setting, each experiment is repeated 5 times. In each experiment, the samples for each class are sampled according to the quantities described in Section 6. [...] For each country, we perform the following procedure. We randomly pick 6 datapoints from its associated records to create the target training dataset, while the rest of its associated datapoints are used for testing. The source training dataset consists of all the datapoints from the rest of the countries. [...] The data partitioning is carried out according to the rectified experimental protocol suggested in Hedegaard et al. (2021) using their released Python packages (footnotes 7, 8).
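The per-country partitioning quoted above can be sketched as follows. The function and variable names are hypothetical; the actual protocol is defined by the authors' released code.

```python
import random

def country_split(records_by_country, target_country, n_target_train=6, seed=0):
    """Sketch of the per-country split: pick n_target_train points of the
    target country for training, keep the rest of that country for testing,
    and use all other countries' datapoints as the source training set.

    records_by_country: dict mapping country name -> list of datapoints
    Returns (source_train, target_train, target_test).
    """
    rng = random.Random(seed)
    target = list(records_by_country[target_country])
    rng.shuffle(target)
    target_train = target[:n_target_train]
    target_test = target[n_target_train:]
    source_train = [x for country, xs in records_by_country.items()
                    if country != target_country for x in xs]
    return source_train, target_train, target_test
```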
Hardware Specification No The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It mentions using pre-trained models and network architectures but not the hardware they used for their own training/evaluation.
Software Dependencies No The paper mentions optimizer selection (Adam (Kingma & Ba, 2014) or SGD) and the Python package TLlib for data augmentations, but it does not specify version numbers for these software components or for any other key libraries/frameworks.
Experiment Setup Yes Hyperparameter tuning was carried out manually based on the validation set for each experiment, and then the same set of hyperparameters was used for all the methods. The selected hyperparameters for each experiment appear in our published code. The training loss in these experiments is the binary cross-entropy (BCE) loss. The CDCA term is applied by representing the labels as one-hot vectors in R^C, where C denotes the cardinality of the label space. [...] No data augmentation was applied. The network architecture consists of two convolutional layers with a kernel size of 3×3 and 32 filters, followed by max-pooling layers and 2 fully connected layers of sizes 120 and 84. [...] for the Office tasks we perform the default data augmentations used in the Python package TLlib (footnote 10). These augmentations are based on a random resized crop of size 224×224. The network architecture consists of the convolutional layers of a VGG-16 model Kobayashi et al. (2015) pre-trained on ImageNet Russakovsky et al. (2015), followed by 2 fully connected layers of sizes 1024 and 128. [...] The training loss in the zero-shot and the colored-digits experiments is the BCE loss, and in the multi-label experiment it is the sum of the BCE loss applied to each slot. [...] We used the same network architecture as in de Mathelin et al. (2021), which consists of a multi-layer perceptron with two hidden layers of sizes 100 and 10. The training loss is the MSE loss.
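The one-hot label representation in R^C used for the CDCA term is the standard encoding; a minimal sketch (the helper name is mine, not from the paper):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot row vectors in R^C."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out
```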