Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Reproducible and Realistic Evaluation of Partial Domain Adaptation Methods

Authors: Tiago Salvador, Kilian Fatras, Ioannis Mitliagkas, Adam M. Oberman

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The main goal of this work is to provide a realistic evaluation of PDA methods under different model selection strategies and a consistent evaluation protocol. We evaluate 6 state-of-the-art PDA algorithms on 2 different real-world datasets using 7 different model selection strategies. Our two main findings are: (i) without target labels for model selection, the accuracy of the methods decreases by up to 30 percentage points; (ii) only one method and model selection pair performs well on both datasets. Experiments were performed with our PyTorch framework, BenchmarkPDA, which we open-source.
Researcher Affiliation | Academia | Tiago Salvador (Mila Quebec AI Institute, McGill University); Kilian Fatras (Mila Quebec AI Institute, McGill University); Ioannis Mitliagkas (Mila Quebec AI Institute, Université de Montréal, Canada CIFAR AI Chair); Adam Oberman (Mila Quebec AI Institute, McGill University, Canada CIFAR AI Chair)
Pseudocode | No | The paper describes algorithms and methods in text but does not include any clearly labeled pseudocode or algorithm blocks. For example, Section 3 describes "Partial Domain Adaptation Methods" in narrative form.
Open Source Code | Yes | To perform our experiments we developed a PyTorch (Paszke et al., 2019) framework: BenchmarkPDA. We make it available for researchers to use and contribute with new algorithms and model selection strategies: https://github.com/oberman-lab/BenchmarkPDA
Open Datasets | Yes | Datasets. We consider two standard real-world datasets used in DA. Our first dataset is office-home (Venkateswara et al., 2017). A difficult dataset for unsupervised domain adaptation (UDA), it has 15,500 images from four domains: Art (A), Clipart (C), Product (P) and Real-World (R). ... visda (Peng et al., 2017) is a large-scale dataset for UDA. It has 152,397 synthetic images and 55,388 real-world images, where 12 object categories are shared by the two domains.
Dataset Splits | Yes | Since s-acc, dev and snd require a source validation set, we divide the source samples into a training subset (80%) and a validation subset (20%).
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It mentions using a ResNet50 backbone and PyTorch, but gives no specific GPU, CPU, or memory details.
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019)" and "optimal transport solvers from (Flamary et al., 2021)", but it does not provide explicit version numbers for these software dependencies, which are required for a reproducible description.
Experiment Setup | Yes | Optimizer. We use the SGD (Robbins & Monro, 1951) algorithm with momentum of 0.9, a weight decay of 5e-4 and Nesterov acceleration. As the bottleneck and classifier layers are randomly initialized, we set their learning rates to be 10 times that of the pre-trained ResNet50 backbone. We schedule the learning rate with a strategy similar to the one in (Ganin et al., 2016): χ_p = χ_0 / (1 + γi)^ν, where i is the current iteration, χ_0 = 0.001, γ = 0.001, ν = 0.75. ... Finally, as for the mini-batch size, jumbot and m-pot use stratified mini-batches of size 65 for office-home and 36 for visda. All other methods use a random uniform sampling strategy with a mini-batch size of 36. Hyper-Parameters. In Table 9, we report the values used for each hyper-parameter in our grid search. We report in Table 10 the hyper-parameters chosen by each model selection strategy for each method on both datasets.
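The learning-rate schedule quoted above can be sketched in a few lines of Python. This is a minimal illustration of the decay rule χ_p = χ_0 / (1 + γi)^ν using the constants reported in the paper (χ_0 = 0.001, γ = 0.001, ν = 0.75); the function name and keyword arguments are our own, not from the authors' BenchmarkPDA code.

```python
def lr_schedule(i, lr0=0.001, gamma=0.001, nu=0.75):
    """Decayed learning rate at iteration i: lr0 / (1 + gamma * i) ** nu.

    With the reported constants, the rate starts at lr0 and decays
    polynomially as training progresses; the randomly initialized
    bottleneck/classifier layers would use 10 * lr_schedule(i).
    """
    return lr0 / (1.0 + gamma * i) ** nu


# At iteration 0 the schedule returns the base rate, and it decays monotonically.
start = lr_schedule(0)
later = lr_schedule(10_000)
```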