Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation
Authors: Marius-Constantin Dinu, Markus Holzleitner, Maximilian Beck, Hoan Duc Nguyen, Andrea Huber, Hamid Eghbal-zadeh, Bernhard A. Moser, Sergei Pereverzyev, Sepp Hochreiter, Werner Zellinger
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also perform a large-scale empirical comparative study on several datasets, including text, images, electroencephalogram, body sensor signals and signals from mobile phones. Our method outperforms deep embedded validation (DEV) and importance weighted validation (IWV) on all datasets, setting a new state-of-the-art performance for solving parameter choice issues in unsupervised domain adaptation with theoretical error guarantees. |
| Researcher Affiliation | Collaboration | ¹ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz; ²Dynatrace Research; ³Institute of Advanced Research in Artificial Intelligence; ⁴Software Competence Center Hagenberg; ⁵Johann Radon Institute for Computational and Applied Mathematics, Austrian Academy of Sciences |
| Pseudocode | Yes | Algorithm 1: Importance Weighted Least Squares Linear Aggregation (IWA). A hedged sketch of this aggregation step is given after the table. |
| Open Source Code | Yes | Large-scale benchmark experiments are available at https://github.com/Xpitfire/iwa (contact: dinu@ml.jku.at, werner.zellinger@ricam.oeaw.ac.at) |
| Open Datasets | Yes | In addition, we perform extensive empirical evaluations on several datasets with academic data (Transformed Moons), text data (Amazon Reviews (Blitzer et al., 2006)), images (Mini Domain Net (Peng et al., 2019; Zellinger et al., 2021)), electroencephalography signals (Sleep-EDF (Eldele et al., 2021; Goldberger et al., 2000)), body sensor signals (UCI-HAR (Anguita et al., 2013), WISDM (Kwapisz et al., 2011)), and sensor signals from mobile phones and smart watches (HHAR (Stisen et al., 2015)). |
| Dataset Splits | Yes | All datasets have a train, evaluation and test split, with results only presented on the held-out test sets. For additional details we refer to Appendix C and D. ... In particular, we use 4000 labeled source examples and 4000 unlabeled target examples for training, and over 1000 examples for testing. |
| Hardware Specification | Yes | Overall, to compute the results in our tables, we trained 16,680 models with an approximate computational budget of 1,500 GPU-hours on one high-performance computing station using 8 NVIDIA P100 16GB GPUs, 512GB RAM, and a 40-core Xeon(R) CPU E5-2698 v4 @ 2.20GHz on CentOS Linux 7. |
| Software Dependencies | Yes | All methods have been implemented in Python using the PyTorch (Paszke et al., 2017, BSD license) library. For monitoring the runs we used Weights & Biases (Biewald, 2020, MIT license). We use the Scikit-learn (Pedregosa et al., 2011) library for evaluation measures and toy datasets, the TQDM (da Costa-Luis, 2019) library, and TensorBoard (Abadi et al., 2015) for keeping track of the progress of our experiments. |
| Experiment Setup | Yes | We train the class prediction models for 50 epochs and the domain classifier for 80 epochs with learning rate 0.001, weight decay 0.0001 and batch size 128 using the Adam optimizer (Kingma & Ba, 2014). ... All class prediction models have been trained for 60 epochs and domain classifiers for 100 epochs with Adam optimizer, a learning rate of 0.001, β1 = 0.9, β2 = 0.999, batch size of 128 and weight decay of 0.0001. (See the optimizer configuration sketch after the table.) |
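The aggregation step named in Algorithm 1 admits a closed-form weighted least-squares solve. Below is a minimal sketch under our own simplifying assumptions (scalar regression targets, a small ridge term for numerical stability, and precomputed importance weights, e.g. from a domain classifier); it is not the paper's exact procedure, and the names `iwa_aggregate` and `reg` are hypothetical.

```python
import numpy as np

def iwa_aggregate(source_preds, source_labels, importance_weights, reg=1e-6):
    """Importance-weighted least-squares linear aggregation (sketch).

    source_preds:       (n, m) predictions of m candidate models on n source samples
    source_labels:      (n,)   source labels
    importance_weights: (n,)   estimates of p_target(x) / p_source(x)
    reg:                ridge term for numerical stability (our assumption)

    Returns coefficients c; the aggregated predictor is x -> sum_j c_j * f_j(x).
    """
    W = importance_weights[:, None]                      # (n, 1)
    m = source_preds.shape[1]
    # Solve  min_c  sum_i w_i * (preds_i @ c - y_i)^2 + reg * ||c||^2
    A = source_preds.T @ (W * source_preds) + reg * np.eye(m)
    b = source_preds.T @ (importance_weights * source_labels)
    return np.linalg.solve(A, b)

# Hypothetical usage with synthetic candidates:
rng = np.random.default_rng(0)
preds = rng.normal(size=(200, 5))                        # f_j(x_i) for 5 candidates
labels = preds @ np.array([0.5, 0.2, 0.1, 0.1, 0.1])
weights = np.exp(rng.normal(scale=0.1, size=200))        # stand-in importance weights
c = iwa_aggregate(preds, labels, weights)
# On target data: aggregated prediction = target_preds @ c
```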
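For concreteness, the reported Adam settings translate directly into a PyTorch snippet; the placeholder network below is our assumption, and the actual architectures are described in the paper's appendices.

```python
import torch

model = torch.nn.Linear(128, 10)  # placeholder; not the paper's architecture

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,              # learning rate 0.001
    betas=(0.9, 0.999),   # β1 = 0.9, β2 = 0.999
    weight_decay=1e-4,    # weight decay 0.0001
)
# Reported schedule: batch size 128; class predictors trained for 50-60 epochs,
# domain classifiers for 80-100 epochs, depending on the benchmark.
```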