Domain Generalization via Model-Agnostic Learning of Semantic Features

Authors: Qi Dou, Daniel Coelho de Castro, Konstantinos Kamnitsas, Ben Glocker

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task."
Researcher Affiliation | Academia | "Biomedical Image Analysis Group, Imperial College London, UK {qi.dou,dc315,kk2412,b.glocker}@imperial.ac.uk"
Pseudocode | Yes | "Algorithm 1: Model-agnostic learning of semantic features for domain generalization" (an illustrative sketch of this meta-learning loop follows the table).
Open Source Code | Yes | "Code for our proposed method is available at: https://github.com/biomedia-mira/masf."
Open Datasets | Yes | "VLCS [8] is a classic benchmark for domain generalization... The PACS dataset [25] is a recent benchmark..."
Dataset Splits | Yes | "for leave-one-domain-out validation, randomly dividing each domain into 70% training and 30% test... we also use leave-one-domain-out cross-validation, i.e., training on three domains and testing on the remaining unseen one... We randomly split each domain into 80% for training and 20% for testing in our experimental settings." (a split helper illustrating this protocol is sketched after the table).
Hardware Specification | Yes | "The batch size is 128 for each source domain, with an Nvidia TITAN Xp 12 GB GPU."
Software Dependencies | No | The paper mentions using AlexNet, ResNet, the Adam optimizer, and a UNet, but does not provide version numbers for software libraries or frameworks.
Experiment Setup | Yes | "The triplet loss is adopted for computing L_local, with coefficient β2 = 0.005, such that it is in a similar scale to L_task and L_global (β1 = 1). We use the Adam optimizer [23] with η initialized to 10^-3 and exponentially decayed by 2% every 1k iterations. For the inner optimization to obtain (ψ', θ'), we clip the gradients by norm (threshold 2.0) to prevent them from exploding, since this step uses plain, non-adaptive gradient descent (with learning rate α = 10^-5). We also employ an Adam optimizer for the meta-updates of φ with learning rate γ = 10^-5 without decay. The batch size is 128 for each source domain..." (these settings are mirrored in the optimizer sketch after the table).
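The "Pseudocode" row refers to Algorithm 1, the episodic meta-learning loop of MASF. The following is a minimal PyTorch sketch of one meta-iteration under assumptions of ours: the toy network, toy data, temperature, and the simplified global/local losses are illustrative stand-ins, not the authors' implementation (which is available at https://github.com/biomedia-mira/masf).

```python
# One MASF-style meta-iteration (sketch of Algorithm 1), assuming PyTorch >= 2.0.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

class Net(nn.Module):
    def __init__(self, d_in=16, d_feat=32, n_cls=7):
        super().__init__()
        self.feat = nn.Sequential(nn.Linear(d_in, d_feat), nn.ReLU())  # feature extractor (psi)
        self.head = nn.Linear(d_feat, n_cls)                           # task classifier (theta)
    def forward(self, x):
        return self.head(self.feat(x))

net = Net()
alpha = 1e-5                                  # inner (non-adaptive) learning rate
beta1, beta2 = 1.0, 0.005                     # weights of the global and local losses
meta_opt = torch.optim.Adam(net.parameters(), lr=1e-5)  # single optimizer here for brevity

def toy_domain(n=28, n_cls=7):                # labels cycle so every class has positives
    return torch.randn(n, 16), torch.arange(n) % n_cls

(x_a, y_a), (x_b, y_b), (x_c, y_c) = toy_domain(), toy_domain(), toy_domain()
# Each iteration the source domains are split into meta-train (a, b) and meta-test (c).

# 1) Task loss on the meta-train domains.
task_loss = 0.5 * (F.cross_entropy(net(x_a), y_a) + F.cross_entropy(net(x_b), y_b))

# 2) Inner step: plain gradient descent on (psi, theta), keeping the graph for meta-gradients.
names, params = zip(*net.named_parameters())
grads = torch.autograd.grad(task_loss, params, create_graph=True)
fast = {n: p - alpha * g for n, p, g in zip(names, params, grads)}

# 3) Meta-test losses evaluated with the updated parameters (psi', theta').
#    Global loss: symmetrized KL between softened class posteriors of meta-train vs. meta-test
#    (a simplification of the paper's per-class soft-confusion-matrix alignment).
p_tr = F.softmax(functional_call(net, fast, (x_a,)) / 2.0, dim=1).mean(0)
p_te = F.softmax(functional_call(net, fast, (x_c,)) / 2.0, dim=1).mean(0)
global_loss = 0.5 * (F.kl_div(p_te.log(), p_tr, reduction="sum")
                     + F.kl_div(p_tr.log(), p_te, reduction="sum"))

#    Local loss: a triplet loss on embeddings (anchor/positive share a class, negative differs).
fast_feat = {k[len("feat."):]: v for k, v in fast.items() if k.startswith("feat.")}
emb = functional_call(net.feat, fast_feat, (x_c,))
local_loss = F.triplet_margin_loss(emb[0:1], emb[7:8], emb[1:2], margin=1.0)

# 4) Meta-update of all parameters on the combined objective.
meta_loss = task_loss + beta1 * global_loss + beta2 * local_loss
meta_opt.zero_grad()
meta_loss.backward()
meta_opt.step()
```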
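The "Dataset Splits" row describes a leave-one-domain-out protocol. The helper below is a hypothetical illustration of that protocol, not code from the paper: the domain names are the PACS domains, the seed is a placeholder, and the 70%/30% ratio follows the VLCS quote (the medical data use 80%/20%).

```python
# Leave-one-domain-out splitting: hold out one domain entirely as the unseen test domain,
# and split each remaining source domain into per-domain train/val portions.
import random

def leave_one_domain_out(domains, test_domain, train_frac=0.7, seed=0):
    """Return per-domain train/val splits for the sources and the full held-out domain."""
    rng = random.Random(seed)
    train, val = {}, {}
    for name, samples in domains.items():
        if name == test_domain:
            continue                      # the unseen domain is never used for training
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train[name], val[name] = shuffled[:cut], shuffled[cut:]
    return train, val, list(domains[test_domain])

# Example with the four PACS domains; 'sketch' plays the role of the unseen target domain.
pacs = {d: range(100) for d in ("art_painting", "cartoon", "photo", "sketch")}
train, val, test = leave_one_domain_out(pacs, test_domain="sketch")
```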
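The "Experiment Setup" row can be translated almost directly into optimizer configuration. The sketch below assumes PyTorch; the parameter lists are placeholders standing in for (ψ, θ) and φ, and the scheduler choice is our reading of "decayed by 2% every 1k iterations".

```python
# Optimizer configuration matching the quoted experiment setup (sketch, PyTorch).
import torch

psi_theta = [torch.nn.Parameter(torch.zeros(8))]   # feature extractor + task network (stand-in)
phi = [torch.nn.Parameter(torch.zeros(8))]         # metric-embedding network (stand-in)

opt_main = torch.optim.Adam(psi_theta, lr=1e-3)    # eta initialized to 10^-3
# Multiply the learning rate by 0.98 every 1000 scheduler steps, with scheduler.step()
# called once per training iteration.
scheduler = torch.optim.lr_scheduler.StepLR(opt_main, step_size=1000, gamma=0.98)
opt_phi = torch.optim.Adam(phi, lr=1e-5)           # gamma = 10^-5, no decay

ALPHA = 1e-5   # plain, non-adaptive learning rate for the inner optimization step
CLIP = 2.0     # inner gradients are clipped by global norm at this threshold

def clipped_inner_step(params, grads, alpha=ALPHA, max_norm=CLIP):
    """One plain gradient-descent step with global-norm clipping, as used in the inner loop."""
    total = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (max_norm / (total + 1e-12)).clamp(max=1.0)
    return [p - alpha * scale * g for p, g in zip(params, grads)]
```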