Domain Generalization via Model-Agnostic Learning of Semantic Features
Authors: Qi Dou, Daniel Coelho de Castro, Konstantinos Kamnitsas, Ben Glocker
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task. |
| Researcher Affiliation | Academia | Biomedical Image Analysis Group, Imperial College London, UK {qi.dou,dc315,kk2412,b.glocker}@imperial.ac.uk |
| Pseudocode | Yes | Algorithm 1 Model-agnostic learning of semantic features for domain generalization |
| Open Source Code | Yes | Code for our proposed method is available at: https://github.com/biomedia-mira/masf. |
| Open Datasets | Yes | VLCS [8] is a classic benchmark for domain generalization... The PACS dataset [25] is a recent benchmark... |
| Dataset Splits | Yes | for leave-one-domain-out validation, randomly dividing each domain into 70% training and 30% test... we also use leave-one-domain-out cross-validation, i.e., training on three domains and testing on the remaining unseen one... We randomly split each domain into 80% for training and 20% for testing in our experimental settings. |
| Hardware Specification | Yes | The batch size is 128 for each source domain, with an Nvidia TITAN Xp 12 GB GPU. |
| Software Dependencies | No | The paper mentions using AlexNet, ResNet, the Adam optimizer, and a U-Net, but does not provide specific version numbers for software libraries or frameworks. |
| Experiment Setup | Yes | The triplet loss is adopted for computing L_local, with coefficient β2 = 0.005, such that it is on a similar scale to L_task and L_global (β1 = 1). We use the Adam optimizer [23] with η initialized to 10^-3 and exponentially decayed by 2% every 1k iterations. For the inner optimization to obtain (ψ′, θ′), we clip the gradients by norm (threshold 2.0) to prevent them from exploding, since this step uses plain, non-adaptive gradient descent (with learning rate α = 10^-5). We also employ an Adam optimizer for the meta-updates of φ with learning rate γ = 10^-5 without decay. The batch size is 128 for each source domain... (a hedged sketch of these settings follows the table) |
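As a reading aid for the hyperparameters quoted above, below is a minimal, hypothetical PyTorch (≥2.0, for `torch.func.functional_call`) sketch of how those settings could be wired together: Adam with η = 10^-3 decayed by 2% every 1k iterations for (ψ, θ); a plain, differentiable gradient-descent inner step with α = 10^-5 and gradient clipping at norm 2.0 to obtain (ψ′, θ′); a separate Adam with γ = 10^-5 for the metric-embedding parameters φ; and loss weights β1 = 1, β2 = 0.005. The network, the alignment losses, and the data are placeholders for illustration only; this is not the authors' released TensorFlow implementation (linked above).

```python
# Hypothetical sketch only: placeholder network, losses, and data.
import torch
import torch.nn as nn
from torch.func import functional_call

class Net(nn.Module):
    """Stand-in for the feature extractor ψ and task classifier θ."""
    def __init__(self, in_dim=128, n_classes=7):
        super().__init__()
        self.extractor = nn.Linear(in_dim, 64)       # ψ
        self.classifier = nn.Linear(64, n_classes)   # θ
    def forward(self, x):
        z = torch.relu(self.extractor(x))
        return self.classifier(z), z

model = Net()
embed = nn.Linear(64, 32)  # stand-in for the metric-embedding network φ

# Adam with η = 1e-3, exponentially decayed by 2% every 1k iterations
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1000, gamma=0.98)
# Separate Adam for the meta-update of φ, γ = 1e-5, no decay
meta_opt = torch.optim.Adam(embed.parameters(), lr=1e-5)

ALPHA, BETA1, BETA2, CLIP = 1e-5, 1.0, 0.005, 2.0
ce = nn.CrossEntropyLoss()

def meta_step(x_tr, y_tr, x_te, y_te, global_loss_fn, local_loss_fn):
    """One meta-iteration: task loss on meta-train data, inner GD step
    to (ψ', θ'), alignment losses on meta-test data, joint update."""
    params = dict(model.named_parameters())
    logits, _ = functional_call(model, params, (x_tr,))
    task_loss = ce(logits, y_tr)

    # Inner step: plain (non-adaptive) gradient descent with α = 1e-5,
    # gradients clipped by norm at 2.0, kept differentiable so the
    # meta-gradient can flow back into (ψ, θ).
    grads = torch.autograd.grad(task_loss, list(params.values()), create_graph=True)
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = torch.clamp(CLIP / (norm + 1e-12), max=1.0)
    fast = {n: p - ALPHA * scale * g for (n, p), g in zip(params.items(), grads)}

    # Meta-objective on the held-out data using the updated weights:
    # β1 weights the global alignment term, β2 the local term computed
    # on φ-embedded features.
    logits_te, feats_te = functional_call(model, fast, (x_te,))
    meta_loss = (BETA1 * global_loss_fn(logits_te, y_te)
                 + BETA2 * local_loss_fn(embed(feats_te), y_te))

    opt.zero_grad(); meta_opt.zero_grad()
    (task_loss + meta_loss).backward()
    opt.step(); meta_opt.step(); sched.step()
    return task_loss.item(), meta_loss.item()

# Toy usage: random data, batch size 128, crude placeholder alignment losses.
x_tr, y_tr = torch.randn(128, 128), torch.randint(0, 7, (128,))
x_te, y_te = torch.randn(128, 128), torch.randint(0, 7, (128,))
meta_step(x_tr, y_tr, x_te, y_te,
          global_loss_fn=lambda logits, y: ce(logits, y),   # placeholder
          local_loss_fn=lambda z, y: z.pow(2).mean())       # placeholder
```

Note that `global_loss_fn` and `local_loss_fn` above are crude stand-ins: the paper uses a symmetrized KL divergence between per-domain soft class-relationship distributions for the global loss and a triplet loss on φ-embedded features for the local loss.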