Efficient Domain Generalization via Common-Specific Low-Rank Decomposition
Authors: Vihari Piratla, Praneeth Netrapalli, Sunita Sarawagi
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that CSD either matches or beats state-of-the-art approaches for domain generalization based on domain erasure, domain-perturbed data augmentation, and meta-learning. Further diagnostics on rotated MNIST, where domains are interpretable, confirm the hypothesis that CSD successfully disentangles common and domain-specific components and hence leads to better domain generalization; moreover, our code and dataset are publicly available at the following URL: https://github.com/vihari/csd. Section 4, Experiments. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science, Indian Institute of Technology, Mumbai, India; (2) Microsoft Research, Bangalore, India. |
| Pseudocode | Yes | Algorithm 1 Common-Specific Low-Rank Decomposition (CSD) and Algorithm 2 min_{w_c, W_s, Γ} f(w_c, W_s, Γ). |
| Open Source Code | Yes | moreover, our code and dataset are publicly available at the following URL: https://github.com/vihari/csd. |
| Open Datasets | Yes | The LipitK dataset earlier used in (Shankar et al., 2018) is a Devanagari character dataset... (2) The Nepali Handwritten Character Dataset (NepaliC) contains data collected from 41 different people... PACS is a popular domain generalization benchmark... Speech utterances dataset: We use the utterance data released by Google... Rotated MNIST and Fashion-MNIST: Rotated MNIST is a popular benchmark for evaluating domain generalization. |
| Dataset Splits | Yes | We train three different models on each of 25, 50, and 76 domains, and test on a disjoint set of 20 domains while using 10 domains for validation. Since the number of available domains is small, in this case we create a fixed split of 27 domains for training, 5 for validation, and the remaining 9 for testing. We use ten percent of the total number of domains for each of validation and test. |
| Hardware Specification | No | The paper mentions the neural network architectures used (ResNet-18, LeNet) but does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | Yes | The base classifier and the preprocessing pipeline for the utterances are borrowed from the implementation provided in the TensorFlow examples (footnote 6 points to tensorflow/tree/r1.15). |
| Experiment Setup | Yes | Since CSD is relatively stable to hyper-parameter choice, we set the default rank to 1, and the parameters of the weighted loss to λ = 1 and κ = 1. These hyper-parameters, along with the learning rates of all other methods as well as the number of meta-train/meta-test domains for MASF and the step size of perturbation in CG, are all picked using a task-specific development set. |
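The core idea quoted above (Algorithm 2's decomposition of per-domain classifier weights into a common component plus a low-rank domain-specific part, with rank 1 as the default) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the variable names (`w_common`, `w_specific`, `gamma`) and shapes are illustrative assumptions, following the paper's notation w_c, W_s, Γ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_domains, n_features, n_classes, rank = 5, 16, 3, 1  # rank 1 is the paper's default

# Common classifier weights shared by all domains (w_c in the paper's notation).
w_common = rng.normal(size=(n_features, n_classes))
# Low-rank basis of domain-specific components (W_s), shared across domains.
w_specific = rng.normal(size=(rank, n_features, n_classes))
# Per-domain mixing coefficients (Γ): each domain combines the basis differently.
gamma = rng.normal(size=(n_domains, rank))

def domain_classifier(d):
    """Effective softmax weights for domain d: common part + low-rank specific part."""
    return w_common + np.tensordot(gamma[d], w_specific, axes=1)

# At test time on an unseen domain, only the common component w_common is used,
# which is the intuition behind CSD's domain generalization.
W_d = domain_classifier(2)
assert W_d.shape == (n_features, n_classes)
```

With rank 1, each domain's specific part is simply a scalar multiple of a single shared matrix, so the specific subspace stays low-dimensional regardless of the number of training domains.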