Generalizing Across Domains via Cross-Gradient Training
Authors: Shiv Shankar*, Vihari Piratla*, Soumen Chakrabarti, Siddhartha Chaudhuri, Preethi Jyothi, Sunita Sarawagi
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on three different applications where this setting is natural establishes that (1) domain-guided perturbation provides consistently better generalization to unseen domains, compared to generic instance perturbation methods, and that (2) data augmentation is a more stable and accurate method than domain adversarial training. |
| Researcher Affiliation | Collaboration | Shiv Shankar¹, Vihari Piratla¹, Soumen Chakrabarti¹, Siddhartha Chaudhuri¹,², Preethi Jyothi¹, and Sunita Sarawagi¹ (¹Department of Computer Science, Indian Institute of Technology Bombay; ²Adobe Research) |
| Pseudocode | Yes | Algorithm 1 CROSSGRAD training pseudocode. (A hedged sketch of one training step appears after this table.) |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link, explicit code release statement) for its source code. |
| Open Datasets | Yes | Character recognition across fonts. We created this dataset from Google Fonts (footnote 3: https://fonts.google.com/?category=Handwriting&subset=latin) ... Handwriting recognition across authors. We used the Lipi Tk dataset that comprises of handwritten characters from the Devanagari script (footnote 4: http://lipitk.sourceforge.net/hpl-datasets.htm) ... MNIST across synthetic domains. This dataset derived from MNIST was introduced by Ghifary et al. (2015). ... Spoken word recognition across users. We used the Google Speech Command Dataset (footnote 5: https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html) |
| Dataset Splits | Yes | The data comprises of 109 fonts which are partitioned as 65% train, 25% test and 10% validation folds. (A domain-level split sketch appears after this table.) |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of Titan X GPUs used for this research. |
| Software Dependencies | No | The paper mentions neural network architectures and optimizers but does not provide specific software library names with version numbers, such as TensorFlow 2.x or PyTorch 1.x, for their implementation. |
| Experiment Setup | Yes | For LABELGRAD the parameter α was chosen from {0.1, 0.25, 0.75, 0.5, 0.9} and for CROSSGRAD we chose αl = αd from the same set of values. We chose ϵ ranges so that the L∞ norm of the perturbations are of similar sizes in LABELGRAD and CROSSGRAD. The multiples in the ϵ range came from {0.5, 1, 2, 2.5}. The optimizer for the first three datasets is RMSprop with a learning rate (η) of 0.02 whereas for the last Speech dataset it is SGD with η = 0.001 initially and 0.0001 after 15 iterations. (An optimizer and hyperparameter sketch appears after this table.) |
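
The paper's Algorithm 1 trains a label classifier and a domain classifier, each also on inputs perturbed along the input-gradient of the other classifier's loss. Below is a minimal sketch of one such training step, assuming a PyTorch implementation (the paper does not name a framework); `label_net`, `domain_net`, the cross-entropy losses, and the default hyperparameter values are illustrative placeholders, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def crossgrad_step(label_net, domain_net, opt_l, opt_d, x, y, d,
                   eps=1.0, alpha_l=0.5, alpha_d=0.5):
    """One CROSSGRAD minibatch update: each classifier is also trained on
    inputs perturbed along the input-gradient of the other classifier's loss."""
    x = x.clone().requires_grad_(True)

    # Input gradients of the label loss J_l and the domain loss J_d.
    loss_l = F.cross_entropy(label_net(x), y)
    loss_d = F.cross_entropy(domain_net(x), d)
    grad_l, = torch.autograd.grad(loss_l, x, retain_graph=True)
    grad_d, = torch.autograd.grad(loss_d, x)

    # Cross-gradient perturbations (Algorithm 1).
    x_d = (x + eps * grad_d).detach()  # domain-guided input for the label net
    x_l = (x + eps * grad_l).detach()  # label-guided input for the domain net
    x = x.detach()

    # Label classifier: mix of clean and domain-perturbed losses.
    opt_l.zero_grad()
    jl = (1 - alpha_l) * F.cross_entropy(label_net(x), y) \
        + alpha_l * F.cross_entropy(label_net(x_d), y)
    jl.backward()
    opt_l.step()

    # Domain classifier: mix of clean and label-perturbed losses.
    opt_d.zero_grad()
    jd = (1 - alpha_d) * F.cross_entropy(domain_net(x), d) \
        + alpha_d * F.cross_entropy(domain_net(x_l), d)
    jd.backward()
    opt_d.step()
    return jl.item(), jd.item()
```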
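The reported splits partition whole domains (fonts), not individual examples, so the held-out fonts are never seen during training. A minimal sketch of such a 65/25/10 domain-level split; the placeholder font identifiers and the fixed seed are illustrative, not taken from the paper.

```python
import random

fonts = [f"font_{i}" for i in range(109)]  # placeholder domain identifiers
random.Random(0).shuffle(fonts)            # arbitrary seed for illustration

n_train = round(0.65 * len(fonts))         # ~71 fonts
n_test = round(0.25 * len(fonts))          # ~27 fonts
train_fonts = fonts[:n_train]
test_fonts = fonts[n_train:n_train + n_test]
val_fonts = fonts[n_train + n_test:]       # remaining ~10% for validation
```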
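The quoted hyperparameter grid and per-dataset optimizer choices can be summarized as below; the dataset keys, the function name, and the use of PyTorch optimizer and scheduler classes are assumptions for illustration only.

```python
import torch

# Reported search grid: alpha for LABELGRAD, and alpha_l = alpha_d for CROSSGRAD.
ALPHA_GRID = [0.1, 0.25, 0.5, 0.75, 0.9]
# Multiples defining the epsilon (perturbation size) range.
EPS_MULTIPLES = [0.5, 1, 2, 2.5]

def make_optimizer(dataset, params):
    """Per-dataset optimizer following the quoted setup (names are illustrative)."""
    if dataset in {"fonts", "handwriting", "mnist"}:
        opt = torch.optim.RMSprop(params, lr=0.02)
        sched = None
    else:  # Google Speech Commands
        opt = torch.optim.SGD(params, lr=0.001)
        # Learning rate drops from 1e-3 to 1e-4 after 15 iterations, as reported.
        sched = torch.optim.lr_scheduler.StepLR(opt, step_size=15, gamma=0.1)
    return opt, sched
```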