Generalizing Across Domains via Cross-Gradient Training

Authors: Shiv Shankar*, Vihari Piratla*, Soumen Chakrabarti, Siddhartha Chaudhuri, Preethi Jyothi, Sunita Sarawagi

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation on three different applications where this setting is natural establishes that (1) domain-guided perturbation provides consistently better generalization to unseen domains, compared to generic instance perturbation methods, and that (2) data augmentation is a more stable and accurate method than domain adversarial training.
Researcher Affiliation | Collaboration | Shiv Shankar 1, Vihari Piratla 1, Soumen Chakrabarti 1, Siddhartha Chaudhuri 1,2, Preethi Jyothi 1, and Sunita Sarawagi 1. 1 Department of Computer Science, Indian Institute of Technology Bombay; 2 Adobe Research
Pseudocode | Yes | Algorithm 1 CROSSGRAD training pseudocode. (A hedged code sketch of this step appears after the table.)
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link or an explicit code release statement) for its source code.
Open Datasets | Yes | Character recognition across fonts. We created this dataset from Google Fonts (footnote 3: https://fonts.google.com/?category=Handwriting&subset=latin) ... Handwriting recognition across authors. We used the Lipi Tk dataset that comprises of handwritten characters from the Devanagari script (footnote 4: http://lipitk.sourceforge.net/hpl-datasets.htm) ... MNIST across synthetic domains. This dataset derived from MNIST was introduced by Ghifary et al. (2015). ... Spoken word recognition across users. We used the Google Speech Command Dataset (footnote 5: https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html)
Dataset Splits | Yes | The data comprises of 109 fonts which are partitioned as 65% train, 25% test and 10% validation folds. (See the split sketch after the table.)
Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of Titan X GPUs used for this research.
Software Dependencies | No | The paper mentions neural network architectures and optimizers but does not name specific software libraries with version numbers (e.g., a TensorFlow or PyTorch release) for its implementation.
Experiment Setup | Yes | For LABELGRAD the parameter α was chosen from {0.1, 0.25, 0.75, 0.5, 0.9} and for CROSSGRAD we chose αl = αd from the same set of values. We chose ϵ ranges so that L norm of the perturbations are of similar sizes in LABELGRAD and CROSSGRAD. The multiples in the ϵ range came from {0.5, 1, 2, 2.5}. The optimizer for the first three datasets is RMS prop with a learning rate (η) of 0.02 whereas for the last Speech dataset it is SGD with η = 0.001 initially and 0.0001 after 15 iterations. (See the optimizer sketch after the table.)
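
The Algorithm 1 referenced in the Pseudocode row alternates two updates: each classifier is trained on a mix of clean inputs and inputs perturbed along the input-gradient of the other classifier's loss. Below is a minimal PyTorch sketch of one such step, not the authors' released code; label_net, domain_net, the two optimizers, and the batch tensors x, y, d are assumed placeholders, while eps, alpha_l, alpha_d correspond to the ε and α hyperparameters quoted in the Experiment Setup row.

```python
# Minimal sketch of one CROSSGRAD update (Algorithm 1), assuming two
# torch.nn.Module classifiers and a minibatch (x, y, d) of inputs,
# class labels, and domain ids.
import torch
import torch.nn.functional as F

def crossgrad_step(label_net, domain_net, opt_l, opt_d, x, y, d,
                   eps=1.0, alpha_l=0.5, alpha_d=0.5):
    # Gradients of each loss w.r.t. the input, used to perturb the
    # data fed to the *other* classifier.
    x = x.detach().requires_grad_(True)
    grad_d = torch.autograd.grad(F.cross_entropy(domain_net(x), d), x)[0]
    grad_l = torch.autograd.grad(F.cross_entropy(label_net(x), y), x)[0]
    x_l = (x + eps * grad_d).detach()  # domain-guided perturbation for the label classifier
    x_d = (x + eps * grad_l).detach()  # label-guided perturbation for the domain classifier
    x = x.detach()

    # Label classifier: convex mix of clean and perturbed losses.
    loss_l = (1 - alpha_l) * F.cross_entropy(label_net(x), y) \
             + alpha_l * F.cross_entropy(label_net(x_l), y)
    opt_l.zero_grad(); loss_l.backward(); opt_l.step()

    # Domain classifier: convex mix of clean and perturbed losses.
    loss_d = (1 - alpha_d) * F.cross_entropy(domain_net(x), d) \
             + alpha_d * F.cross_entropy(domain_net(x_d), d)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return loss_l.item(), loss_d.item()
```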
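
For the Dataset Splits row, a rough illustration of the 65/25/10 partition over the 109 fonts; the integer font ids, shuffling, and seed are assumptions for illustration, not taken from the paper.

```python
import random

fonts = list(range(109))            # placeholder ids standing in for the 109 Google Fonts
random.Random(0).shuffle(fonts)     # assumed seed; the paper does not specify one

n_train = round(0.65 * len(fonts))  # 71 fonts for training
n_test = round(0.25 * len(fonts))   # 27 fonts for testing
train = fonts[:n_train]
test = fonts[n_train:n_train + n_test]
val = fonts[n_train + n_test:]      # remaining 11 fonts (~10%) for validation
```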
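
The optimizer settings quoted in the Experiment Setup row map directly onto standard library calls. The sketch below uses PyTorch as an assumption (the paper names no framework); the make_optimizer helper, the "speech" dataset flag, and model are placeholders, and the 15-iteration boundary is read as an epoch milestone.

```python
import torch

def make_optimizer(model, dataset):
    # Fonts, handwriting, and MNIST: RMSprop with learning rate 0.02.
    if dataset != "speech":
        return torch.optim.RMSprop(model.parameters(), lr=0.02), None
    # Speech commands: SGD at 0.001 initially, dropped to 0.0001 after 15 iterations.
    opt = torch.optim.SGD(model.parameters(), lr=0.001)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[15], gamma=0.1)
    return opt, sched
```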