Learning Robust Representations by Projecting Superficial Statistics Out
Authors: Haohan Wang, Zexue He, Zachary C. Lipton, Eric P. Xing
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method on a battery of standard domain generalization data sets and, interestingly, achieve comparable or better performance compared to other domain generalization methods that explicitly require samples from the target distribution for training. To show the effectiveness of our proposed method, we conduct a range of experiments. |
| Researcher Affiliation | Academia | Haohan Wang, Carnegie Mellon University, Pittsburgh, PA, USA (haohanw@cs.cmu.edu); Zexue He, Beijing Normal University, Beijing, China (zexueh@mail.bnu.edu.cn); Zachary C. Lipton, Carnegie Mellon University, Pittsburgh, PA, USA (zlipton@cmu.edu); Eric P. Xing, Carnegie Mellon University, Pittsburgh, PA, USA (epxing@cs.cmu.edu) |
| Pseudocode | No | The paper describes methods textually and with mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include a statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We trained the network with a mixture of four digit recognition data sets: MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011), MNIST-M (Ganin & Lempitsky, 2014), and USPS (Denker et al., 1989). We generated a synthetic data set extending the Facial Expression Research Group Database (Aneja et al., 2016). We used the extracted SURF (Bay et al., 2006) features (800 dimensions) and GLCM (Lam, 1996) features (256 dimensions) from the Office data set (Saenko et al., 2010). We experimented with the MNIST-rotation data set...PACS data set (Li et al., 2017a). |
| Dataset Splits | Yes | In the training set (50% of the data) and validation set (30% of the data), the background is correlated with the sentiment label with a correlation of ρ; in the testing set (the remaining 20% of the data), the background is independent of the sentiment label. (This split scheme is sketched in code below the table.) |
| Hardware Specification | No | Our heuristic training procedure allows us to tune the AlexNet with only 10 epochs and train the top-layer classifier for 100 epochs (roughly only 600 seconds on our server for each testing case). The paper mentions "our server" but provides no specific details such as GPU models, CPU models, or memory. |
| Software Dependencies | No | The paper mentions "ADAM (Kingma & Ba, 2014)" as an optimizer and "AlexNet" as a baseline model, but it does not specify versions for these or any other software components, libraries, or programming languages. |
| Experiment Setup | Yes | We chose to run 100 epochs with learning rate 5e-4 because this is when the CNN can converge for all these 10 synthetic datasets. We trained and validated the model on a mixture of two and tested on the third one. We used the same learning rate scheduling strategy as in the previous experiment. We first fine-tuned the AlexNet pretrained on ImageNet with PACS data of the training domains without plugging in NGLCM and HEX, then we used HEX and NGLCM to further train the top classifier of AlexNet while the weights of the bottom layers were fixed. Our heuristic training procedure allows us to tune the AlexNet with only 10 epochs and train the top-layer classifier for 100 epochs. (The two-stage schedule and the HEX projection are sketched below the table.) |
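
To make the split scheme quoted in the Dataset Splits row concrete, here is a minimal NumPy sketch, assuming binary sentiment labels and a binary background cue. The names `make_split` and `attach_background`, and the use of an agreement probability as a stand-in for the paper's correlation ρ, are illustrative assumptions, not details from the paper.

```python
import numpy as np

def attach_background(labels, agree_prob, rng):
    """Assign a binary background cue that matches the label with
    probability agree_prob (0.5 means independent of the label).
    The agreement probability is an illustrative stand-in for the
    correlation rho described in the paper."""
    agree = rng.random(len(labels)) < agree_prob
    return np.where(agree, labels, 1 - labels)

def make_split(labels, agree_prob, seed=0):
    """50% train / 30% validation / 20% test, with the background
    correlated with the label in train/val but independent in test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    n_train, n_val = int(0.5 * len(idx)), int(0.3 * len(idx))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    background = np.empty(len(labels), dtype=int)
    background[train] = attach_background(labels[train], agree_prob, rng)
    background[val] = attach_background(labels[val], agree_prob, rng)
    background[test] = attach_background(labels[test], 0.5, rng)  # independent
    return train, val, test, background
```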
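The Experiment Setup row refers to training the top classifier with HEX and NGLCM. The core of HEX is a linear projection that removes from the joint logits F_A the component explainable by the superficial (NGLCM) logits F_G: F_L = (I − F_G(F_G^T F_G)^{−1} F_G^T) F_A. A minimal PyTorch sketch of that projection follows; the `eps` ridge term for numerical stability is our assumption, not from the paper.

```python
import torch

def hex_project(f_a, f_g, eps=1e-6):
    """Project the joint logits f_a onto the orthogonal complement of
    the column space of the superficial logits f_g:
        f_l = (I - f_g (f_g^T f_g)^{-1} f_g^T) f_a
    Both inputs have shape (batch, num_classes); the projection acts
    across the batch dimension, as in the paper's formulation."""
    gram = f_g.T @ f_g + eps * torch.eye(f_g.shape[1], device=f_g.device)
    return f_a - f_g @ torch.linalg.solve(gram, f_g.T @ f_a)
```

The training loss is applied to the projected logits; at test time the paper predicts from the semantic representation alone, so the projection is only needed during training.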
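And a sketch of the two-stage PACS schedule described in the same row, assuming torchvision's AlexNet and reusing `hex_project` from above. The dummy data loader, the placeholder `nglcm_head`, `num_classes`, and the choice of Adam at 5e-4 for stage 1 are assumptions where the paper is not explicit; in the paper, f_a comes from concatenating the conv features with the NGLCM features before the classifier, which this sketch simplifies.

```python
import torch
import torch.nn.functional as F
from torchvision.models import alexnet

num_classes = 7  # PACS has 7 object categories

# Dummy stand-ins so the sketch runs; replace with real PACS loaders
# and the paper's NGLCM module.
train_loader = [(torch.randn(8, 3, 224, 224),
                 torch.randint(0, num_classes, (8,))) for _ in range(4)]
nglcm_head = torch.nn.Sequential(  # placeholder for NGLCM + its classifier
    torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, num_classes))

# Stage 1: fine-tune the ImageNet-pretrained AlexNet on the training
# domains for 10 epochs, without NGLCM or HEX.
model = alexnet(weights="IMAGENET1K_V1")
model.classifier[6] = torch.nn.Linear(4096, num_classes)
opt = torch.optim.Adam(model.parameters(), lr=5e-4)
for epoch in range(10):
    for x, y in train_loader:
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()

# Stage 2: freeze the convolutional backbone and train only the top
# classifier for 100 epochs, projecting out the superficial logits.
for p in model.features.parameters():
    p.requires_grad = False
opt = torch.optim.Adam(
    list(model.classifier.parameters()) + list(nglcm_head.parameters()),
    lr=5e-4)
for epoch in range(100):
    for x, y in train_loader:
        opt.zero_grad()
        f_a, f_g = model(x), nglcm_head(x)
        F.cross_entropy(hex_project(f_a, f_g), y).backward()
        opt.step()
```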