C2L: Causally Contrastive Learning for Robust Text Classification
Authors: Seungtaek Choi, Myeongho Jeong, Hojae Han, Seung-won Hwang
AAAI 2022, pp. 10526-10534 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that our approach, by collective decisions, is less sensitive to task model bias of attribution-based synthesis, and thus achieves significant improvements, in diverse dimensions: 1) counterfactual robustness, 2) cross-domain generalization, and 3) generalization from scarce data. Experimental results show that our method improves the robustness in various dimensions: 1) counterfactual examples, 2) cross-domain, and 3) data scarcity. |
| Researcher Affiliation | Academia | 1. Yonsei University, Seoul, Republic of Korea; 2. Seoul National University, Seoul, Republic of Korea |
| Pseudocode | No | The paper describes methods in text and equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | No statement regarding open-source code availability or a link to a code repository was found in the paper. |
| Open Datasets | Yes | Sentiment analysis experiments are conducted on the IMDb (Maas et al. 2011), Fine Food (McAuley and Leskovec 2013), and SST-2 (Socher et al. 2013) datasets. Natural language inference experiments are conducted on the MultiNLI (Williams, Nangia, and Bowman 2017) dataset. |
| Dataset Splits | Yes | Following the settings in (Kaushik, Hovy, and Lipton 2019; Moon et al. 2020), we use the official train and test splits if they exist, or we randomly divide the dataset with a 70:30 ratio, using them for train and test splits, respectively. To confirm convergence, we use 10% of the train set for validation purposes. (A splitting sketch appears below the table.) |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions software components like 'bert-base-uncased', 'AdamW', and 'RoBERTa' but does not provide specific version numbers for the programming language (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used to implement the methods and run experiments. |
| Experiment Setup | Yes | For experiments, we chose all the hyperparameters by the best performance on the validation set. For the BERT classifier, we train bert-base-uncased with a batch size of 16 for SST-2, and 8 for IMDb/FOOD, over 3 epochs, ensuring convergence. We used AdamW with a learning rate of 5e-5 and the linear scheduler with 50 warm-up steps. For the contrastive objective (Eq. 5), the balancing coefficient λ is tuned within [0.1, 1.0], and we observe that a larger λ is effective when fewer causal features are identified, such that setting λ to 0.1, 0.7, and 1.0 performs well for SST-2, IMDb, and Fine Food, respectively. The number of positive/negative pairs J is set to 1 due to memory constraints. The embedding dropout coefficient λd is tuned to 0.5. (A training-setup sketch appears below the table.) |
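The split procedure in the Dataset Splits row is fully specified: a 70:30 train/test split when no official split exists, then 10% of the train portion held out for validation. Below is a minimal sketch in Python, assuming scikit-learn; the function name and the fixed seed are our own choices, since the paper names neither its tooling nor a seed.

```python
from sklearn.model_selection import train_test_split

def make_splits(texts, labels, seed=42):
    """70:30 train/test split, then 10% of the train portion held out
    for validation, matching the procedure described in the paper.
    The function name and seed are illustrative assumptions."""
    train_x, test_x, train_y, test_y = train_test_split(
        texts, labels, test_size=0.30, random_state=seed)
    train_x, val_x, train_y, val_y = train_test_split(
        train_x, train_y, test_size=0.10, random_state=seed)
    # Net proportions of the full dataset: 63% train, 7% validation, 30% test.
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```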
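The Experiment Setup row pins down enough hyperparameters to reconstruct the optimizer and schedule. The sketch below uses PyTorch and Hugging Face transformers; the paper does not name its frameworks, so the library choice, the `steps_per_epoch` value, and the `total_loss` helper are assumptions, and the contrastive term itself (defined in the paper's Eq. 5 over causal/non-causal pairs) is not reproduced here.

```python
import torch
from transformers import (AutoModelForSequenceClassification,
                          get_linear_schedule_with_warmup)

# Reported settings: bert-base-uncased, AdamW at lr 5e-5, linear schedule
# with 50 warm-up steps, 3 epochs, batch size 16 (SST-2) or 8 (IMDb / Fine Food).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

epochs = 3
steps_per_epoch = 1000  # hypothetical; depends on dataset size and batch size
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=50,
    num_training_steps=epochs * steps_per_epoch)

# Balancing coefficient from Eq. 5, tuned per dataset:
# 0.1 (SST-2), 0.7 (IMDb), 1.0 (Fine Food).
lam = 0.7

def total_loss(ce_loss, contrastive_loss):
    # Eq. 5: classification loss plus the lambda-weighted contrastive
    # term over causal/non-causal pairs (J = 1 pair per example).
    return ce_loss + lam * contrastive_loss
```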