C2L: Causally Contrastive Learning for Robust Text Classification

Authors: Seungtaek Choi, Myeongho Jeong, Hojae Han, Seung-won Hwang

AAAI 2022, pp. 10526-10534

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical results show that our approach, by collective decisions, is less sensitive to task model bias of attribution-based synthesis, and thus achieves significant improvements, in diverse dimensions: 1) counterfactual robustness, 2) cross-domain generalization, and 3) generalization from scarce data." "Experimental results show that our method improves the robustness in various dimensions: 1) counterfactual examples, 2) cross-domain, and 3) data scarcity."
Researcher Affiliation | Academia | (1) Yonsei University, Seoul, Republic of Korea; (2) Seoul National University, Seoul, Republic of Korea
Pseudocode | No | The paper describes its methods in text and equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | No statement regarding open-source code availability or a link to a code repository was found in the paper.
Open Datasets | Yes | Sentiment analysis experiments use the IMDb (Maas et al. 2011), Fine Food (McAuley and Leskovec 2013), and SST-2 (Socher et al. 2013) datasets; natural language inference experiments use the MultiNLI (Williams, Nangia, and Bowman 2017) dataset.
Dataset Splits | Yes | "Following the settings in (Kaushik, Hovy, and Lipton 2019; Moon et al. 2020), we use the official train and test splits if they exist, or we randomly divide the dataset with a 70:30 ratio, using them for train and test splits, respectively. To confirm convergence, we use 10% of the train set for validation purposes." (A minimal code sketch of this split protocol follows the table.)
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions software components such as bert-base-uncased, AdamW, and RoBERTa but does not give version numbers for the programming language (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used to implement the methods and run the experiments.
Experiment Setup | Yes | "For experiments, we chose all the hyperparameters by the best performance on the validation set. For the BERT classifier, we train bert-base-uncased with a batch size of 16 for SST-2, and 8 for IMDb/Fine Food, over 3 epochs, ensuring convergence. We used AdamW with a learning rate of 5e-5 and the linear scheduler with 50 warm-up steps. For the contrastive objective (Eq. 5), the balancing coefficient λ is tuned in [0.1, 1.0], and we observe that a larger λ is effective when fewer causal features are identified, such that setting λ to 0.1, 0.7, and 1.0 performs well for SST-2, IMDb, and Fine Food, respectively. The number of positive/negative pairs J is set to 1 due to memory constraints. The embedding dropout coefficient λd is tuned to 0.5." (The second sketch below the table maps these hyperparameters onto a standard training loop.)
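The dataset-split protocol quoted in the table translates directly into code. Below is a minimal sketch, assuming a generic list of texts and labels; the `make_splits` helper, the stratification, and the fixed seed are illustrative choices, not details from the paper.

```python
# Minimal sketch of the quoted split protocol (not the authors' code):
# 70:30 train/test when no official split exists, then 10% of the
# train set held out for validation to confirm convergence.
from sklearn.model_selection import train_test_split

def make_splits(texts, labels, seed=42):
    # 70:30 train/test split (used only when no official split exists).
    train_x, test_x, train_y, test_y = train_test_split(
        texts, labels, test_size=0.30, random_state=seed, stratify=labels)
    # Hold out 10% of the remaining train set for validation.
    train_x, val_x, train_y, val_y = train_test_split(
        train_x, train_y, test_size=0.10, random_state=seed, stratify=train_y)
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```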
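For reference, the reported hyperparameters map onto a standard Hugging Face/PyTorch training loop roughly as sketched below. This is an assumption-laden illustration, not the authors' implementation: `train_loader` and `contrastive_loss` (standing in for the paper's Eq. 5, which this report does not reproduce) are placeholders, and the λ value shown is the one reported for SST-2.

```python
# Sketch of the reported setup: bert-base-uncased, AdamW at lr 5e-5,
# linear schedule with 50 warm-up steps, 3 epochs. `train_loader`
# (batch size 16 for SST-2, 8 for IMDb/Fine Food) and `contrastive_loss`
# (the paper's Eq. 5) are assumed placeholders, not from the paper.
import torch
from transformers import (AutoModelForSequenceClassification,
                          get_linear_schedule_with_warmup)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

EPOCHS, LAMBDA = 3, 0.1  # lambda tuned in [0.1, 1.0]; 0.1 reported for SST-2
total_steps = EPOCHS * len(train_loader)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=50, num_training_steps=total_steps)

model.train()
for _ in range(EPOCHS):
    for batch in train_loader:
        out = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
        # Total objective: cross-entropy plus the lambda-weighted
        # contrastive term (the paper's Eq. 5, a placeholder here).
        loss = out.loss + LAMBDA * contrastive_loss(batch)
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```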