C2L: Causally Contrastive Learning for Robust Text Classification
Authors: Seungtaek Choi, Myeongho Jeong, Hojae Han, Seung-won Hwang
AAAI 2022, pp. 10526-10534 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that our approach, by collective decisions, is less sensitive to task model bias of attribution-based synthesis, and thus achieves significant improvements, in diverse dimensions: 1) counterfactual robustness, 2) cross-domain generalization, and 3) generalization from scarce data. Experimental results show that our method improves the robustness in various dimensions: 1) counterfactual examples, 2) cross-domain, and 3) data scarcity. |
| Researcher Affiliation | Academia | 1. Yonsei University, Seoul, Republic of Korea; 2. Seoul National University, Seoul, Republic of Korea |
| Pseudocode | No | The paper describes methods in text and equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | No statement regarding open-source code availability or a link to a code repository was found in the paper. |
| Open Datasets | Yes | Sentiment analysis experiments are conducted on the IMDb (Maas et al. 2011), Fine Food (McAuley and Leskovec 2013), and SST-2 (Socher et al. 2013) datasets. Natural language inference experiments are conducted on the MultiNLI (Williams, Nangia, and Bowman 2017) dataset. |
| Dataset Splits | Yes | Following the settings in (Kaushik, Hovy, and Lipton 2019; Moon et al. 2020), we use the official train and test splits if they exist, or we randomly divide the dataset with a 70:30 ratio, using them for train and test splits, respectively. To confirm convergence, we use 10% of the train set for validation purposes. (A splitting sketch appears below the table.) |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions software components like 'bert-base-uncased', 'AdamW', and 'RoBERTa' but does not provide specific version numbers for the programming language (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used to implement the methods and run experiments. |
| Experiment Setup | Yes | For experiments, we chose all the hyperparameters by the best performance on the validation set. For the BERT classifier, we train bert-base-uncased with a batch size of 16 for SST-2, and 8 for IMDb/FOOD, over 3 epochs, ensuring convergence. We used AdamW with a learning rate of 5e-5 and the linear scheduler with 50 warm-up steps. For the contrastive objective (Eq. 5), the balancing coefficient λ is tuned within [0.1, 1.0], and we observe that a larger λ is effective when fewer causal features are identified, such that setting λ to 0.1, 0.7, and 1.0 performs well for SST-2, IMDb, and Fine Food, respectively. The number of positive/negative pairs J is set to 1 due to memory constraints. The embedding dropout coefficient λd is tuned to 0.5. (A training-setup sketch appears below the table.) |
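The split procedure in the Dataset Splits row is fully specified: a 70:30 train/test split when no official split exists, then 10% of the train portion held out for validation. Below is a minimal sketch in Python, assuming scikit-learn; the function name and the fixed seed are our own choices, since the paper names neither its tooling nor a seed.

```python
from sklearn.model_selection import train_test_split

def make_splits(texts, labels, seed=42):
    """70:30 train/test split, then 10% of the train portion held out
    for validation, matching the procedure described in the paper.
    The function name and seed are illustrative assumptions."""
    train_x, test_x, train_y, test_y = train_test_split(
        texts, labels, test_size=0.30, random_state=seed)
    train_x, val_x, train_y, val_y = train_test_split(
        train_x, train_y, test_size=0.10, random_state=seed)
    # Net proportions of the full dataset: 63% train, 7% validation, 30% test.
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```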
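The Experiment Setup row pins down enough hyperparameters to reconstruct the optimizer and schedule. The sketch below uses PyTorch and Hugging Face transformers; the paper does not name its frameworks, so the library choice, the `steps_per_epoch` value, and the `total_loss` helper are assumptions, and the contrastive term itself (defined in the paper's Eq. 5 over causal/non-causal pairs) is not reproduced here.

```python
import torch
from transformers import (AutoModelForSequenceClassification,
                          get_linear_schedule_with_warmup)

# Reported settings: bert-base-uncased, AdamW at lr 5e-5, linear schedule
# with 50 warm-up steps, 3 epochs, batch size 16 (SST-2) or 8 (IMDb / Fine Food).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

epochs = 3
steps_per_epoch = 1000  # hypothetical; depends on dataset size and batch size
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=50,
    num_training_steps=epochs * steps_per_epoch)

# Balancing coefficient from Eq. 5, tuned per dataset:
# 0.1 (SST-2), 0.7 (IMDb), 1.0 (Fine Food).
lam = 0.7

def total_loss(ce_loss, contrastive_loss):
    # Eq. 5: classification loss plus the lambda-weighted contrastive
    # term over causal/non-causal pairs (J = 1 pair per example).
    return ce_loss + lam * contrastive_loss
```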