Counterfactual Maximum Likelihood Estimation for Training Deep Networks

Authors: Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang

NeurIPS 2021

Reproducibility assessment (Reproducibility Variable / Result / LLM Response):

Research Type: Experimental
LLM Response: We conduct experiments on both simulated data and two real-world tasks: Natural Language Inference (NLI) and Image Captioning. The results show that CMLE methods outperform the regular MLE method in terms of out-of-domain generalization performance and reducing spurious correlations, while maintaining comparable performance on the regular evaluations.

Researcher Affiliation: Academia
LLM Response: Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang, Department of Computer Science, University of California, Santa Barbara. xinyi_wang@ucsb.edu, wenhuchen@ucsb.edu, saxon@ucsb.edu, william@cs.ucsb.edu

Pseudocode: No
LLM Response: The paper describes algorithms but does not provide any explicitly labeled pseudocode or algorithm blocks.

Open Source Code: Yes
LLM Response: Our code is released at https://github.com/WANGXinyiLinda/CMLE.

Open Datasets: Yes
LLM Response: We consider three large-scale NLI datasets: SNLI [23], MNLI [24] and ANLI [62]. The dataset we use is the MSCOCO 2014 dataset [66], which contains 123,287 images of common objects with 5 human-annotated captions per image.

Dataset Splits: Yes
LLM Response: SNLI and MNLI are considered standard benchmarks, containing 550,152/10,000/10,000 examples and 392,702/20,000/20,000 examples for the train/dev/test split respectively. ANLI contains three subsets of increasing difficulty, A1, A2, and A3, with about 16,946/1,000/1,000 examples, 45,460/1,000/1,000 examples and 100,459/1,200/1,200 examples for the train/dev/test split respectively. We use the Karpathy [67] split with 113,287/5,000/5,000 images in the train/validation/test set respectively.
(A hedged sketch for loading the NLI datasets follows the table.)

Hardware Specification: No
LLM Response: The paper discusses models and architectures (e.g., "Transformer architecture", "BART large model", "RoBERTa large model", "LXMERT") but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.

Software Dependencies: No
LLM Response: The paper mentions software components like "Hugging Face [64]" for models like the "BART large model [63]" and the "RoBERTa large model", but it does not specify version numbers for these libraries or frameworks, which is necessary for reproducibility.
(A hedged environment and model-loading sketch follows the table.)

Experiment Setup: Yes
LLM Response: We take α = 0.01 for Implicit CMLE and α = 0.1 for Explicit CMLE. We choose α = 0.003 for Implicit CMLE and α = 0.1 for Explicit CMLE. We choose α = 0.0002 for Implicit CMLE and α = 0.0001 for Explicit CMLE.
(These three α settings are quoted from the paper's separate experiment sections; an illustrative sketch of how such a weight enters the training loss follows the table.)
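
Dataset loading (relates to the Open Datasets and Dataset Splits rows above). The paper does not say how the NLI data were obtained; the following is a minimal loading sketch, assuming the Hugging Face Hub identifiers "snli", "multi_nli", and "anli", which are not taken from the paper.

    # Minimal sketch, assuming Hugging Face Hub identifiers for the three NLI
    # datasets; the paper itself does not specify a loading method.
    from datasets import load_dataset

    snli = load_dataset("snli")       # splits: train/validation/test (550,152/10,000/10,000)
    mnli = load_dataset("multi_nli")  # splits: train/validation_matched/validation_mismatched
    anli = load_dataset("anli")       # splits: train_r1/dev_r1/test_r1 ... test_r3 (rounds A1-A3)

    # Sanity-check the split sizes against the numbers reported in the paper.
    print({name: split.num_rows for name, split in snli.items()})
    print({name: split.num_rows for name, split in anli.items()})

MSCOCO 2014 with the Karpathy split is not on the Hub under a comparable identifier and would need to be obtained separately; it is not covered by this sketch.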
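
Environment and backbone models (relates to the Software Dependencies row above). Since no library versions are reported, a reproduction has to pin its own; the sketch below records the versions in use and loads the named backbones via Hugging Face transformers. The checkpoint identifiers "roberta-large" and "facebook/bart-large" are assumed Hub names, not details confirmed by the paper.

    # Hedged environment sketch: record the library versions actually used and
    # load the backbone models the paper names. Checkpoint names are assumptions.
    import torch
    import transformers
    from transformers import (
        AutoTokenizer,
        AutoModelForSequenceClassification,  # e.g. RoBERTa-large classifier for NLI
        BartForConditionalGeneration,        # e.g. BART-large seq2seq model
    )

    print("torch", torch.__version__)
    print("transformers", transformers.__version__)

    nli_tokenizer = AutoTokenizer.from_pretrained("roberta-large")
    nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=3)

    gen_tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
    gen_model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")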
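
Role of α (relates to the Experiment Setup row above). The quoted settings only give the weight α for the Implicit and Explicit CMLE objectives. As a reading aid, the sketch below shows the generic pattern such a weight implies: a standard MLE term plus an auxiliary term scaled by α. The cmle_regularizer here is a hypothetical placeholder, not the paper's Implicit or Explicit CMLE loss, which is defined in the paper itself.

    # Illustrative only: how a weight alpha typically combines a standard MLE
    # (cross-entropy) loss with an auxiliary term. `cmle_regularizer` below is
    # a hypothetical placeholder, NOT the paper's Implicit/Explicit CMLE loss.
    import torch
    import torch.nn.functional as F

    def training_loss(logits, labels, factual_repr, counterfactual_repr, alpha=0.01):
        mle_loss = F.cross_entropy(logits, labels)  # standard MLE term
        cmle_regularizer = F.mse_loss(factual_repr, counterfactual_repr)  # placeholder auxiliary term
        return mle_loss + alpha * cmle_regularizer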