Counterfactual Maximum Likelihood Estimation for Training Deep Networks
Authors: Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on both simulated data and two real-world tasks: Natural Language Inference (NLI) and Image Captioning. The results show that CMLE methods outperform the regular MLE method in terms of out-of-domain generalization performance and reducing spurious correlations, while maintaining comparable performance on the regular evaluations. |
| Researcher Affiliation | Academia | Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang Department of Computer Science University of California, Santa Barbara xinyi_wang@ucsb.edu, wenhuchen@ucsb.edu, saxon@ucsb.edu, william@cs.ucsb.edu |
| Pseudocode | No | The paper describes algorithms but does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is released at https://github.com/WANGXinyiLinda/CMLE. |
| Open Datasets | Yes | We consider three large-scale NLI datasets: SNLI [23], MNLI [24] and ANLI [62]. The dataset we use is the MSCOCO 2014 dataset [66] which contains 123,287 images of common objects with 5 human-annotated captions per image. |
| Dataset Splits | Yes | SNLI and MNLI are considered standard benchmarks, with each containing 550,152/10,000/10,000 examples and 392,702/20,000/20,000 examples for train/dev/test split respectively. ANLI contains three subsets of increasing difficulty, A1, A2, and A3, with about 16,946/1,000/1,000 examples, 45,460/1,000/1,000 examples, and 100,459/1,200/1,200 examples each for train/dev/test split respectively. We use the Karpathy [67] split with 113,287/5,000/5,000 images in the train/validation/test set respectively. |
| Hardware Specification | No | The paper discusses models and architectures (e.g., "Transformer architecture", "BART large model", "RoBERTa large model", "LXMERT") but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like "Hugging Face [64]" for models like "BART large model [63]" and "RoBERTa large model", but it does not specify version numbers for these libraries or frameworks, which is necessary for reproducibility. |
| Experiment Setup | Yes | The paper reports different α values for each experiment: We take α = 0.01 for Implicit CMLE and α = 0.1 for Explicit CMLE. We choose α = 0.003 for Implicit CMLE and α = 0.1 for Explicit CMLE. We choose α = 0.0002 for Implicit CMLE and α = 0.0001 for Explicit CMLE. |
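The table lists only the α values, not the objective they weight. As a hedged illustration (not the paper's actual Implicit/Explicit CMLE formulations, which are defined in the paper itself), a small α of this kind commonly scales an auxiliary counterfactual term added to the standard MLE cross-entropy. All function names here are hypothetical:

```python
import numpy as np

def mle_loss(logits, label):
    """Standard negative log-likelihood (cross-entropy) for one example,
    computed with a numerically stable log-softmax."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def cmle_style_loss(logits, cf_logits, label, alpha):
    """Hypothetical sketch: MLE term plus an alpha-weighted auxiliary term
    on counterfactual logits. This only shows how a small alpha
    (e.g. 0.01, as reported for Implicit CMLE) scales the extra term;
    the paper's actual objectives differ."""
    return mle_loss(logits, label) + alpha * mle_loss(cf_logits, label)

# Example with alpha = 0.01, one of the reported Implicit CMLE settings.
loss = cmle_style_loss(np.array([2.0, 0.5, -1.0]),
                       np.array([0.1, 0.3, 0.2]),
                       label=0, alpha=0.01)
```

With α this small, the auxiliary term acts as a mild regularizer: the combined loss stays close to the plain MLE loss while still penalizing counterfactual inconsistency.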