Invariant Rationalization

Authors: Shiyu Chang, Yang Zhang, Mo Yu, Tommi Jaakkola

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate INVRAT on multiple datasets with false correlations. The results show that INVRAT does significantly better in removing false correlations and finding explanations that better align with human judgments. Our implementations are publicly available at https://github.com/code-terminator/invariant_rationalization.
Researcher Affiliation | Collaboration | MIT-IBM Watson AI Lab, IBM Research, CSAIL MIT. Correspondence to: Shiyu Chang <shiyu.chang@ibm.com>, Yang Zhang <yang.zhang2@ibm.com>, Mo Yu <yum@us.ibm.com>.
Pseudocode | No | No structured pseudocode or algorithm blocks are present.
Open Source Code | Yes | Our implementations are publicly available at https://github.com/code-terminator/invariant_rationalization.
Open Datasets | Yes | IMDB (Maas et al., 2011): The original dataset consists of 25,000 movie reviews for training and 25,000 for testing. Multi-aspect beer reviews (McAuley et al., 2012): This dataset is commonly used in the field of rationalization (Lei et al., 2016; Bao et al., 2018; Yu et al., 2019; Chang et al., 2019).
Dataset Splits | Yes | For the purpose of model selection and evaluation, we randomly split the original test set into two balanced subsets, which are our new validation and test sets. The validation set is similarly sub-sampled into size 2,000. (A sketch of such a split appears after the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models or memory amounts) are provided for running the experiments.
Software Dependencies | No | The paper mentions software components such as bidirectional gated recurrent units, GloVe embeddings, and the Adam optimizer, but does not provide version numbers for any libraries, frameworks, or other software dependencies.
Experiment Setup | Yes | For all experiments, we use bidirectional gated recurrent units ... with hidden dimension 256... We use the Adam optimizer ... with a learning rate of 0.001. The batch size is set to 500. ...Hyperparameters (i.e., µ1, µ2 in equation (12) for the IMDB experiment, λ and h(·) in equation (8), the number of consecutive gradient ascent/descent steps for each player during one iteration, and the number of training epochs for both experiments) are determined based on the best performance on the validation set. (A training-configuration sketch appears after the table.)
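
On the Dataset Splits row: the paper describes randomly splitting the original test set into two balanced halves for validation and testing, then sub-sampling the validation half to 2,000 examples. Below is a minimal sketch of one way to do this, assuming scikit-learn as the splitting tool and hypothetical `texts`/`labels` arrays; the paper does not specify its tooling, so treat this as illustrative only.

```python
# Hedged sketch: balanced 50/50 validation/test split of an existing
# labeled test set, followed by a balanced sub-sample of the validation
# half to 2,000 examples. scikit-learn is an assumed dependency, and
# `texts`/`labels` are hypothetical placeholders.
from sklearn.model_selection import train_test_split

def split_test_set(texts, labels, seed=0):
    """Split a labeled test set into balanced validation and test halves."""
    # stratify=labels keeps the class ratio identical in both halves,
    # matching the paper's "two balanced subsets".
    val_x, test_x, val_y, test_y = train_test_split(
        texts, labels, test_size=0.5, stratify=labels, random_state=seed
    )
    # Sub-sample the validation half to 2,000 examples, again stratified
    # so that class balance is preserved.
    val_x, _, val_y, _ = train_test_split(
        val_x, val_y, train_size=2000, stratify=val_y, random_state=seed
    )
    return (val_x, val_y), (test_x, test_y)
```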
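
On the Experiment Setup row: the quoted settings (bidirectional GRUs with hidden dimension 256, Adam with learning rate 0.001, batch size 500) translate directly into framework configuration. The sketch below uses PyTorch, which is an assumption on our part; the paper does not name a framework, so the released repository should be treated as authoritative.

```python
# Hedged sketch of the quoted training configuration: bidirectional GRU
# encoder (hidden dim 256), Adam optimizer (lr 0.001), batch size 500.
# PyTorch and the embedding dimensions are assumptions, not details
# confirmed by the paper.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=256):
        super().__init__()
        # The paper uses GloVe-initialized embeddings; random init here.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim,
                          bidirectional=True, batch_first=True)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer token ids
        hidden, _ = self.gru(self.embed(tokens))
        return hidden  # (batch, seq_len, 2 * hidden_dim)

encoder = Encoder(vocab_size=20000)  # vocab size is a placeholder
optimizer = torch.optim.Adam(encoder.parameters(), lr=0.001)
BATCH_SIZE = 500  # as stated in the paper
```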