Invariant Rationalization

Authors: Shiyu Chang, Yang Zhang, Mo Yu, Tommi Jaakkola

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate INVRAT on multiple datasets with false correlations. The results show that INVRAT does significantly better in removing false correlations and finding explanations that better align with human judgments. Our implementations are publicly available at https://github.com/code-terminator/invariant_rationalization.
Researcher Affiliation | Collaboration | MIT-IBM Watson AI Lab, IBM Research, CSAIL MIT. Correspondence to: Shiyu Chang <shiyu.chang@ibm.com>, Yang Zhang <yang.zhang2@ibm.com>, Mo Yu <yum@us.ibm.com>.
Pseudocode | No | No structured pseudocode or algorithm blocks are present.
Open Source Code | Yes | Our implementations are publicly available at https://github.com/code-terminator/invariant_rationalization.
Open Datasets | Yes | IMDB (Maas et al., 2011): The original dataset consists of 25,000 movie reviews for training and 25,000 for testing. Multi-aspect beer reviews (McAuley et al., 2012): This dataset is commonly used in the field of rationalization (Lei et al., 2016; Bao et al., 2018; Yu et al., 2019; Chang et al., 2019).
Dataset Splits | Yes | For the purpose of model selection and evaluation, we randomly split the original test set into two balanced subsets, which are our new validation and test sets. The validation set is similarly sub-sampled into size 2,000. (A sketch of such a split appears after the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models or memory amounts) are provided for running the experiments.
Software Dependencies | No | The paper mentions software components such as bidirectional gated recurrent units, GloVe embeddings, and the Adam optimizer, but does not provide version numbers for any libraries, frameworks, or other software dependencies.
Experiment Setup | Yes | For all experiments, we use bidirectional gated recurrent units ... with hidden dimension 256... We use the Adam optimizer ... with a learning rate of 0.001. The batch size is set to 500. ...Hyperparameters (i.e., µ1, µ2 in equation (12) for the IMDB experiment, λ and h(·) in equation (8), the number of consecutive gradient ascent/descent steps for each player during one iteration, and the number of training epochs for both experiments) are determined based on the best performance on the validation set. (A training-configuration sketch appears after the table.)
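
On the Dataset Splits row: the paper describes randomly splitting the original test set into two balanced halves for validation and testing, then sub-sampling the validation half to 2,000 examples. Below is a minimal sketch of one way to do this, assuming scikit-learn as the splitting tool and hypothetical `texts`/`labels` arrays; the paper does not specify its tooling, so treat this as illustrative only.

```python
# Hedged sketch: balanced 50/50 validation/test split of an existing
# labeled test set, followed by a balanced sub-sample of the validation
# half to 2,000 examples. scikit-learn is an assumed dependency, and
# `texts`/`labels` are hypothetical placeholders.
from sklearn.model_selection import train_test_split

def split_test_set(texts, labels, seed=0):
    """Split a labeled test set into balanced validation and test halves."""
    # stratify=labels keeps the class ratio identical in both halves,
    # matching the paper's "two balanced subsets".
    val_x, test_x, val_y, test_y = train_test_split(
        texts, labels, test_size=0.5, stratify=labels, random_state=seed
    )
    # Sub-sample the validation half to 2,000 examples, again stratified
    # so that class balance is preserved.
    val_x, _, val_y, _ = train_test_split(
        val_x, val_y, train_size=2000, stratify=val_y, random_state=seed
    )
    return (val_x, val_y), (test_x, test_y)
```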
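
On the Experiment Setup row: the quoted settings (bidirectional GRUs with hidden dimension 256, Adam with learning rate 0.001, batch size 500) translate directly into framework configuration. The sketch below uses PyTorch, which is an assumption on our part; the paper does not name a framework, so the released repository should be treated as authoritative.

```python
# Hedged sketch of the quoted training configuration: bidirectional GRU
# encoder (hidden dim 256), Adam optimizer (lr 0.001), batch size 500.
# PyTorch and the embedding dimensions are assumptions, not details
# confirmed by the paper.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=256):
        super().__init__()
        # The paper uses GloVe-initialized embeddings; random init here.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim,
                          bidirectional=True, batch_first=True)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer token ids
        hidden, _ = self.gru(self.embed(tokens))
        return hidden  # (batch, seq_len, 2 * hidden_dim)

encoder = Encoder(vocab_size=20000)  # vocab size is a placeholder
optimizer = torch.optim.Adam(encoder.parameters(), lr=0.001)
BATCH_SIZE = 500  # as stated in the paper
```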