Invariant Random Forest: Tree-Based Model Solution for OOD Generalization

Authors: Yufan Liao, Qi Wu, Xing Yan

AAAI 2024 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed method is motivated by a theoretical result under mild conditions and validated by numerical tests with both synthetic and real datasets.
Researcher Affiliation | Academia | Institute of Statistics and Big Data, Renmin University of China; School of Data Science, City University of Hong Kong
Pseudocode | No | The paper describes the proposed method in detail using text and mathematical equations but does not include a formal pseudocode block or algorithm figure.
Open Source Code | No | The paper does not explicitly state that the source code for its methodology is publicly available, nor does it include a link to a code repository.
Open Datasets | Yes | The financial indicator dataset is a yearly stock return dataset: the features are the financial indicators of each stock, and the label to predict is whether the price goes up or down over the next whole year. The dataset spans 5 years, and each single year is treated as an environment. (https://www.kaggle.com/datasets/cnic92/200-financial-indicators-of-us-stocks-20142018) A loading sketch follows the table.
Dataset Splits | Yes | If there is a validation set, RF chooses the best maximum depth from {5, 10, 15} (classification tasks) or {10, 15, 20} (regression tasks). IRF uses the same maximum depth as RF and chooses the best λ from {0, 1, 5, 10}. All these hyperparameters are chosen using the cross-entropy loss (classification) or MSE (regression) on the validation set. A selection sketch follows the table.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming language, library, or framework versions) used to implement or run the experiments.
Experiment Setup | Yes | If there is no validation set, the maximum depths of RF, IRF, and XGBoost are fixed to 10 in classification tasks and 20 in regression tasks. For each task, IRF is run with λ = 1, 5, 10. As for the number of trees: in Scenario 2 and Scenario 3 of the real-data regression tasks, RF and IRF use 10 ensemble trees; in all other cases they use 50. XGBoost uses 100 ensemble trees in all experiments. A configuration sketch follows the table.
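
As a hedged illustration of the per-year environment split described in the Open Datasets row, the data might be assembled as below. The file and label-column names ("2014_Financial_Data.csv", "Class") follow the Kaggle dataset's layout and are assumptions, not details confirmed by the paper.

```python
import pandas as pd

# Assemble one (features, label) pair per year; each year is one environment.
# File and column names are assumptions based on the Kaggle dataset's layout.
environments = {}
for year in range(2014, 2019):
    df = pd.read_csv(f"{year}_Financial_Data.csv")
    X = df.drop(columns=["Class"])   # financial indicators of each stock
    y = df["Class"]                  # 1 = price up over the next year, 0 = down
    environments[year] = (X, y)
```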
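
The validation-based selection in the Dataset Splits row can be sketched as follows for the RF baseline. The data here is a synthetic stand-in, and since the paper's IRF code is not public, the analogous λ search over {0, 1, 5, 10} is only noted in a comment.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the paper uses its own per-environment splits.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Classification grid {5, 10, 15}; regression tasks would use {10, 15, 20} with MSE.
best_depth, best_loss = None, float("inf")
for depth in [5, 10, 15]:
    rf = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
    rf.fit(X_train, y_train)
    loss = log_loss(y_val, rf.predict_proba(X_val))  # cross-entropy on the validation set
    if loss < best_loss:
        best_depth, best_loss = depth, loss
# IRF would reuse best_depth and pick its lambda from {0, 1, 5, 10} by the same criterion.
```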
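
For the no-validation-set regime in the Experiment Setup row, a minimal configuration sketch of the RF and XGBoost baselines is given below; IRF itself is omitted because its implementation is not public, and the depths and tree counts are the values quoted in the table.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from xgboost import XGBClassifier, XGBRegressor

# Fixed settings when no validation set is available (quoted from the setup above).
rf_clf = RandomForestClassifier(n_estimators=50, max_depth=10)     # classification: depth 10
rf_reg = RandomForestRegressor(n_estimators=50, max_depth=20)      # regression: depth 20
rf_reg_s23 = RandomForestRegressor(n_estimators=10, max_depth=20)  # real-data regression Scenarios 2 & 3: 10 trees
xgb_clf = XGBClassifier(n_estimators=100, max_depth=10)            # XGBoost: 100 trees in all experiments
xgb_reg = XGBRegressor(n_estimators=100, max_depth=20)
```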