Invariant Random Forest: Tree-Based Model Solution for OOD Generalization

Authors: Yufan Liao, Qi Wu, Xing Yan

AAAI 2024 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed method is motivated by a theoretical result under mild conditions and validated by numerical tests with both synthetic and real datasets.
Researcher Affiliation | Academia | Institute of Statistics and Big Data, Renmin University of China; School of Data Science, City University of Hong Kong
Pseudocode | No | The paper describes the proposed method in detail using text and mathematical equations but does not include a formal pseudocode block or algorithm figure.
Open Source Code | No | The paper does not explicitly state that the source code for its methodology is publicly available, nor does it include a link to a code repository.
Open Datasets | Yes | The financial indicator dataset is a yearly stock return dataset: the features are the financial indicators of each stock, and the label to predict is whether the price goes up or down over the next whole year. The dataset spans 5 years, and each single year is treated as an environment. (https://www.kaggle.com/datasets/cnic92/200-financial-indicators-of-us-stocks-20142018) A loading sketch follows the table.
Dataset Splits | Yes | If there is a validation set, RF chooses the best maximum depth from {5, 10, 15} (classification tasks) or {10, 15, 20} (regression tasks). IRF uses the same maximum depth as RF and chooses the best λ from {0, 1, 5, 10}. All these hyperparameters are chosen using the cross-entropy loss (classification) or MSE (regression) on the validation set. A selection sketch follows the table.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming language, library, or framework versions) used to implement or run the experiments.
Experiment Setup | Yes | If there is no validation set, the maximum depths of RF, IRF, and XGBoost are fixed to 10 in classification tasks and 20 in regression tasks. For each task, IRF is run with λ = 1, 5, 10. As for the number of trees: in Scenario 2 and Scenario 3 of the real-data regression tasks, RF and IRF use 10 ensemble trees; in all other cases they use 50. XGBoost uses 100 ensemble trees in all experiments. A configuration sketch follows the table.
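
As a hedged illustration of the per-year environment split described in the Open Datasets row, the data might be assembled as below. The file and label-column names ("2014_Financial_Data.csv", "Class") follow the Kaggle dataset's layout and are assumptions, not details confirmed by the paper.

```python
import pandas as pd

# Assemble one (features, label) pair per year; each year is one environment.
# File and column names are assumptions based on the Kaggle dataset's layout.
environments = {}
for year in range(2014, 2019):
    df = pd.read_csv(f"{year}_Financial_Data.csv")
    X = df.drop(columns=["Class"])   # financial indicators of each stock
    y = df["Class"]                  # 1 = price up over the next year, 0 = down
    environments[year] = (X, y)
```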
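
The validation-based selection in the Dataset Splits row can be sketched as follows for the RF baseline. The data here is a synthetic stand-in, and since the paper's IRF code is not public, the analogous λ search over {0, 1, 5, 10} is only noted in a comment.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the paper uses its own per-environment splits.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Classification grid {5, 10, 15}; regression tasks would use {10, 15, 20} with MSE.
best_depth, best_loss = None, float("inf")
for depth in [5, 10, 15]:
    rf = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
    rf.fit(X_train, y_train)
    loss = log_loss(y_val, rf.predict_proba(X_val))  # cross-entropy on the validation set
    if loss < best_loss:
        best_depth, best_loss = depth, loss
# IRF would reuse best_depth and pick its lambda from {0, 1, 5, 10} by the same criterion.
```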
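
For the no-validation-set regime in the Experiment Setup row, a minimal configuration sketch of the RF and XGBoost baselines is given below; IRF itself is omitted because its implementation is not public, and the depths and tree counts are the values quoted in the table.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from xgboost import XGBClassifier, XGBRegressor

# Fixed settings when no validation set is available (quoted from the setup above).
rf_clf = RandomForestClassifier(n_estimators=50, max_depth=10)     # classification: depth 10
rf_reg = RandomForestRegressor(n_estimators=50, max_depth=20)      # regression: depth 20
rf_reg_s23 = RandomForestRegressor(n_estimators=10, max_depth=20)  # real-data regression Scenarios 2 & 3: 10 trees
xgb_clf = XGBClassifier(n_estimators=100, max_depth=10)            # XGBoost: 100 trees in all experiments
xgb_reg = XGBRegressor(n_estimators=100, max_depth=20)
```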