Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Heterogeneous Risk Minimization

Authors: Jiashuo Liu, Zheyuan Hu, Peng Cui, Bo Li, Zheyan Shen

ICML 2021 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experimental results validate the effectiveness of our HRM framework. ... Extensive experiments in both synthetic and real-world experiments datasets demonstrate the superiority of HRM in terms of average performance, stability performance as well as worst-case performance under different settings of distributional shifts.
Researcher Affiliation	Academia	¹Department of Computer Science and Technology, Tsinghua University, Beijing, China; Email: EMAIL, EMAIL, EMAIL. ²School of Economics and Management, Tsinghua University, Beijing, China; Email: EMAIL. Correspondence to: Peng Cui <EMAIL>.
Pseudocode	No	The paper describes the algorithm steps in text and mathematical formulations but does not include a formal pseudocode block or an algorithm box.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code or links to a code repository.
Open Datasets	Yes	Car Insurance Prediction: In this task, we use a real-world dataset for car insurance prediction (Kaggle). ... People Income Prediction: In this task we use the Adult dataset (Dua & Graff, 2017) to predict personal income levels as above or below $50,000 per year based on personal details. ... House Price Prediction: In this experiment, we use a real-world regression dataset (Kaggle) of house sales prices from King County, USA.
Dataset Splits	Yes	In training, we generate sum = 2000 data points, where κ = 95% points from environment e1 with a predefined r and 1 − κ = 5% points from e2 with r = 1.1. In testing, we generate data points for 10 environments with r ∈ [−3, −2.7, −2.3, ..., 2.3, 2.7, 3.0]. ... In training phase, all methods are trained on pooled data including 693 points from environment 1 and 200 from environment 2, and validated on 100 sampled from both.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers.
Experiment Setup	Yes	For simplicity, we select data points according to a certain variable set V_b ⊂ Ψ*: P̂(x) = ∏_{v_i ∈ V_b} |r|^{−5·|f(φ*) − sign(r)·v_i|} (18) ... In training, we generate sum = 2000 data points, where κ = 95% points from environment e1 with a predefined r and 1 − κ = 5% points from e2 with r = 1.1. In testing, we generate data points for 10 environments with r ∈ [−3, −2.7, −2.3, ..., 2.3, 2.7, 3.0]. β is set to 1.0. We compare our HRM with ERM, DRO, EIIL and IRM for Linear Regression. ... In this experiment, we set β = 0.1 and build 10 environments with varying σ and the dimension of Φ*, Ψ*, the first three for training and the last seven for testing. We run experiments for 10 times and the averaged results are shown in Table 3.
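
The synthetic split quoted in the Dataset Splits and Experiment Setup rows can be illustrated with a small sketch. It is not the authors' code (the paper releases none, per the Open Source Code row). It assumes the selection weight in Eq. (18) has the form |r|^(−5·|f − sign(r)·v_i|), multiplied over the variables in V_b. The function names `mixture_counts` and `selection_weight`, and the array arguments `target` and `v_b`, are hypothetical.

```python
import numpy as np


def mixture_counts(total, kappa):
    """Split `total` points into (majority, minority) counts for ratio `kappa`."""
    n_major = int(round(total * kappa))
    return n_major, total - n_major


def selection_weight(target, v_b, r):
    """Assumed form of Eq. (18): product over the columns of `v_b` of
    |r| ** (-5 * |target - sign(r) * v_i|).

    target: shape (n,), standing in for f(phi*) per data point (assumption).
    v_b:    shape (n, k), the k biased variables per data point (assumption).
    r:      bias rate with |r| > 0.
    """
    exponent = -5.0 * np.abs(target[:, None] - np.sign(r) * v_b)
    return np.prod(np.abs(r) ** exponent, axis=1)


# Training pool from the quoted setup: sum = 2000 points, kappa = 95%.
n_e1, n_e2 = mixture_counts(2000, 0.95)
print(n_e1, n_e2)  # 1900 100
```

Under this assumed form, a point whose biased variable agrees with the target in the direction given by sign(r) keeps weight 1, and the weight decays exponentially as they disagree, faster for larger |r|.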