Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fundamental Limits and Tradeoffs in Invariant Representation Learning
Authors: Han Zhao, Chen Dan, Bryon Aragam, Tommi S. Jaakkola, Geoffrey J. Gordon, Pradeep Ravikumar
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Although our contributions are mainly theoretical, a key practical application of our results is in certifying the potential sub-optimality of any given representation learning algorithm for either classification or regression tasks. Our results shed new light on the fundamental interplay between accuracy and invariance, and may be useful in guiding the design of future representation learning algorithms. [Section 6, Numerical Experiments] In this section, we demonstrate the empirical application of our theoretical results in both classification and regression tasks to certify the suboptimality of certain representation learning algorithms. To this end, we conduct experiments on two real-world benchmark datasets, the UCI Adult dataset (Asuncion and Newman, 2007) for classification and the Law School dataset (Wightman, 1998) for regression. |
| Researcher Affiliation | Academia | Han Zhao, University of Illinois Urbana-Champaign; Chen Dan, Carnegie Mellon University; Bryon Aragam, University of Chicago; Tommi S. Jaakkola, Massachusetts Institute of Technology; Geoffrey J. Gordon, Carnegie Mellon University; Pradeep Ravikumar, Carnegie Mellon University |
| Pseudocode | No | The paper describes mathematical derivations and algorithms in prose but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository. |
| Open Datasets | Yes | 6.1 Datasets. UCI Adult: The Adult dataset contains 30,162/15,060 training/test instances for income prediction. Each instance in the dataset describes an adult from the 1994 US Census. Attributes of each individual include gender, education level, age, etc. In this experiment we use gender (binary) as the protected attribute, and we preprocess the dataset to convert all the categorical variables into corresponding one-hot representations. The processed data contains 114 attributes. The target variable (income) is also binary: 1 if >50K/year, otherwise 0. For the protected attribute A, A = 0 means Male, otherwise Female. In this dataset, the base rates across groups are different: Pr(Y = 1 | A = 0) = 0.310 while Pr(Y = 1 | A = 1) = 0.113. The group ratio is also quite imbalanced, with Pr(A = 0) = 0.673 and Pr(A = 1) = 0.327. Law School: The Law School dataset contains 1,823 records for law students who took the bar passage study for Law School Admission. The features in the dataset include variables such as undergraduate GPA, LSAT score, full-time status, family income, gender, etc. In this experiment, we use gender (treated as a continuous variable that takes values in [0, 1]) as the protected attribute and undergraduate GPA (continuous) as the target variable. For both variables, we use the mean-squared error as the loss function. We use 80 percent of the data as our training set and the remaining 20 percent as the test set. In the Law School dataset, Pr(A = 1) = 0.452, which is quite balanced. The data distribution for different subgroups in the Law School dataset can be found in Figure 5. From Figure 5, we can see that the conditional distributions Pr(Y | A = a) differ across subgroups A = a ∈ {0, 1}. (Footnote 3: We use the edited public version of the dataset, which can be downloaded here: https://github.com/algowatchpenn/GerryFair/blob/master/dataset/lawschool.csv) |
| Dataset Splits | Yes | UCI Adult: The Adult dataset contains 30,162/15,060 training/test instances for income prediction. Law School: We use 80 percent of the data as our training set and the remaining 20 percent as the test set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper describes the methods used (logistic regression, OLS, MLP, adversarial training) but does not specify any software libraries or frameworks with their version numbers. |
| Experiment Setup | No | The paper describes the types of models used (logistic regression, OLS, MLP, adversarial training) and mentions optimization with stochastic gradient descent for some, but it does not provide specific hyperparameters such as learning rates, batch sizes, number of epochs, or other detailed training configurations. |
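As a concrete illustration of the preprocessing the quoted excerpts describe (one-hot encoding of categorical attributes, a binary income target, gender as the protected attribute A, per-group base rates, and an 80/20 split), here is a minimal Python sketch on synthetic rows. The column names and values below are assumptions for illustration only, not the Adult dataset's actual schema, and this is not the authors' code (the paper releases none):

```python
# Minimal sketch of Adult-style preprocessing on synthetic rows.
# The real processed dataset has 114 attributes after one-hot encoding;
# these columns are illustrative placeholders.
rows = [
    {"gender": "Male",   "education": "HS",        "income": ">50K"},
    {"gender": "Female", "education": "Bachelors", "income": "<=50K"},
    {"gender": "Male",   "education": "Masters",   "income": ">50K"},
    {"gender": "Female", "education": "HS",        "income": ">50K"},
]

# Binary target: 1 if income > 50K/year, else 0.
y = [1 if r["income"] == ">50K" else 0 for r in rows]

# Protected attribute: A = 0 for Male, A = 1 for Female, as in the paper.
a = [0 if r["gender"] == "Male" else 1 for r in rows]

# One-hot encode each categorical feature (here just "education").
levels = sorted({r["education"] for r in rows})
X = [[1 if r["education"] == lvl else 0 for lvl in levels] for r in rows]

def base_rate(group):
    """Pr(Y = 1 | A = group); the paper reports 0.310 (A=0) and 0.113
    (A=1) on the full Adult dataset."""
    ys = [yi for yi, ai in zip(y, a) if ai == group]
    return sum(ys) / len(ys)

# 80/20 train/test split, as used for the Law School dataset.
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]
```

On the synthetic rows above the base rates are of course not those of the real data; the sketch only shows the shape of the pipeline the report's evidence cells describe.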