Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning

Authors: Yuyang Deng, Samory Kpotufe

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments with synthetic and real datasets support the theory. 1 Introduction In supervised transfer learning (STL), some amount of target data is to be complemented by a usually larger amount of related source data towards training a predictor. 6 Experiments In this section, we present the experimental results of our algorithm and the baseline algorithms on both synthetic and real-world datasets.
Researcher Affiliation	Academia	Yuyang Deng Columbia University, Statistics EMAIL Samory Kpotufe Columbia University, Statistics EMAIL
Pseudocode	Yes	Algorithm 1: Mixed-Sample SGD. Input: θ_0 = θ_{Q,0} = 0, λ_0 = 0, stepsizes {α_t}_{t=0}^{T-1} and η, γ, ε_Q. For t = 0, ..., T-1 do: draw ξ_t ∼ Bernoulli(1/(1+λ_t)); if ξ_t = 1, sample (x_t, y_t) uniformly from S_P and set θ_{t+1} = θ_t - η(1+λ_t)∇ℓ(θ_t; x_t, y_t); else sample (x_t, y_t) uniformly from S_Q and set θ_{t+1} = θ_t - η(1+λ_t)∇ℓ(θ_t; x_t, y_t); then sample (x_t, y_t) uniformly from S_Q and set λ_{t+1} = [(1-γη)λ_t + η(ℓ(θ_t; x_t, y_t) - ℓ(θ_{Q,t}; x_t, y_t) - 6ε_Q)]_+ and θ_{Q,t+1} = θ_{Q,t} - α_t∇ℓ(θ_{Q,t}; x_t, y_t). End for. θ̂_{PQ} = ℓ2 projection of (1/T) Σ_{t=0}^{T-1} θ_t onto the constraint set {θ : R̂_Q(θ) - R̂_Q(θ_{Q,T}) ≤ 6ε_Q ε_0}. Output: θ̂_{PQ}
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We do not make our code public at this moment. The data used in the experiments are open source data.
Open Datasets	Yes	6.2 Regression Task on the School Dataset. To demonstrate the performance of our method on real-world data, we conduct the experiments on the School Dataset [20]. 6.3 Regression Task on the Berkeley Yearbook Dataset. We conduct experiments on the Berkeley Yearbook Dataset [22]. 6.4 Binary Classification Task on the CIFAR-10 Dog vs Cat Dataset. At last, to demonstrate that our algorithm can work for general convex losses, we conduct the binary classification experiment on the CIFAR10 Dog vs Cat dataset [23]. 10.2 Binary Classification Results on the Malware IoT dataset. Here we provide results on the CIC-IDS-2017 dataset [27].
Dataset Splits	Yes	6.1 Regression Task on the Synthetic Dataset. Throughout this subsection, we set the model dimension d to be 50. The distribution P and Q are set to be d-dimensional multivariate Gaussian with certain mean and covariance. The label is generated as y = θ_µ^⊤ x + ε for µ ∈ {P, Q}, where ε ∼ N(0, 1). We choose η = 10^{-4}, and iteration number T = 2000 n_P for both our method and HTL. We use closed form OLS solution to compute source and target ERM model. The results are demonstrated in Figure 1 and 2. In Figure 1 (Left), we fix n_Q = 100 and vary n_P from 100 to 1500. 6.2 Regression Task on the School Dataset. To demonstrate the performance of our method on real-world data, we conduct the experiments on the School Dataset [20]. Following [21], we use the data points from the first 100 schools as the source domain and the rest as the target. We set η = 10^{-4}. We fix n_Q = 100 and vary n_P from 100 to 1500. 6.3 Regression Task on the Berkeley Yearbook Dataset. We conduct experiments on the Berkeley Yearbook Dataset [22]. The dataset contains the gray-scale portraits taken in different years. The input x is the 512-dimensional vector feature extracted by ResNet18, and y is the year the photo is taken (ranging from 1905 to 2013). We construct source and target tasks by varying the proportion of male and female photos. In the source dataset, 50% of the samples are drawn from male photos and 50% from female photos. For the target training and testing dataset, the ratio is adjusted to 75% male and 25% female. We choose η = 10^{-4} for all algorithms, and iteration number T = 2000 n_P for both our method and HTL, and T = 500 n_P for PSGD since we found it can converge in fewer epochs. We use closed form OLS solution to compute source and target ERM model. We fix n_Q = 100 and vary n_P from 500 to 1300. 6.4 Binary Classification Task on the CIFAR-10 Dog vs Cat Dataset. At last, to demonstrate that our algorithm can work for general convex losses, we conduct the binary classification experiment on the CIFAR10 Dog vs Cat dataset [23], using linear classifier and logistic loss. We construct source and target tasks by varying the proportion of dog and cat images. In the source dataset, the ratio is 50% dog and 50% cat. For the target training and testing dataset, the ratio is adjusted to 80% dog and 20% cat. We use SGD with stepsize 10^{-5} and epoch number 1000 to compute P and Q ERM model and HTL model. We choose stepsize for θ to be 10^{-5} and that for λ to be 10^{-3}, as well as T = 2000 n_P in our method. We fix n_Q = 50 and vary n_P from 100 to 1500.
Hardware Specification	Yes	We implement our algorithm using Python on an Intel i7-8700 CPU.
Software Dependencies	No	We implement our algorithm using Python on an Intel i7-8700 CPU. We use the CVXPY [19] package to implement the projection step in Algorithm 1.
Experiment Setup	Yes	6.1 Regression Task on the Synthetic Dataset. Throughout this subsection, we set the model dimension d to be 50. The distribution P and Q are set to be d-dimensional multivariate Gaussian with certain mean and covariance. The label is generated as y = θ_µ^⊤ x + ε for µ ∈ {P, Q}, where ε ∼ N(0, 1). We choose η = 10^{-4}, and iteration number T = 2000 n_P for both our method and HTL. We use closed form OLS solution to compute source and target ERM model. 6.2 Regression Task on the School Dataset. ... We set η = 10^{-4}. We fix n_Q = 100 and vary n_P from 100 to 1500. 6.3 Regression Task on the Berkeley Yearbook Dataset. ... We choose η = 10^{-4} for all algorithms, and iteration number T = 2000 n_P for both our method and HTL, and T = 500 n_P for PSGD since we found it can converge in fewer epochs. We use closed form OLS solution to compute source and target ERM model. 6.4 Binary Classification Task on the CIFAR-10 Dog vs Cat Dataset. ... We use SGD with stepsize 10^{-5} and epoch number 1000 to compute P and Q ERM model and HTL model. We choose stepsize for θ to be 10^{-5} and that for λ to be 10^{-3}, as well as T = 2000 n_P in our method. We fix n_Q = 50 and vary n_P from 100 to 1500.
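
Illustrative sketch (not the authors' code, which the report notes is not public): the main loop of Algorithm 1 quoted in the Pseudocode row can be written in a few lines of NumPy. The function name mixed_sample_sgd, the squared loss ℓ(θ; x, y) = ½(x·θ - y)², and the constant stepsize α_t = α are assumptions made here for concreteness. The final ℓ2 projection step, which the paper implements with CVXPY, is omitted, so the function returns the unprojected average iterate.

```python
import numpy as np


def mixed_sample_sgd(XP, yP, XQ, yQ, T, eta, gamma, eps_Q, alpha=None, seed=0):
    """Main loop of Mixed-Sample SGD for squared loss (assumed loss).

    Returns the average of theta_0..theta_{T-1}, the target-only iterate
    theta_Q, and the final dual variable lambda. The projection onto the
    constraint set is NOT performed here.
    """
    rng = np.random.default_rng(seed)
    d = XP.shape[1]
    theta = np.zeros(d)      # primal iterate theta_t
    theta_Q = np.zeros(d)    # target-only SGD iterate theta_{Q,t}
    lam = 0.0                # dual variable lambda_t
    theta_sum = np.zeros(d)
    alpha = eta if alpha is None else alpha  # assumed constant alpha_t

    def loss(th, x, y):
        return 0.5 * (x @ th - y) ** 2

    def grad(th, x, y):
        return (x @ th - y) * x

    for _ in range(T):
        theta_sum += theta
        # Draw xi_t ~ Bernoulli(1 / (1 + lambda_t)): source sample if 1, else target.
        if rng.random() < 1.0 / (1.0 + lam):
            i = rng.integers(len(yP))
            x, y = XP[i], yP[i]
        else:
            i = rng.integers(len(yQ))
            x, y = XQ[i], yQ[i]
        theta_next = theta - eta * (1.0 + lam) * grad(theta, x, y)

        # Fresh target sample for the dual update and the target-only SGD step.
        j = rng.integers(len(yQ))
        x, y = XQ[j], yQ[j]
        gap = loss(theta, x, y) - loss(theta_Q, x, y) - 6.0 * eps_Q
        lam = max(0.0, (1.0 - gamma * eta) * lam + eta * gap)
        theta_Q = theta_Q - alpha * grad(theta_Q, x, y)

        theta = theta_next

    return theta_sum / T, theta_Q, lam
```

The 1/(1+λ_t) sampling probability combined with the (1+λ_t) stepsize scaling makes the expected update equal to the gradient of the source risk plus λ_t times the gradient of the target risk, which is why both branches share the same update formula.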