Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

On the sample complexity of semi-supervised multi-objective learning

Authors: Tobias Wegel, Geelon So, Junhyung Park, Fanny Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [Yes] Justification: Our work is theoretical, and we only conduct toy experiments that are easily reproducible from the problem setups.
Researcher Affiliation	Academia	Tobias Wegel (1), Geelon So (2), Junhyung Park (1), Fanny Yang (1). (1) Department of Computer Science, ETH Zurich; (2) Department of Computer Science and Engineering, UC San Diego
Pseudocode	Yes	Algorithm 1 Pseudo-labeling (PL-MOL) 1: for $k \in [K]$ do 2: Compute $\hat{h}_k = \arg\min_{h \in H_k} \hat{R}_k(h)$ 3: end for 4: for $s \in S$ do 5: Compute $\hat{g}_s = \arg\min_{g \in G} \hat{d}_s(g; \hat{h})$ 6: end for 7: Return $\{\hat{g}_s : s \in S\}$.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The paper is theoretical. The only experiments that we run are detailed in Section 5 and Appendix C, and do not present a main contribution of this work. They can easily be reproduced from the description in these sections.
Open Datasets	No	Let $X = [0, 1]$, $Y = [0, 1]$ and let $\ell_k$ be the square loss. Define for $0 < L_H < L_G$ the function classes $H = \{h : [0, 1] \to [0, 1] : h \text{ is } L_H\text{-Lipschitz}\}$ and $G = \{g : [0, 1] \to [0, 1] : g \text{ is } L_G\text{-Lipschitz}\}$. Furthermore, let $K = 2$ and let $P^k_X$ have a density $p_k$ on $[0, 1]$ with respect to the Lebesgue measure. For Eq. (3) to hold, assume that there exist two functions $f^1, f^2 \in H$ for which $E[Y^k \mid X^k = x] = f^k(x)$ for all $x \in [0, 1]$.
Dataset Splits	No	We sample $n_1 = n_2 = 25$ data points uniformly from the regions in Fig. 3a and, in Figs. 3b and 3c, label them according to the linear logistic model $Y^k \mid X^k = x \sim \mathrm{Ber}(f^k(x))$, that is, with noise and in accordance with Eq. (3). Again, we run the three different algorithms on the logistic loss using linear scalarization: ERM-MOL (Algorithm 2) on the function class H of linear models, ERM-MOL on the function class G of linear models on polynomial features up to degree 5, and PL-MOL (Algorithm 1) using H in the first stage for all tasks, and G in the second stage with an additional number of $N_1 = N_2 = 300$ unlabeled data points. PL-MOL fits linear models to the labeled data and uses these to predict (soft) pseudo-labels for the unlabeled data, resulting in Fig. 4. Some resulting decision boundaries of each method are shown in Fig. 3b, and the Pareto fronts (on the test data) as well as excess s-trade-offs are shown in Fig. 3c.
Hardware Specification	Yes	Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: We only have toy experiments (Figs. 2 and 3), which execute within minutes on a standard laptop, as described in Section 5 and Appendix C.
Software Dependencies	No	The paper primarily focuses on theoretical contributions and does not specify any particular software libraries, frameworks, or their version numbers used for implementation or experimentation.
Experiment Setup	Yes	We sample $n_1 = n_2 = 25$ data points uniformly from the regions in Fig. 3a and, in Figs. 3b and 3c, label them according to the linear logistic model $Y^k \mid X^k = x \sim \mathrm{Ber}(f^k(x))$, that is, with noise and in accordance with Eq. (3). Again, we run the three different algorithms on the logistic loss using linear scalarization: ERM-MOL (Algorithm 2) on the function class H of linear models, ERM-MOL on the function class G of linear models on polynomial features up to degree 5, and PL-MOL (Algorithm 1) using H in the first stage for all tasks, and G in the second stage with an additional number of $N_1 = N_2 = 300$ unlabeled data points. Figure 2: On the left: one fit of the methods on 5 labeled and 100 unlabeled samples with weights λ = (1/2, 1/2). In the center: excess s-trade-off as a function of labeled and unlabeled sample sizes for fixed weights λ = (1/2, 1/2). We fix the unlabeled and labeled sample sizes to 212 and 25, respectively. On the right: the excess s-trade-off of PL-MOL as a function of unlabeled sample size $N_1$ while $n_1 = n_2 = N_2 = 25$ are fixed, and for varying weights. We repeat each experiment 10 times and show median, 20% and 80% quantiles.
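
The two-stage structure of Algorithm 1 (PL-MOL) quoted in the Pseudocode row can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the square loss, linear scalarization with a weight vector per trade-off, linear models for the first-stage class H, degree-5 polynomial features for the second-stage class G, and a second-stage objective computed on the unlabeled points only. The function names poly_features, fit_least_squares, and pl_mol and the synthetic regression functions are hypothetical and chosen only for illustration.

```python
import numpy as np


def poly_features(x, degree):
    """Map 1-D inputs to the polynomial features [1, x, ..., x^degree]."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)


def fit_least_squares(features, targets, weights=None):
    """Solve a (weighted) least-squares problem and return the coefficients."""
    if weights is None:
        weights = np.ones(len(targets))
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(features * root_w[:, None], targets * root_w, rcond=None)
    return coef


def pl_mol(labeled, unlabeled, weight_vectors, deg_h=1, deg_g=5):
    """Sketch of pseudo-labeling for multi-objective learning (assumptions above).

    labeled:        list of (x, y) array pairs, one per task
    unlabeled:      list of x arrays, one per task
    weight_vectors: iterable of scalarization weights, one entry per task
    """
    # Stage 1: fit one model per task on that task's labeled data (class H).
    h_coefs = [fit_least_squares(poly_features(x, deg_h), y) for x, y in labeled]

    # Pseudo-label each task's unlabeled inputs with its first-stage model.
    pseudo = [poly_features(xu, deg_h) @ c for xu, c in zip(unlabeled, h_coefs)]
    feats = np.vstack([poly_features(xu, deg_g) for xu in unlabeled])
    targets = np.concatenate(pseudo)

    # Stage 2: for each weight vector, fit a single model in the larger class G
    # to the pseudo-labels, weighting task k's average squared error by lam[k].
    g_coefs = {}
    for lam in weight_vectors:
        w = np.concatenate(
            [np.full(len(xu), l / len(xu)) for xu, l in zip(unlabeled, lam)]
        )
        g_coefs[tuple(lam)] = fit_least_squares(feats, targets, w)
    return h_coefs, g_coefs


# Small synthetic demo: two tasks on [0, 1] with noisy linear targets.
rng = np.random.default_rng(0)
demo_labeled = []
for slope, intercept in [(0.5, 0.2), (-0.4, 0.7)]:  # illustrative choices
    x = rng.uniform(0.0, 1.0, size=25)
    y = slope * x + intercept + 0.05 * rng.standard_normal(25)
    demo_labeled.append((x, y))
demo_unlabeled = [rng.uniform(0.0, 1.0, size=300) for _ in range(2)]
demo_h, demo_g = pl_mol(demo_labeled, demo_unlabeled, [(0.5, 0.5), (0.9, 0.1)])
```

With the square loss, each second-stage problem is a weighted least-squares fit, so no iterative optimizer is needed in this sketch. The logistic-loss experiments quoted in the table would need a numerical solver in place of fit_least_squares.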