Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning
Authors: Xinyang Yi, Zhaoran Wang, Zhuoran Yang, Constantine Caramanis, Han Liu
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | we characterize the effect of α by establishing the information-theoretic and computational boundaries, namely, the minimax-optimal statistical accuracy that can be achieved by all algorithms, and polynomial-time algorithms under an oracle computational model. For small α, our result shows a gap between these two boundaries, which represents the computational price of achieving the information-theoretic boundary due to the lack of supervision. Interestingly, we also show that this gap narrows as α increases. In other words, having more supervision, i.e., more correct labels, not only improves the optimal statistical accuracy as expected, but also enhances the computational efficiency for achieving such accuracy. |
| Researcher Affiliation | Academia | Xinyang Yi Zhaoran Wang Zhuoran Yang Constantine Caramanis Han Liu The University of Texas at Austin Princeton University EMAIL EMAIL |
| Pseudocode | No | The paper defines test functions and query functions mathematically but does not include structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm 1'). |
| Open Source Code | No | The paper makes no mention of releasing or providing access to open-source code for the methodology described. |
| Open Datasets | No | The paper is theoretical, analyzing a 'Gaussian generative model' and working with 'n independent samples'. It does not refer to using or providing access information for a specific, publicly available dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not discuss dataset splits, such as validation splits, for empirical reproducibility. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers needed for replication. |
| Experiment Setup | No | The paper is theoretical and does not include details on an experimental setup such as hyperparameters, training configurations, or system-level settings, as it does not describe empirical experiments. |