Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Risk Bounds for Positive-Unlabeled Learning Under the Selected At Random Assumption

Authors: Olivier Coudray, Christine Keribin, Pascal Massart, Patrick Pamphile

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we are interested in establishing risk bounds for PU learning under this general assumption. In addition, we quantify the impact of label noise on PU learning compared to the standard classification setting. Finally, we provide a lower bound on the minimax risk proving that the upper bound is almost optimal.
Researcher Affiliation | Collaboration | Olivier Coudray (EMAIL), Stellantis, Centre d'Expertise Métier et Région, Poissy, 78300, France, and Université Paris-Saclay, CNRS, Inria, Laboratoire de mathématiques d'Orsay, Orsay, 91405, France. Christine Keribin (EMAIL), Université Paris-Saclay, CNRS, Inria, Laboratoire de mathématiques d'Orsay, Orsay, 91405, France. Pascal Massart (EMAIL), Université Paris-Saclay, CNRS, Inria, Laboratoire de mathématiques d'Orsay, Orsay, 91405, France. Patrick Pamphile (EMAIL), Université Paris-Saclay, CNRS, Inria, Laboratoire de mathématiques d'Orsay, Orsay, 91405, France.
Pseudocode | No | The paper focuses on theoretical analysis, establishing risk bounds and providing lower bounds on the minimax risk, without presenting any explicit pseudocode or algorithm blocks. The methods are described mathematically.
Open Source Code | No | The paper discusses theoretical aspects of PU learning, establishing risk bounds and minimax risks, but does not contain any statement about open-sourcing code, providing a repository link, or including code in supplementary materials.
Open Datasets | No | The paper provides a theoretical analysis of PU learning. It mentions applications and related works that use various datasets (e.g., spam review detection, text classification, gene-disease identification, anomaly detection) but does not conduct its own experiments or provide access information for any dataset it directly uses.
Dataset Splits | No | This paper is theoretical in nature, focusing on risk bounds and the minimax risk for PU learning. It does not involve empirical experiments with specific datasets, and therefore no information regarding dataset splits for training, validation, or testing is provided.
Hardware Specification | No | The paper presents a theoretical study of PU learning, including the establishment of risk bounds and lower bounds on the minimax risk. As such, it does not describe any experimental setup that would require hardware specifications.
Software Dependencies | No | The paper focuses on theoretical analysis and proofs for PU learning, not on practical implementation or experimentation. Consequently, it does not list any software dependencies or specific version numbers required for replication.
Experiment Setup | No | As a theoretical paper, this work does not describe any empirical experiments or their setup, including hyperparameters, model initialization, or training schedules.