Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Masking: A New Perspective of Noisy Supervision

Authors: Bo Han, Jiangchao Yao, Gang Niu, Mingyuan Zhou, Ivor Tsang, Ya Zhang, Masashi Sugiyama

NeurIPS 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments on CIFAR-10 and CIFAR-100 with three noise structures as well as the industrial-level Clothing1M with agnostic noise structure, and the results show that Masking can improve the robustness of classifiers significantly.
Researcher Affiliation	Academia	Centre for Artificial Intelligence, University of Technology Sydney; Center for Advanced Intelligence Project, RIKEN; Cooperative Medianet Innovation Center, Shanghai Jiao Tong University; McCombs School of Business, The University of Texas at Austin; Graduate School of Frontier Sciences, University of Tokyo
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	The implementation is available at https://github.com/bhanML/Masking.
Open Datasets	Yes	CIFAR-10 and CIFAR-100 datasets are used. Both datasets consist of 50k samples for training and 10k samples for testing, where each sample is a 32×32 color image and its label. For CIFAR-10, we randomly flip the labels of the training set according to the first two types of noise structure... An industrial-level dataset called Clothing1M [46] from online shopping websites (i.e., Taobao.com) is used here, where the ground-truth transition matrix is not available.
Dataset Splits	No	The paper specifies training and testing splits, but does not provide explicit details about a separate validation dataset split used in their experiments. It mentions the concept of validation sets in a discussion but not as part of their concrete experimental setup for data partitioning.
Hardware Specification	Yes	All experiments are conducted on a NVIDIA TITAN GPU... We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Software Dependencies	No	The paper states that methods are "implemented by Tensorflow" but does not provide specific version numbers for TensorFlow or any other software dependencies.
Experiment Setup	Yes	For both datasets, the batch size is set to 128 for 15,000 iterations. α and β in Eq. (2) are respectively set 0.05 and 0.005.
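The Open Datasets row quotes the paper as randomly flipping CIFAR-10 training labels according to a noise structure, and it refers to a transition matrix for Clothing1M. As a minimal sketch, assuming the noise is described by a row-stochastic transition matrix T where T[y, y'] is the probability that clean label y is observed as y', synthetic label noise can be injected as below. The names flip_labels and adjacent_class_transition and the example matrix are hypothetical illustrations; they are not the paper's noise structures or the code in the linked repository.

```python
import numpy as np


def flip_labels(labels, transition, seed=0):
    """Resample each clean label y as y' with probability transition[y, y'].

    `transition` is assumed to be a (C, C) row-stochastic matrix: row y is the
    distribution of the observed (noisy) label given clean label y.
    """
    labels = np.asarray(labels)
    transition = np.asarray(transition, dtype=float)
    num_classes = transition.shape[0]
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    rng = np.random.default_rng(seed)
    noisy = np.empty_like(labels)
    for i, y in enumerate(labels):
        # Draw the noisy label from the row of T indexed by the clean label.
        noisy[i] = rng.choice(num_classes, p=transition[y])
    return noisy


def adjacent_class_transition(num_classes, noise_rate):
    """Hypothetical example matrix: keep a label with probability
    1 - noise_rate and split the remaining mass evenly between the two
    neighbouring classes (indices wrap around)."""
    T = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        T[c, c] = 1.0 - noise_rate
        T[c, (c - 1) % num_classes] += noise_rate / 2
        T[c, (c + 1) % num_classes] += noise_rate / 2
    return T


if __name__ == "__main__":
    clean = np.repeat(np.arange(10), 100)  # 1,000 toy labels over 10 classes
    T = adjacent_class_transition(10, 0.4)
    noisy = flip_labels(clean, T, seed=0)
    print("observed flip rate:", (noisy != clean).mean())
```

With an identity matrix no labels change; with the example matrix the observed flip rate should be close to the chosen noise rate. Other noise structures only require a different T.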