PUe: Biased Positive-Unlabeled Learning Enhancement by Causal Inference

Authors: Xutao Wang, Hanting Chen, Tianyu Guo, Yunhe Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on three benchmark datasets demonstrate that the proposed PUe algorithm significantly improves the accuracy of classifiers on non-uniform label distribution datasets compared to advanced cost-sensitive PU methods. Code is available at https://github.com/huawei-noah/Noah-research/tree/master/PUe and https://gitee.com/mindspore/models/tree/master/research/cv/PUe.
Researcher Affiliation | Industry | Xutao Wang, Hanting Chen, Tianyu Guo, Yunhe Wang, Huawei Noah's Ark Lab. {xutao.wang,chenhanting,tianyu.guo,yunhe.wang}@huawei.com
Pseudocode | Yes | Algorithm 1: PUe algorithm
Open Source Code | Yes | Code is available at https://github.com/huawei-noah/Noah-research/tree/master/PUe and https://gitee.com/mindspore/models/tree/master/research/cv/PUe.
Open Datasets | Yes | We conducted experiments on two benchmarks commonly used in PU learning: MNIST for parity classification and CIFAR-10 [17] for vehicle class recognition. On the simulated MNIST and CIFAR-10 datasets, the propensity score of each sample is known a priori, and we compare our proposed method against an ideal baseline that knows the true propensity score. Moreover, we tested our method on the Alzheimer's dataset used to identify Alzheimer's disease in order to evaluate its performance in real-world scenarios.
Dataset Splits | No | The paper mentions training, testing, and evaluation metrics but does not explicitly detail a validation split. It specifies a 'warm-up phase of 60 epochs, and then trains another 60 epochs', which implies a fixed training schedule, but gives no specific validation set details.
Hardware Specification | No | The paper mentions "CANN (Compute Architecture for Neural Networks) and Ascend AI Processor" but gives no specific model numbers or detailed processor specifications, so this is not a reproducible hardware description.
Software Dependencies | No | The paper states that "All the experiments are run by PyTorch" and acknowledges "MindSpore [20]", but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The training batch size is set to 256 for MNIST and CIFAR-10, and to 128 for Alzheimer's. We use Adam as the optimizer with a cosine annealing scheduler, where the initial learning rate is set to 5×10^-3 and the weight decay is set to 5×10^-3. The PU learning methods first go through a warm-up phase of 60 epochs and then train for another 60 epochs, where the value of α is searched in the range [0, 20].
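Two details from the table lend themselves to short illustrations. First, the Open Datasets row notes that the simulated MNIST and CIFAR-10 datasets assume the propensity score of each labeled positive is known a priori. The sketch below shows one way such a biased PU labeling could be generated from fully labeled data; the concrete propensity function is purely illustrative and is not the paper's labeling mechanism.

```python
# Minimal sketch: turning fully labeled data into biased PU data with a known
# propensity score. The propensity function e(x) here is an assumption for
# illustration only, not the mechanism used in the paper.
import numpy as np

def simulate_pu_labels(features, y_true, rng=None):
    """Each positive example (y_true == 1) is labeled with probability e(x);
    everything else stays unlabeled (s = 0). Returns (s, e_x)."""
    rng = rng or np.random.default_rng(0)
    flat = features.reshape(len(features), -1)
    norms = np.linalg.norm(flat, axis=1)
    # Non-uniform propensity: higher-norm examples are more likely to be labeled,
    # which induces the selection bias that biased PU methods aim to correct.
    e_x = 0.1 + 0.8 * (norms - norms.min()) / (norms.max() - norms.min() + 1e-12)
    s = ((y_true == 1) & (rng.random(len(y_true)) < e_x)).astype(int)
    return s, e_x
```

Second, the reported hyperparameters map onto a standard PyTorch training loop. The following is a minimal sketch under stated assumptions: `model`, `train_loader`, `warmup_loss`, and `pue_loss` are placeholders (the actual PUe risk estimator is defined by Algorithm 1 in the paper), and setting `T_max` to the total epoch count for the cosine schedule is a guess. The batch size (256 for MNIST/CIFAR-10, 128 for Alzheimer's) would be set when constructing the data loader, and the coefficient α searched in [0, 20] would be a parameter of the PUe-phase loss.

```python
# Minimal sketch of the reported training configuration (not the authors' code).
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

def train_pue(model, train_loader, warmup_loss, pue_loss, device="cuda"):
    warmup_epochs, pue_epochs = 60, 60                # reported: 60 warm-up + 60 PUe epochs
    total_epochs = warmup_epochs + pue_epochs
    optimizer = Adam(model.parameters(), lr=5e-3, weight_decay=5e-3)  # reported lr / weight decay
    scheduler = CosineAnnealingLR(optimizer, T_max=total_epochs)      # cosine annealing (T_max assumed)

    model.to(device)
    for epoch in range(total_epochs):
        loss_fn = warmup_loss if epoch < warmup_epochs else pue_loss
        for x, s in train_loader:                     # s: positive/unlabeled indicator
            x, s = x.to(device), s.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), s)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```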