Rethinking InfoNCE: How Many Negative Samples Do You Need?

Authors: Chuhan Wu, Fangzhao Wu, Yongfeng Huang

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments in three different tasks show our framework can accurately predict the optimal negative sampling ratio, and various models can benefit from our adaptive negative sampling method." (The InfoNCE objective whose sampling ratio is tuned is sketched below the table.) |
| Researcher Affiliation | Collaboration | ¹Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; ²Microsoft Research Asia, Beijing 100080, China |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | "The first two tasks are performed on the MIND [Wu et al., 2020] dataset, which contains the news impression logs of 1 million users in 6 weeks" (https://msnews.github.io/). "The item recommendation task is performed on the MovieLens dataset [Harper and Konstan, 2015], and we use the ML-1M version for experiments" (https://grouplens.org/datasets/movielens/1m/). |
| Dataset Splits | Yes | For the news recommendation task, the logs in the last week are used for test and the rest for training and validation. For the news title-body matching task, the news in the MIND training set are used for training, and the news in the validation and test sets (excluding news already in the training set) are used for validation and test, respectively. For MovieLens, the same experimental settings as [Sun et al., 2019] are used. (A possible implementation of the time-based split is sketched below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions "We use Adam [Kingma and Ba, 2015] as the optimizer" but does not specify versions for any programming languages, libraries, or other software dependencies used for implementation. |
| Experiment Setup | Yes | "We use Adam [Kingma and Ba, 2015] as the optimizer and the learning rate is 1e-4. The batch size is 32. The hidden dimension is 256 in the news recommendation and title-body matching tasks, and 64 in the item recommendation task." (This configuration is sketched below the table.) |
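
The quoted claim in the Research Type row refers to the InfoNCE contrastive loss, where the number of sampled negatives K (the negative sampling ratio) is the quantity the paper's framework predicts. As a point of reference, here is a minimal PyTorch sketch of the generic InfoNCE objective; it is not the authors' implementation (no code was released), and the dot-product scorer and temperature parameter are assumptions:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query, positive, negatives, temperature=1.0):
    """Generic InfoNCE loss with K sampled negatives per query.

    query:     (B, D) anchor embeddings
    positive:  (B, D) embeddings of the matched positives
    negatives: (B, K, D) embeddings of K sampled negatives
    """
    pos_logit = (query * positive).sum(dim=-1, keepdim=True)   # (B, 1)
    neg_logits = torch.einsum("bd,bkd->bk", query, negatives)  # (B, K)
    logits = torch.cat([pos_logit, neg_logits], dim=1) / temperature
    # The positive always sits at index 0, so every label is 0.
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)
```

Increasing K enlarges the softmax denominator; the paper's question is how large K actually needs to be.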
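
The time-based MIND split described in the Dataset Splits row could be reproduced roughly as below. The file name and column schema follow the public MIND release, but the paper does not spell out its preprocessing, so treat this as an assumption-laden sketch:

```python
import pandas as pd

# Assumed MIND-style impression log (schema of the public behaviors.tsv).
cols = ["impression_id", "user_id", "time", "history", "impressions"]
logs = pd.read_csv("behaviors.tsv", sep="\t", names=cols)
logs["time"] = pd.to_datetime(logs["time"])

# Hold out the final week of the 6-week log for test;
# the earlier weeks serve as training and validation data.
test_start = logs["time"].max() - pd.Timedelta(weeks=1)
test_logs = logs[logs["time"] >= test_start]
train_val_logs = logs[logs["time"] < test_start]
```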
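
The hyperparameters in the Experiment Setup row translate directly into an optimizer configuration. In the sketch below, only the Adam optimizer, learning rate, batch size, and hidden dimensions come from the paper; the encoder itself and its 768-dimensional input are illustrative placeholders:

```python
import torch

HIDDEN_DIM = 256  # 256 for news recommendation / title-body matching; 64 for item recommendation
BATCH_SIZE = 32   # as reported in the paper

# Placeholder encoder; the paper's per-task architectures are not reproduced here.
model = torch.nn.Sequential(
    torch.nn.Linear(768, HIDDEN_DIM),  # 768-dim input is an assumption for illustration
    torch.nn.ReLU(),
    torch.nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
)

# Adam with learning rate 1e-4, per the reported setup.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```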