Energy-Based Contrastive Learning of Visual Representations

Authors: Beomsu Kim, Jong Chul Ye

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on a variety of small and medium-scale datasets demonstrate that EBCLR is robust to small numbers of negative pairs, and it outperforms SimCLR and MoCo v2 [12] in terms of sample efficiency and linear evaluation accuracy.
Researcher Affiliation | Academia | Beomsu Kim, Department of Mathematical Sciences, KAIST, beomsu.kim@kaist.ac.kr; Jong Chul Ye, Kim Jaechul Graduate School of AI, KAIST, jong.ye@kaist.ac.kr
Pseudocode | Yes | The pseudocodes for MSGLD and EBCLR are given in Algorithms 1 and 2, respectively, in Appendix B.
Open Source Code | Yes | Code: https://github.com/1202kbs/EBCLR
Open Datasets | Yes | We use four datasets: MNIST [23], Fashion-MNIST (FMNIST) [24], CIFAR10, and CIFAR100 [25].
Dataset Splits | No | The paper defers its experimental settings to an appendix ('A complete description is deferred to Appendix D.') and answers '[Yes]' to the checklist item on training details (e.g., data splits, hyperparameters, how they were chosen). However, the provided text does not contain explicit train/validation/test split details.
Hardware Specification | No | The checklist (Section 3.d) answers '[Yes]' to reporting the total amount of compute and the type of resources used, but these details appear to be deferred to Appendix D, which is not provided. The visible text contains no specific hardware details such as GPU or CPU models.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies. Although standard machine-learning tooling is implied, no explicit software stack or versions are listed.
Experiment Setup | Yes | We use batch size 128 for EBCLR and batch size 256 for the baseline methods following Wang et al. [27], and train each method for 100 epochs. In our experiments, we set ϕθ to be a ResNet-18 [26] up to the global average pooling layer and πθ to be a 2-layer MLP with output dimension 128. However, we remove batch normalization because batch normalization hurts SGLD [16]. We also replace ReLU with leaky ReLU to expedite the convergence of SGLD. We explored the effect of changing the hyperparameter λ, which controls the importance of the generative term relative to the discriminative term (see Equation (11)). (A minimal sketch of this setup follows the table.)
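
As a reading aid for the Experiment Setup row, here is a minimal PyTorch sketch of the described architecture: a ResNet-18 backbone ϕθ with batch normalization removed and ReLU swapped for leaky ReLU, truncated at the global average pooling layer, followed by a 2-layer MLP projection head πθ with output dimension 128. The function names, the MLP hidden width (512), and the CIFAR-sized input are assumptions; the authors' exact implementation is in the linked repository.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def replace_relu_with_leaky(module: nn.Module) -> None:
    """Recursively swap every ReLU for a LeakyReLU (the paper does this to expedite SGLD)."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.LeakyReLU(inplace=True))
        else:
            replace_relu_with_leaky(child)

def build_encoder(proj_dim: int = 128):
    # phi_theta: ResNet-18 up to global average pooling, with batch norm removed
    # (norm_layer=nn.Identity drops all BatchNorm layers, since BN hurts SGLD).
    backbone = resnet18(norm_layer=nn.Identity)
    backbone.fc = nn.Identity()  # keep the 512-d features after global average pooling
    replace_relu_with_leaky(backbone)

    # pi_theta: 2-layer MLP projection head with output dimension 128.
    # The hidden width (512) is an assumption; the paper only states "2-layer MLP".
    projector = nn.Sequential(
        nn.Linear(512, 512),
        nn.LeakyReLU(inplace=True),
        nn.Linear(512, proj_dim),
    )
    return backbone, projector

if __name__ == "__main__":
    phi, pi = build_encoder()
    x = torch.randn(4, 3, 32, 32)  # CIFAR-sized input
    z = pi(phi(x))
    print(z.shape)  # torch.Size([4, 128])
```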
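
The four datasets listed under Open Datasets are all available through torchvision; a minimal loading sketch is below. The data root path is illustrative, and the paper's augmentation pipeline (described in Appendix D) is not reproduced here.

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# Training splits of the four datasets used in the paper.
train_sets = {
    "MNIST":    datasets.MNIST("./data", train=True, download=True, transform=to_tensor),
    "FMNIST":   datasets.FashionMNIST("./data", train=True, download=True, transform=to_tensor),
    "CIFAR10":  datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor),
    "CIFAR100": datasets.CIFAR100("./data", train=True, download=True, transform=to_tensor),
}
```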
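
Finally, the MSGLD and EBCLR pseudocodes themselves (Algorithms 1 and 2) sit in Appendix B and are not reproduced in the visible text. For orientation only, the following is a sketch of vanilla SGLD sampling from an energy function, which MSGLD builds on; the function name, the `energy_fn` argument, and all hyperparameter values are illustrative and are not the paper's MSGLD variant.

```python
import torch

def sgld_sample(energy_fn, x_init, n_steps=20, step_size=1e-2):
    """Vanilla SGLD: x <- x - (step_size / 2) * grad_x E(x) + sqrt(step_size) * N(0, I).

    `energy_fn` maps a batch of inputs to per-sample energies. This is the basic
    scheme; the paper's MSGLD (Algorithm 1, Appendix B) modifies it in ways not
    reproduced here.
    """
    noise_std = step_size ** 0.5
    x = x_init.clone().detach()
    for _ in range(n_steps):
        x.requires_grad_(True)
        energy = energy_fn(x).sum()
        grad = torch.autograd.grad(energy, x)[0]
        x = (x - 0.5 * step_size * grad + noise_std * torch.randn_like(x)).detach()
    return x
```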