Dictionary Contrastive Learning for Efficient Local Supervision without Auxiliary Networks

Authors: Suhwan Choi, Myeongho Jeon, Yeonjung Hwang, Jeonglyul Oh, Sungjun Lim, Joonseok Lee, Myungjoo Kang

ICLR 2024

Reproducibility assessment: each variable below lists the result and the supporting excerpt or response.
Research Type: Experimental. Evidence: Section 5 (Experiments), 5.1 (Experimental Setups), 5.2 (Main Results); Table 1 reports test errors and parameter counts for convolutional networks; Table 3 reports test errors across datasets using the VGG8B architecture employed by Nøkland & Eidnes (2019).
Researcher Affiliation: Collaboration. Suhwan Choi (1,2), Myeongho Jeon (1), Yeonjung Hwang (1), Jeonglyul Oh (1), Sungjun Lim (1), Joonseok Lee (1,3), Myungjoo Kang (1); 1 Seoul National University, 2 CRABs.ai, 3 Google Research.
Pseudocode: No. The paper contains no structured pseudocode or algorithm blocks (no clearly labeled algorithm sections or code-style formatted procedures).
Open Source Code: No. The paper provides no repository link or explicit code-release statement, and does not indicate that code for the described methodology is included in supplementary materials.
Open Datasets: Yes. Evidence: "For the Conv and FC architectures, we test our method on MNIST (LeCun, 1998), CIFAR-10, and CIFAR-100 (Krizhevsky et al., 2009) datasets... employing the VGG8B (Simonyan & Zisserman, 2015; Nøkland & Eidnes, 2019) architecture, our method was evaluated on MNIST, Fashion MNIST (Xiao et al., 2017), CIFAR-10, CIFAR-100, SVHN (Netzer et al., 2011), and STL-10 (Coates et al., 2011)."
Dataset Splits: No. Evidence: "The MNIST dataset consists of 60,000 training and 10,000 test samples, with 10 label classes. Each sample is a 28×28 grayscale image. CIFAR-10 and CIFAR-100 provide 50,000 training and 10,000 testing RGB images of size 32×32." The paper specifies training and test sets but does not describe a separate validation set or how data was split for validation.
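Because the paper reports only train/test sizes, anyone reproducing it must choose their own validation split. A minimal sketch of one conventional choice, assuming a 90/10 partition of MNIST's 60,000 training indices (the fraction and seed are assumptions, not taken from the paper):

```python
import random

# MNIST sizes as stated in the paper: 60,000 train / 10,000 test.
NUM_TRAIN = 60_000
VAL_FRACTION = 0.1  # assumed fraction; the paper does not specify one


def make_val_split(num_train: int, val_fraction: float, seed: int = 0):
    """Partition training indices into disjoint train/validation index lists."""
    rng = random.Random(seed)
    indices = list(range(num_train))
    rng.shuffle(indices)
    num_val = int(num_train * val_fraction)
    return indices[num_val:], indices[:num_val]


train_idx, val_idx = make_val_split(NUM_TRAIN, VAL_FRACTION)
print(len(train_idx), len(val_idx))  # 54000 6000
```

The index lists can then be fed to framework-specific samplers (e.g., a subset sampler in PyTorch) without altering the on-disk dataset.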
Hardware Specification: No. The paper mentions "GPU resources" and "GPU memory consumption" but does not name specific GPU models (e.g., NVIDIA A100 or V100), CPU models, or other hardware details used for the experiments.
Software Dependencies: No. The paper states "Our experiments were conducted using Pytorch" but provides no version numbers for PyTorch or any other software dependencies.
Experiment Setup: Yes. Evidence: "Across all architectures, we use the AdamW optimizer (Loshchilov & Hutter, 2018) with the default PyTorch settings: β1 = 0.9, β2 = 0.999, and weight_decay = 0.01. Table VII details the training hyperparameters for every architecture... A batch size of 128 is used for all experiments."
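The quoted setup maps directly onto PyTorch's optimizer API. A minimal sketch, assuming a toy stand-in model and a placeholder learning rate (the paper defers per-architecture learning rates to its Table VII):

```python
import torch

# Toy stand-in model; the paper uses Conv, FC, and VGG8B architectures.
model = torch.nn.Linear(32, 10)

# AdamW configured with the settings quoted above. The learning rate here
# is a placeholder, NOT a value from the paper.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,              # placeholder; see the paper's Table VII
    betas=(0.9, 0.999),   # β1, β2 as stated
    weight_decay=0.01,    # as stated (also PyTorch's AdamW default)
)

BATCH_SIZE = 128  # "A batch size of 128 is used for all experiments."
```

Note that weight_decay=0.01 is indeed PyTorch's default for AdamW (unlike Adam, where decay is coupled into the gradient), consistent with the paper's "default PyTorch settings" claim.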