Learning Weakly-supervised Contrastive Representations

Authors: Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments support three contributions. First, compared with conventional self-supervised representations, the auxiliary-information-infused representations bring performance closer to that of supervised representations, which use downstream labels directly as the supervision signal. Second, our approach performs best in most cases when compared against other baseline representation learning methods that also leverage auxiliary information. Third, our approach also works well with clusters constructed without supervision (i.e., no auxiliary information), yielding a strong unsupervised representation learning method.
Researcher Affiliation | Academia | Yao-Hung Hubert Tsai¹, Tianqin Li¹, Weixin Liu¹, Peiyuan Liao¹, Ruslan Salakhutdinov¹, Louis-Philippe Morency¹; ¹Carnegie Mellon University; {yaohungt, tianqinl, weixinli, peiyuanl, rsalakhu, morency}@cs.cmu.edu
Pseudocode | Yes | Algorithm 1: K-means Clusters + Cl-InfoNCE
    Result: pretrained encoder f_θ(·)
    f_θ(·): base encoder network; Aug(·): obtains two augmented variants of the data via augmentation functions
    Embedding ← representations gathered by passing all data through f_θ(·)
    Clusters ← K-means-clustering(Embedding)
    for epoch in 1, 2, ..., N do
        for batch in 1, 2, ..., M do
            data1, data2 ← Aug(data_batch)
            feature1, feature2 ← f_θ(data1), f_θ(data2)
            L_Cl-InfoNCE ← Cl-InfoNCE(feature1, feature2, Clusters)
            f_θ ← f_θ − lr · ∇_θ L_Cl-InfoNCE
        end
        Embedding ← gather embeddings for all data through f_θ(·)
        Clusters ← K-means-clustering(Embedding)
    end
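To make the algorithm concrete, here is a minimal PyTorch sketch of the alternating K-means / contrastive loop. It is not the authors' implementation (the linked repository has that): `encoder`, `augment`, `train_loader`, `eval_loader`, `optimizer`, `k`, `num_epochs`, and `device` are assumed to exist, and `cl_infonce_loss` is a supervised-contrastive-style stand-in in which two views count as positives when their source images share a K-means cluster.

```python
# Hedged sketch of Algorithm 1, NOT the official Cl-InfoNCE code.
# Assumed to exist: encoder (e.g., a ResNet-50 trunk), augment (stochastic
# augmentation), train_loader yielding (images, dataset indices),
# eval_loader, optimizer, k (number of clusters), num_epochs, device.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def cluster_embeddings(encoder, loader, k, device):
    """Pass the whole dataset through the current encoder, then K-means it."""
    encoder.eval()
    with torch.no_grad():
        feats = torch.cat([encoder(x.to(device)).cpu() for x, _ in loader])
    return torch.as_tensor(KMeans(n_clusters=k).fit_predict(feats.numpy()))

def cl_infonce_loss(z1, z2, cluster_ids, temperature=0.07):
    """Stand-in clustered InfoNCE: views whose source images share a
    K-means cluster are treated as positives (supervised-contrastive style)."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    labels = torch.cat([cluster_ids, cluster_ids]).to(z.device)
    sim = z @ z.t() / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim.masked_fill_(self_mask, float('-inf'))          # exclude self-pairs
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability over each anchor's positives, then negate.
    return -(log_prob.masked_fill(~pos, 0.0).sum(1)
             / pos.sum(1).clamp(min=1)).mean()

for epoch in range(num_epochs):
    # Re-cluster with the current encoder (Algorithm 1's outer loop).
    cluster_ids = cluster_embeddings(encoder, eval_loader, k, device)
    encoder.train()
    for x, idx in train_loader:                         # idx: dataset indices
        z1 = encoder(augment(x).to(device))
        z2 = encoder(augment(x).to(device))
        loss = cl_infonce_loss(z1, z2, cluster_ids[idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```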
Open Source Code | Yes | Code available at: https://github.com/Crazy-Jack/Cl-InfoNCE
Open Datasets | Yes | We conduct experiments on learning visual representations using the UT-Zappos50K (Yu & Grauman, 2014), CUB-200-2011 (Wah et al., 2011), Wider Attribute (Li et al., 2016), and ImageNet-100 (Russakovsky et al., 2015) datasets.
Dataset Splits | Yes | UT-Zappos50K: We randomly split the train-validation images by a 7:3 ratio, resulting in 35,017 training images and 15,008 validation images. Wider Attribute: The dataset comes with training, validation, and test splits. Because the dataset is small, we combine the original training and validation sets as our training set and use the original test set as our validation set; the resulting training set contains 6,871 images and the validation set contains 6,918 images. CUB-200-2011: We follow the original train-validation split, resulting in 5,994 training images and 5,794 validation images. ImageNet-100: The training split contains 128,783 images and the test split contains 5,000 images.
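For the 7:3 UT-Zappos50K split described above, a minimal sketch with `torch.utils.data.random_split`; the fixed seed and the `full_dataset` object are assumptions for illustration, not details stated in the paper:

```python
# Illustrative 7:3 random split for UT-Zappos50K; seed is an assumption.
import torch
from torch.utils.data import random_split

n_total = len(full_dataset)           # 50,025 images in total
n_train = int(0.7 * n_total)          # 35,017 train / 15,008 validation
train_set, val_set = random_split(
    full_dataset, [n_train, n_total - n_train],
    generator=torch.Generator().manual_seed(0))
```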
Hardware Specification | Yes | We conduct experiments on machines with 4 NVIDIA Tesla P100 GPUs.
Software Dependencies | No | The paper mentions using ResNet-50 as the feature encoder but does not specify software dependencies such as Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | We use SGD with momentum 0.95 as the optimizer, with a weight decay of 0.0001 to prevent overfitting. For stable training, we employ a linear warm-up followed by a cosine-decay learning-rate schedule. For the experiments shown in Figure 4(a) of the main text, the learning rate is 0.17 and the Cl-InfoNCE temperature is 0.07; for the experiments shown in Figure 5, the learning rate is 0.1 and the temperature is 0.1.
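A minimal PyTorch sketch of the stated optimization setup (SGD, momentum 0.95, weight decay 1e-4, linear warm-up then cosine decay). The warm-up length and total step count are assumptions; the paper names only the schedule shape and the hyperparameter values above.

```python
# Sketch of the optimizer/schedule; warmup_steps and total_steps are assumed.
import math
import torch

base_lr = 0.17                        # 0.1 for the Figure 5 experiments
optimizer = torch.optim.SGD(encoder.parameters(), lr=base_lr,
                            momentum=0.95, weight_decay=1e-4)

def lr_scale(step, warmup_steps=1000):
    """Linear warm-up to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_scale)
# Call scheduler.step() once per optimization step.
```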