Self-supervised Learning from a Multi-view Perspective

Authors: Yao-Hung Hubert Tsai, Yue Wu, Ruslan Salakhutdinov, Louis-Philippe Morency

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct controlled experiments to evaluate the impact of the composite objectives. We also explore our framework's empirical generalization beyond the multi-view perspective, where the cross-view redundancy may not be clearly observed.
Researcher Affiliation | Academia | Yao-Hung Hubert Tsai, Yue Wu, Ruslan Salakhutdinov, Louis-Philippe Morency; Machine Learning Department, Carnegie Mellon University
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | To reproduce the results in our experimental section, please refer to our released code: https://github.com/yaohungt/Self_Supervised_Learning_Multiview
Open Datasets | Yes | We use the Omniglot dataset (Lake et al., 2015) in this experiment. The training set contains images from 964 characters, and the test set contains 659 characters; no characters overlap between the training and test sets. Each character has twenty examples drawn by twenty different people. (A loading sketch for this split follows the table.)
Dataset Splits | Yes | The training set contains images from 964 characters, and the test set contains 659 characters. ... Specifically, a linear classifier is trained from the self-supervised learned (fixed) representation to the labels on the training set. Commonly used metrics for multi-label classification are reported on the MS COCO validation set: Micro ROC-AUC and Subset Accuracy. (A linear-evaluation sketch also follows the table.)
Hardware Specification | No | The paper acknowledges NVIDIA's GPU support but does not specify the GPU model or other hardware details used for its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiments.
Experiment Setup | Yes | The input image has size 105 × 105. For image augmentations, we adopt 1) rotation by −10° to +10°; 2) translation by −15 to +15 pixels; 3) scaling both width and height from 0.85 to 1.0; 4) scaling width from 0.85 to 1.25 while fixing the height; and 5) resizing the image to 28 × 28. Then, a deep network takes the 28 × 28 image and outputs a 1024-dimensional feature vector. ... L_SSL = λ_CL · L_CL + λ_FP · L_FP + λ_IP · L_IP (Eq. 4), where λ_CL, λ_FP, and λ_IP are hyper-parameters. (The augmentation pipeline and Eq. (4) are sketched after the table.)
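
The Omniglot background/evaluation split quoted in the Open Datasets row (964 training characters, 659 test characters, no overlap) matches the standard torchvision split. A minimal loading sketch, assuming torchvision is installed; the `root` path is a placeholder:

    from torchvision import datasets, transforms

    # Standard Omniglot split: the "background" set holds 964 characters,
    # the "evaluation" set holds 659, with no character overlap.
    to_tensor = transforms.ToTensor()
    train_set = datasets.Omniglot(root="data", background=True, download=True, transform=to_tensor)
    test_set = datasets.Omniglot(root="data", background=False, download=True, transform=to_tensor)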
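The linear-evaluation protocol described in the Dataset Splits row (a linear classifier trained on top of the fixed self-supervised representation) might look like the following sketch. The encoder, data loader, learning rate, and class count are hypothetical placeholders, not the authors' code; only the 1024-dim feature size follows the paper's setup:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Placeholder stand-ins for the paper's trained encoder and data
    # (hypothetical; substitute the real self-supervised encoder and loader).
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 1024))
    num_classes = 964
    train_loader = DataLoader(
        TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, num_classes, (64,))),
        batch_size=32,
    )

    # Freeze the encoder; only the linear head is trained.
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad = False

    linear_head = nn.Linear(1024, num_classes)
    optimizer = torch.optim.Adam(linear_head.parameters(), lr=1e-3)  # lr is a placeholder
    criterion = nn.CrossEntropyLoss()

    for images, labels in train_loader:
        with torch.no_grad():
            feats = encoder(images)          # fixed 1024-dim representations
        loss = criterion(linear_head(feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()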
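The Experiment Setup row can be made concrete with a sketch of the augmentation pipeline and the composite objective of Eq. (4). This is a rough rendering in torchvision terms, not the authors' released implementation: the width-only scaling (0.85 to 1.25 with fixed height) has no one-line torchvision equivalent and is omitted, and the λ defaults below are placeholders:

    import torch
    from torchvision import transforms

    # Augmentations 1-3 and 5 from the quoted setup, approximated with RandomAffine;
    # translate is expressed as a fraction of the 105 × 105 input (15/105 ≈ 0.143).
    augment = transforms.Compose([
        transforms.RandomAffine(
            degrees=10,                      # rotation in [-10°, +10°]
            translate=(15 / 105, 15 / 105),  # ±15-pixel shifts
            scale=(0.85, 1.0),               # joint width/height scaling
        ),
        transforms.Resize((28, 28)),         # encoder input size
        transforms.ToTensor(),
    ])

    # Two augmented views of the same image serve as the two "views" in the
    # multi-view setup, e.g.: view1, view2 = augment(img), augment(img)

    def ssl_loss(l_cl, l_fp, l_ip, lam_cl=1.0, lam_fp=1.0, lam_ip=1.0):
        """Composite objective of Eq. (4):
        L_SSL = λ_CL · L_CL + λ_FP · L_FP + λ_IP · L_IP,
        with the λ's as hyper-parameters (defaults here are placeholders)."""
        return lam_cl * l_cl + lam_fp * l_fp + lam_ip * l_ip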