Self-supervised Representation Learning from Random Data Projectors

Authors: Yi Sui, Tongzi Wu, Jesse C. Cresswell, Ga Wu, George Stein, Xiao Shi Huang, Xiaochen Zhang, Maksims Volkovs

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines.
Researcher Affiliation | Collaboration | Yi Sui (Layer 6 AI, amy@layer6.ai); Tongzi Wu (Layer 6 AI, tongzi@layer6.ai); Jesse C. Cresswell (Layer 6 AI, jesse@layer6.ai); Ga Wu (Dalhousie University, ga.wu@dal.ca); George Stein (Layer 6 AI, george@layer6.ai); Xiao Shi Huang (Layer 6 AI, gary@layer6.ai); Xiaochen Zhang (Layer 6 AI, lisa@layer6.ai); Maksims Volkovs (Layer 6 AI, maks@layer6.ai)
Pseudocode | Yes | We summarize the LFR algorithm in Algorithm 1, which uses the subroutine in Algorithm 2. Algorithm 1 LFR: Learning From Randomness... Algorithm 2 Train-Network subroutine (a minimal sketch of this training loop is given after the table).
Open Source Code | Yes | Towards the goal of reproducibility, we have provided our anonymized code repository as supplementary material with this submission. The codebase includes instructions on how to build the required environment, and how to run our proposed method as well as baseline methods.
Open Datasets | Yes | We utilized two standard time-series datasets, Human Activity Recognition (HAR) (Anguita et al., 2013) and Epileptic Seizure Recognition (Epilepsy) (Andrzejak et al., 2001). ... MIMIC-III Benchmark dataset (Harutyunyan et al., 2019)... three tabular UCI datasets in our experiments: Adult Income (Income) (Kohavi, 1996), First Order Theorem Proving (Theorem) (Bridge et al., 2014), and HEPMASS (Baldi et al., 2016). ... Kvasir (Pogorelov et al., 2017)... further results on CIFAR10 in Appendix D.6.
Dataset Splits | No | Table 3 provides 'Train Size' and 'Test Size' for each dataset (e.g., HAR: Train Size 7352, Test Size 2947; Kvasir: 6000 images for training and 2000 for testing). However, explicit validation set sizes or percentages are not consistently provided for all datasets in the main text (one possible workaround is sketched after the table).
Hardware Specification | Yes | The time series experiments with HAR and Epilepsy were conducted on a Tesla V100 GPU with 32 GB of memory... The MIMIC-III experiments were conducted with an NVIDIA A100 GPU with 40 GB of memory... The Kvasir experiments were conducted using a Tesla V100 GPU with 32 GB of memory... The tabular dataset experiments with Income, Theorem, and HEPMASS were conducted on an NVIDIA TITAN V GPU with 12 GB of memory... The CIFAR experiments were conducted on a cluster with a single NVIDIA P100 GPU with 12 GB of memory per experiment...
Software Dependencies | No | The paper states, 'The codebase includes instructions on how build the required environment,' which suggests dependencies are documented in the code repository, but the paper itself does not list specific software components with version numbers.
Experiment Setup | Yes | Table 5: Details on LFR Training Settings lists Optimizer, Batch Size, Learning Rate, Optimizer Parameters, and Epochs for each dataset (e.g., HAR/Epilepsy: Adam, 128, 3e-4, β=(0.9, 0.999), wd=3e-4, Train epochs = 200, Predictor epochs = 5). Similar tables (Tables 6 and 7) provide details for the linear evaluation and supervised training settings (these settings are transcribed into code after the table).
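
The Pseudocode row references Algorithms 1 and 2 (LFR: Learning From Randomness and the Train-Network subroutine). As a concrete illustration, the following is a minimal PyTorch sketch of the core idea as the paper describes it: an encoder is trained to predict the outputs of several randomly initialized, frozen data projectors through per-projector predictor heads. All names and dimensions here are hypothetical, an MSE loss stands in for the paper's actual objective, and the paper's alternating schedule of full-training and predictor-only epochs (cf. 'Predictor epochs' in Table 5) is omitted; this is a sketch, not the authors' implementation.

    import torch
    import torch.nn as nn

    # Hypothetical sizes; the paper's encoders are dataset-specific.
    INPUT_DIM, EMBED_DIM, TARGET_DIM, NUM_PROJECTORS = 128, 64, 32, 8

    encoder = nn.Sequential(
        nn.Linear(INPUT_DIM, 256), nn.ReLU(), nn.Linear(256, EMBED_DIM)
    )

    # Randomly initialized, frozen projectors supply the regression targets.
    projectors = [nn.Linear(INPUT_DIM, TARGET_DIM) for _ in range(NUM_PROJECTORS)]
    for proj in projectors:
        for param in proj.parameters():
            param.requires_grad_(False)

    # One trainable predictor head per projector maps embeddings to targets.
    predictors = nn.ModuleList(
        nn.Sequential(nn.Linear(EMBED_DIM, 64), nn.ReLU(), nn.Linear(64, TARGET_DIM))
        for _ in range(NUM_PROJECTORS)
    )

    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(predictors.parameters()), lr=3e-4
    )

    def lfr_step(x: torch.Tensor) -> torch.Tensor:
        """One update: make each predictor head match its random projector."""
        z = encoder(x)
        loss = torch.zeros(())
        for proj, pred in zip(projectors, predictors):
            with torch.no_grad():
                target = proj(x)  # frozen random projection of the raw input
            loss = loss + nn.functional.mse_loss(pred(z), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.detach()

    loss = lfr_step(torch.randn(128, INPUT_DIM))  # one step on a random batch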
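
On the Dataset Splits row: since only train and test sizes are reported (e.g., HAR: 7352/2947), a reproducer who wants a validation set must carve one out of the training data. The sketch below is one arbitrary choice, not the paper's protocol; the 90/10 ratio, the seed, and the HAR-like shapes are all assumptions.

    import torch
    from torch.utils.data import TensorDataset, random_split

    # Stand-in for the HAR training set (7352 examples per Table 3);
    # 561 features and 6 classes follow HAR's usual shape, for illustration only.
    train_data = TensorDataset(torch.randn(7352, 561), torch.randint(0, 6, (7352,)))

    # Arbitrary 90/10 split with a fixed seed; the paper does not specify this.
    val_size = len(train_data) // 10
    train_subset, val_subset = random_split(
        train_data,
        [len(train_data) - val_size, val_size],
        generator=torch.Generator().manual_seed(0),
    )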
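
Finally, the Table 5 row quoted in the Experiment Setup entry translates directly into an optimizer configuration. Below is a sketch for HAR/Epilepsy (Adam, batch size 128, learning rate 3e-4, betas (0.9, 0.999), weight decay 3e-4); the model and dataset are placeholders, not the paper's architectures.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Placeholder model and data; the real architectures are dataset-specific.
    model = nn.Sequential(nn.Linear(561, 128), nn.ReLU(), nn.Linear(128, 64))
    dataset = TensorDataset(torch.randn(7352, 561), torch.randint(0, 6, (7352,)))

    # Values transcribed from the quoted Table 5 row for HAR/Epilepsy.
    loader = DataLoader(dataset, batch_size=128, shuffle=True)
    optimizer = torch.optim.Adam(
        model.parameters(), lr=3e-4, betas=(0.9, 0.999), weight_decay=3e-4
    )
    # Table 5 also lists Train epochs = 200 and Predictor epochs = 5.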