Self-supervised Representation Learning from Random Data Projectors
Authors: Yi Sui, Tongzi Wu, Jesse C. Cresswell, Ga Wu, George Stein, Xiao Shi Huang, Xiaochen Zhang, Maksims Volkovs
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines. |
| Researcher Affiliation | Collaboration | Yi Sui, Layer 6 AI (amy@layer6.ai); Tongzi Wu, Layer 6 AI (tongzi@layer6.ai); Jesse C. Cresswell, Layer 6 AI (jesse@layer6.ai); Ga Wu, Dalhousie University (ga.wu@dal.ca); George Stein, Layer 6 AI (george@layer6.ai); Xiao Shi Huang, Layer 6 AI (gary@layer6.ai); Xiaochen Zhang, Layer 6 AI (lisa@layer6.ai); Maksims Volkovs, Layer 6 AI (maks@layer6.ai) |
| Pseudocode | Yes | We summarize the LFR algorithm in Algorithm 1, which uses the subroutine in Algorithm 2. Algorithm 1 (LFR: Learning From Randomness)... Algorithm 2 (Train-Network subroutine). A hedged code sketch of this training loop appears after the table. |
| Open Source Code | Yes | Towards the goal of reproducibility, we have provided our anonymized code repository as supplementary material with this submission. The codebase includes instructions on how to build the required environment and how to run our proposed method as well as baseline methods. |
| Open Datasets | Yes | We utilized two standard time-series datasets, Human Activity Recognition (HAR) (Anguita et al., 2013) and Epileptic Seizure Recognition (Epilepsy) (Andrzejak et al., 2001). ... MIMIC-III Benchmark dataset (Harutyunyan et al., 2019)... three tabular UCI datasets in our experiments: Adult Income (Income) (Kohavi, 1996), First Order Theorem Proving (Theorem) (Bridge et al., 2014), and HEPMASS (Baldi et al., 2016). ... Kvasir (Pogorelov et al., 2017)... further results on CIFAR10 in Appendix D.6. |
| Dataset Splits | No | Table 3 provides 'Train Size' and 'Test Size' for each dataset (e.g., HAR: Train Size 7352, Test Size 2947; Kvasir: 6000 images for training and 2000 for testing). However, separate validation-set sizes or percentages are not consistently reported for all datasets in the main text. |
| Hardware Specification | Yes | The time series experiments with HAR and Epilepsy were conducted on a Tesla V100 GPU with 32 GB of memory... The MIMIC-III experiments were conducted with an NVIDIA A100 GPU with 40GB of memory... The Kvasir experiments were conducted using a Tesla V100 GPU with 32 GB of memory... The tabular dataset experiments with Income, Theorem, and HEPMASS were conducted on an NVIDIA TITAN V GPU with 12 GB of memory... The CIFAR experiments were conducted on a cluster with a single NVIDIA P100 GPU with 12 GB of memory per experiment... |
| Software Dependencies | No | The paper states, 'The codebase includes instructions on how to build the required environment,' which suggests dependencies are documented in the code repository, but the paper itself does not list specific software components with version numbers. |
| Experiment Setup | Yes | Table 5 (Details on LFR Training Settings) lists the optimizer, batch size, learning rate, optimizer parameters, and epochs for each dataset (e.g., HAR/Epilepsy: Adam, batch size 128, learning rate 3e-4, β=(0.9, 0.999), weight decay 3e-4, 200 training epochs, 5 predictor epochs). Tables 6 and 7 provide analogous details for linear evaluation and supervised training. Hedged sketches of the pretraining loop and linear evaluation protocol appear after the table. |
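To make the quoted Algorithm 1 concrete, below is a minimal PyTorch sketch of an LFR-style pretraining loop: an encoder and per-projector predictor heads are trained to reproduce the outputs of frozen, randomly initialized data projectors. The network sizes, the `make_mlp` helper, and the MSE objective are illustrative assumptions rather than the authors' implementation (the paper defines its own projector architectures and loss); only the Adam settings mirror those quoted from Table 5.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions, for illustration only.
INPUT_DIM, REPR_DIM, PROJ_DIM, NUM_PROJECTORS = 64, 128, 32, 4

def make_mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

# Frozen, randomly initialized data projectors g_k; their outputs serve as targets.
projectors = [make_mlp(INPUT_DIM, PROJ_DIM) for _ in range(NUM_PROJECTORS)]
for g in projectors:
    g.requires_grad_(False)

encoder = make_mlp(INPUT_DIM, REPR_DIM)            # f: learns the representation
predictors = nn.ModuleList(                        # h_k: one small head per projector
    [make_mlp(REPR_DIM, PROJ_DIM) for _ in range(NUM_PROJECTORS)]
)

# Adam settings quoted from Table 5 (HAR/Epilepsy).
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(predictors.parameters()),
    lr=3e-4, betas=(0.9, 0.999), weight_decay=3e-4,
)

for step in range(100):                            # toy loop on random data
    x = torch.randn(128, INPUT_DIM)                # batch size 128 as in Table 5
    z = encoder(x)
    # Encoder and heads are trained so h_k(f(x)) matches each random projection g_k(x).
    loss = sum(
        F.mse_loss(h(z), g(x).detach()) for h, g in zip(predictors, projectors)
    ) / NUM_PROJECTORS
    opt.zero_grad()
    loss.backward()
    opt.step()
```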
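The linear evaluation settings in Table 6 correspond to the standard protocol of freezing the pretrained encoder and fitting a linear classifier on its representations. The sketch below uses a stand-in encoder and random data purely for illustration; the real encoders, class counts, and hyperparameters differ per dataset in the paper.

```python
import torch
import torch.nn as nn

REPR_DIM, NUM_CLASSES = 128, 6                     # hypothetical sizes

# Stand-in for a pretrained LFR encoder, frozen for evaluation.
encoder = nn.Sequential(nn.Linear(64, REPR_DIM), nn.ReLU())
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

linear_head = nn.Linear(REPR_DIM, NUM_CLASSES)
opt = torch.optim.Adam(linear_head.parameters(), lr=3e-4)

for step in range(50):                             # toy loop on random data
    x = torch.randn(128, 64)
    y = torch.randint(0, NUM_CLASSES, (128,))
    with torch.no_grad():
        z = encoder(x)                             # frozen features only
    loss = nn.functional.cross_entropy(linear_head(z), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```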