Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Self-supervised Representation Learning from Random Data Projectors
Authors: Yi Sui, Tongzi Wu, Jesse C. Cresswell, Ga Wu, George Stein, Xiao Shi Huang, Xiaochen Zhang, Maksims Volkovs
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines. |
| Researcher Affiliation | Collaboration | Yi Sui (Layer 6 AI), Tongzi Wu (Layer 6 AI), Jesse C. Cresswell (Layer 6 AI), Ga Wu (Dalhousie University), George Stein (Layer 6 AI), Xiao Shi Huang (Layer 6 AI), Xiaochen Zhang (Layer 6 AI), Maksims Volkovs (Layer 6 AI) |
| Pseudocode | Yes | We summarize the LFR algorithm in Algorithm 1 which uses the subroutine in Algorithm 2. Algorithm 1 LFR: Learning From Randomness... Algorithm 2 Train-Network subroutine |
| Open Source Code | Yes | Towards the goal of reproducibility, we have provided our anonymized code repository as supplementary material with this submission. The codebase includes instructions on how to build the required environment, and how to run our proposed method as well as baseline methods. |
| Open Datasets | Yes | We utilized two standard time-series datasets, Human Activity Recognition (HAR) (Anguita et al., 2013) and Epileptic Seizure Recognition (Epilepsy) (Andrzejak et al., 2001). ... MIMIC-III Benchmark dataset (Harutyunyan et al., 2019)... three tabular UCI datasets in our experiments: Adult Income (Income) (Kohavi, 1996), First Order Theorem Proving (Theorem) (Bridge et al., 2014), and HEPMASS (Baldi et al., 2016). ... Kvasir (Pogorelov et al., 2017)... further results on CIFAR10 in Appendix D.6. |
| Dataset Splits | No | Table 3 provides 'Train Size' and 'Test Size' for each dataset (e.g., HAR: Train Size 7352, Test Size 2947; Kvasir: 6000 images for training and 2000 for testing). However, explicit separate validation set sizes or percentages are not consistently provided for all datasets in the main text. |
| Hardware Specification | Yes | The time series experiments with HAR and Epilepsy were conducted on a Tesla V100 GPU with 32 GB of memory... The MIMIC-III experiments were conducted with an NVIDIA A100 GPU with 40GB of memory... The Kvasir experiments were conducted using a Tesla V100 GPU with 32 GB of memory... The tabular dataset experiments with Income, Theorem, and HEPMASS were conducted on an NVIDIA TITAN V GPU with 12 GB of memory... The CIFAR experiments were conducted on a cluster with a single NVIDIA P100 GPU with 12 GB of memory per experiment... |
| Software Dependencies | No | The paper states, 'The codebase includes instructions on how to build the required environment,' which implies software dependencies are listed in the code repository, but the paper's text does not explicitly list specific software components with version numbers. |
| Experiment Setup | Yes | Table 5: Details on LFR Training Settings lists Optimizer, Batch Size, Learning Rate, Optimizer Parameters, and Epochs for each dataset (e.g., HAR/Epilepsy: Adam, 128, 3e-4, β=(0.9, 0.999), wd=3e-4, Train epochs = 200, Predictor epochs = 5). Similar tables (Table 6 and 7) provide details for linear evaluation and supervised training settings. |
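The quoted Table 5 settings can be captured as a plain configuration mapping. A minimal sketch in Python, using only the hyperparameter values quoted above; the dictionary layout and the names `lfr_training_config` and `describe` are illustrative, not part of the paper:

```python
# Training hyperparameters for HAR/Epilepsy as quoted from Table 5 of the paper.
# Only the values come from the paper; the structure here is illustrative.
lfr_training_config = {
    "HAR/Epilepsy": {
        "optimizer": "Adam",
        "batch_size": 128,
        "learning_rate": 3e-4,
        "betas": (0.9, 0.999),
        "weight_decay": 3e-4,
        "train_epochs": 200,
        "predictor_epochs": 5,
    },
}

def describe(dataset: str, cfg: dict = lfr_training_config) -> str:
    """Render one dataset's settings as a short human-readable summary."""
    c = cfg[dataset]
    return (f"{dataset}: {c['optimizer']}, bs={c['batch_size']}, "
            f"lr={c['learning_rate']}, epochs={c['train_epochs']}")

print(describe("HAR/Epilepsy"))
```

Keeping per-dataset settings in one mapping like this makes it easy to check a reimplementation against the reported setup before training.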