Weighted Ensemble Self-Supervised Learning

Authors: Yangjun Ruan, Saurabh Singh, Warren Richard Morningstar, Alexander A Alemi, Sergey Ioffe, Ian Fischer, Joshua V. Dillon

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The effectiveness of our method is demonstrated with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both in multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. Thorough experiments yield improved prior-art baselines, which our method still surpasses.
Researcher Affiliation | Collaboration | Yangjun Ruan, Saurabh Singh, Warren Morningstar, Alexander A. Alemi, Sergey Ioffe, Ian Fischer, Joshua V. Dillon; Google Research; University of Toronto & Vector Institute (work done as a student researcher at Google).
Pseudocode | Yes | See Appx. A for pseudocode: Algorithm 1 (pseudocode for computing the ensemble loss) and Algorithm 2 (pseudocode for ensemble heads with simplified DINO). A minimal loss sketch is given after this table.
Open Source Code | No | The paper mentions 'DINO's publicly-available pretrained weights' and references 'the official DINO implementation' and 'public MSN code' (with URLs in footnotes), but it does not state that the authors released the source code for their own proposed method or provide a link to it.
Open Datasets | Yes | We experimented with DINO (Caron et al., 2021) and MSN (Assran et al., 2022) models on the ImageNet ILSVRC-2012 dataset (Deng et al., 2009). ... We used the 1-/2-/5-shot ImageNet dataset splits in Assran et al. (2022) and the 1% (∼13-shot) ImageNet dataset splits.
Dataset Splits | Yes | For 1-/2-/5-shot evaluation results, we report the mean accuracy and standard deviation across 3 random splits of the data, following Assran et al. (2022). ... For all few-shot evaluations, we searched the L2 regularization strength over {1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1, 1, 3, 10}. A sketch of this evaluation protocol is given after this table.
Hardware Specification | Yes | We benchmarked the wall-clock time and peak memory on 128 TPUv3 cores.
Software Dependencies | No | The paper mentions using 'JAX', the 'AdamW optimizer', the 'scikit-learn package', and 'tensorflow-datasets (tfds)', but does not provide specific version numbers for these software components.
Experiment Setup | Yes | In particular, all models were trained with the AdamW optimizer (Loshchilov & Hutter, 2018) and a batch size of 1024. The learning rate was linearly warmed up to 0.002 (= 0.001 × batch size / 512) and then followed a cosine decay schedule. The weight decay followed a cosine schedule from 0.04 to 0.4. The momentum rate for the teacher was increased from 0.996 to 1 with a cosine schedule, following BYOL (Grill et al., 2020). A stochastic depth (Huang et al., 2016) of 0.1 was applied, without dropout (Srivastava et al., 2014). The student temperature τ is set to 0.1. ... We used a 3-layer projection head with a hidden dimension of 1024. ... Tables 7 and 8 provide detailed hyper-parameters for training. A sketch of these schedules is given after this table.
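
The loss sketch below is only a minimal illustration of the kind of computation the paper's Algorithm 1 names: a DINO-style cross-entropy between teacher and student outputs, averaged over an ensemble of projection heads with per-head weights. The array shapes, temperatures, and the way head weights are supplied are assumptions for illustration; they are not the paper's exact weighting scheme.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_ensemble_loss(student_logits, teacher_logits, head_weights,
                           student_temp=0.1, teacher_temp=0.04):
    # student_logits, teacher_logits: [num_heads, batch, num_prototypes]
    # head_weights: [num_heads], non-negative and summing to 1 (illustrative choice).
    teacher_probs = softmax(teacher_logits / teacher_temp)        # teacher targets (no gradient in practice)
    student_logp = np.log(softmax(student_logits / student_temp) + 1e-12)
    per_head = -(teacher_probs * student_logp).sum(-1).mean(-1)   # cross-entropy per head: [num_heads]
    return float((head_weights * per_head).sum())                 # weighted average over heads

# Toy usage with random logits and uniform head weights.
rng = np.random.default_rng(0)
s = rng.normal(size=(4, 8, 16))
t = rng.normal(size=(4, 8, 16))
print(weighted_ensemble_loss(s, t, np.ones(4) / 4))
```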
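For the few-shot evaluation protocol quoted in the Dataset Splits row (mean accuracy over 3 random splits, with a sweep over L2 regularization strengths), a sketch using the scikit-learn package the authors mention could look like the following. Treating the quoted strength as a penalty λ and passing C = 1/λ to LogisticRegression is our assumption, and feature extraction and split loading are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# L2 regularization strengths quoted in the Dataset Splits row.
L2_GRID = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1, 1, 3, 10]

def few_shot_accuracy(train_feats, train_labels, test_feats, test_labels):
    """Fit a linear probe on frozen features for each L2 strength; return the best accuracy."""
    accs = []
    for lam in L2_GRID:
        clf = LogisticRegression(C=1.0 / lam, max_iter=1000)  # C = 1/lambda is our assumption
        clf.fit(train_feats, train_labels)
        accs.append(clf.score(test_feats, test_labels))
    return max(accs)

def mean_std_over_splits(splits):
    """splits: list of (train_feats, train_labels, test_feats, test_labels), e.g. 3 random splits."""
    accs = [few_shot_accuracy(*s) for s in splits]
    return np.mean(accs), np.std(accs)
```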
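The schedules quoted in the Experiment Setup row (linear warmup of the learning rate to 0.002 = 0.001 × batch size / 512 followed by cosine decay, weight decay ramped from 0.04 to 0.4, teacher momentum increased from 0.996 to 1) can be sketched as below. The total and warmup step counts are placeholders, not values taken from the paper.

```python
import numpy as np

def cosine_schedule(start, end, total_steps, warmup_steps=0):
    """Linear warmup from 0 to `start`, then a cosine ramp from `start` to `end`."""
    steps = np.arange(total_steps)
    warmup = start * steps / max(warmup_steps, 1)
    t = np.clip((steps - warmup_steps) / max(total_steps - warmup_steps, 1), 0.0, 1.0)
    cosine = end + 0.5 * (start - end) * (1.0 + np.cos(np.pi * t))
    return np.where(steps < warmup_steps, warmup, cosine)

batch_size = 1024
total_steps, warmup_steps = 100_000, 10_000                            # placeholder step counts
lr = cosine_schedule(0.001 * batch_size / 512, 0.0, total_steps,
                     warmup_steps)                                     # peak LR 0.002, decays to 0
weight_decay = cosine_schedule(0.04, 0.4, total_steps)                 # increases over training
teacher_momentum = cosine_schedule(0.996, 1.0, total_steps)            # EMA rate increases to 1
```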