Denoising Autoregressive Representation Learning

Authors: Yazhe Li, Jörg Bornschein, Ting Chen

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed approach (DARL) using both MSE and diffusion objectives. Our experiments explore the basic properties of the model and present ablations on training schedule and model scaling. We compare our results with other representation learning methods and show that DARL achieves performance close to state-of-the-art.
Researcher Affiliation | Industry | ¹Google DeepMind, London, UK. ²xAI, San Francisco, US. Work done while at Google DeepMind.
Pseudocode | No | The paper describes algorithms using text and mathematical formulas but does not provide any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | The ablations use ViT-L with patch size 16 pre-trained on the ImageNet dataset (Deng et al., 2009). We use the VTAB benchmark (Zhai et al., 2020), which consists of 19 classification tasks. Table 6. COCO Object Detection and Segmentation.
Dataset Splits | Yes | We then select the best-performing hyperparameters based on validation results, retrain the model on the combined train+validation set, and report the final test set performance. Figure 15. Samples from DARL. The first image in each row (in the red rectangle) is the original image in the ImageNet validation set. (A minimal sketch of this protocol follows the table.)
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., GPU models, CPU types, or cloud computing instances with specifications).
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version or library versions such as PyTorch and TensorFlow).
Experiment Setup | Yes | The full list of hyper-parameters can be found in Table 8 in the Appendix. The initialization scheme and optimization follow the MAE recipe (He et al., 2021). Pre-training uses AdamW (Loshchilov & Hutter, 2019) for optimization with learning rate using cosine decay and 40 epochs warm-up. (A minimal optimizer sketch follows the table.)
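
The Experiment Setup row quotes AdamW optimization with a cosine-decay learning rate and 40 warm-up epochs, following the MAE recipe. The sketch below shows one common way such a schedule is wired up; it is not the authors' code, and the base learning rate, weight decay, betas, total epoch count, and steps per epoch are assumed placeholder values rather than the settings from the paper's Table 8.

```python
# Minimal sketch (an assumption, not the authors' implementation) of AdamW
# with linear warm-up followed by cosine decay, as described in the
# Experiment Setup row. All numeric defaults except the 40 warm-up epochs
# are illustrative placeholders.
import math

import torch


def build_optimizer_and_schedule(model: torch.nn.Module,
                                 base_lr: float = 1.5e-4,       # assumed
                                 weight_decay: float = 0.05,    # assumed
                                 warmup_epochs: int = 40,       # quoted in the paper
                                 total_epochs: int = 800,       # assumed
                                 steps_per_epoch: int = 1000):  # assumed
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=base_lr,
                                  betas=(0.9, 0.95),            # assumed MAE-style betas
                                  weight_decay=weight_decay)

    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch

    def lr_lambda(step: int) -> float:
        # Linear warm-up for the first warmup_steps, then cosine decay to zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler


# Usage on a stand-in model; call scheduler.step() once per optimizer step.
opt, sched = build_optimizer_and_schedule(torch.nn.Linear(16, 16))
```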
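
The Dataset Splits row describes selecting hyper-parameters on a validation split, retraining on the combined train+validation set, and reporting test performance. A toy sketch of that protocol follows; the scikit-learn dataset, model, and hyper-parameter grid are hypothetical stand-ins for the paper's actual pipeline.

```python
# Toy sketch of the split protocol quoted in the Dataset Splits row:
# (1) tune on validation, (2) retrain on train+validation, (3) report test.
# The dataset, model, and hyper-parameter grid are hypothetical placeholders.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_trval, X_test, y_trval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trval, y_trval, test_size=0.25, random_state=0)

# 1) Select the best-performing hyper-parameter on the validation split.
best_c, best_val_acc = None, -1.0
for c in (0.01, 0.1, 1.0, 10.0):
    clf = LogisticRegression(C=c, max_iter=2000).fit(X_train, y_train)
    val_acc = clf.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_c, best_val_acc = c, val_acc

# 2) Retrain on the combined train+validation set with the selected value.
final_model = LogisticRegression(C=best_c, max_iter=2000).fit(X_trval, y_trval)

# 3) Report final test-set performance.
print("test accuracy:", final_model.score(X_test, y_test))
```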