Denoising Autoregressive Representation Learning

Authors: Yazhe Li, Jörg Bornschein, Ting Chen

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed approach (DARL) using both MSE and diffusion objectives. Our experiments explore the basic properties of the model and present ablations on training schedule and model scaling. We compare our results with other representation learning methods and show that DARL achieves performance close to state-of-the-art.
Researcher Affiliation | Industry | ¹Google DeepMind, London, UK. ²xAI, San Francisco, US. Work done while at Google DeepMind.
Pseudocode | No | The paper describes algorithms using text and mathematical formulas but does not provide any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | The ablations use ViT-L with patch size 16 pre-trained on the ImageNet dataset (Deng et al., 2009). We use the VTAB benchmark (Zhai et al., 2020), which consists of 19 classification tasks. Table 6. COCO Object Detection and Segmentation.
Dataset Splits | Yes | We then select the best-performing hyperparameters based on validation results, retrain the model on the combined train+validation set, and report the final test set performance. Figure 15. Samples from DARL. The first image in each row (in the red rectangle) is the original image in the ImageNet validation set. (A minimal sketch of this protocol follows the table.)
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., GPU models, CPU types, or cloud computing instances with specifications).
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version or library versions such as PyTorch and TensorFlow).
Experiment Setup | Yes | The full list of hyper-parameters can be found in Table 8 in the Appendix. The initialization scheme and optimization follow the MAE recipe (He et al., 2021). Pre-training uses AdamW (Loshchilov & Hutter, 2019) for optimization with learning rate using cosine decay and 40 epochs warm-up. (A minimal optimizer sketch follows the table.)
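
The Experiment Setup row quotes AdamW optimization with a cosine-decay learning rate and 40 warm-up epochs, following the MAE recipe. The sketch below shows one common way such a schedule is wired up; it is not the authors' code, and the base learning rate, weight decay, betas, total epoch count, and steps per epoch are assumed placeholder values rather than the settings from the paper's Table 8.

```python
# Minimal sketch (an assumption, not the authors' implementation) of AdamW
# with linear warm-up followed by cosine decay, as described in the
# Experiment Setup row. All numeric defaults except the 40 warm-up epochs
# are illustrative placeholders.
import math

import torch


def build_optimizer_and_schedule(model: torch.nn.Module,
                                 base_lr: float = 1.5e-4,       # assumed
                                 weight_decay: float = 0.05,    # assumed
                                 warmup_epochs: int = 40,       # quoted in the paper
                                 total_epochs: int = 800,       # assumed
                                 steps_per_epoch: int = 1000):  # assumed
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=base_lr,
                                  betas=(0.9, 0.95),            # assumed MAE-style betas
                                  weight_decay=weight_decay)

    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch

    def lr_lambda(step: int) -> float:
        # Linear warm-up for the first warmup_steps, then cosine decay to zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler


# Usage on a stand-in model; call scheduler.step() once per optimizer step.
opt, sched = build_optimizer_and_schedule(torch.nn.Linear(16, 16))
```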
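
The Dataset Splits row describes selecting hyper-parameters on a validation split, retraining on the combined train+validation set, and reporting test performance. A toy sketch of that protocol follows; the scikit-learn dataset, model, and hyper-parameter grid are hypothetical stand-ins for the paper's actual pipeline.

```python
# Toy sketch of the split protocol quoted in the Dataset Splits row:
# (1) tune on validation, (2) retrain on train+validation, (3) report test.
# The dataset, model, and hyper-parameter grid are hypothetical placeholders.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_trval, X_test, y_trval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trval, y_trval, test_size=0.25, random_state=0)

# 1) Select the best-performing hyper-parameter on the validation split.
best_c, best_val_acc = None, -1.0
for c in (0.01, 0.1, 1.0, 10.0):
    clf = LogisticRegression(C=c, max_iter=2000).fit(X_train, y_train)
    val_acc = clf.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_c, best_val_acc = c, val_acc

# 2) Retrain on the combined train+validation set with the selected value.
final_model = LogisticRegression(C=best_c, max_iter=2000).fit(X_trval, y_trval)

# 3) Report final test-set performance.
print("test accuracy:", final_model.score(X_test, y_test))
```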