Denoising Autoregressive Representation Learning
Authors: Yazhe Li, Jörg Bornschein, Ting Chen
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed approach (DARL) using both MSE and diffusion objectives. Our experiments explore the basic properties of the model and present ablations on training schedule and model scaling. We compare our results with other representation learning methods and show that DARL achieves performance close to state-of-the-art. |
| Researcher Affiliation | Industry | 1Google DeepMind, London, UK. 2xAI, San Francisco, US. Work done while at Google DeepMind. |
| Pseudocode | No | The paper describes algorithms using text and mathematical formulas but does not provide any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | The ablations use ViT-L with patch size 16 pre-trained on the ImageNet dataset (Deng et al., 2009). We use the VTAB benchmark (Zhai et al., 2020), which consists of 19 classification tasks. Table 6. COCO Object Detection and Segmentation |
| Dataset Splits | Yes | We then select the best-performing hyperparameters based on validation results, retrain the model on the combined train+validation set, and report the final test set performance (a sketch of this protocol appears after the table). Figure 15. Samples from DARL. The first image in each row (in the red rectangle) is the original image in the ImageNet validation set. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., GPU models, CPU types, or cloud computing instances with specifications). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | The full list of hyper-parameters can be found in Table 8 in the Appendix. The initialization scheme and optimization follow the MAE recipe (He et al., 2021). Pre-training uses AdamW (Loshchilov & Hutter, 2019) with a cosine-decayed learning rate and a 40-epoch warm-up (see the schedule sketch after the table). |
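
The Experiment Setup row quotes an AdamW optimizer with a cosine-decayed learning rate and a 40-epoch warm-up. The Python below is a minimal sketch of such a schedule, assuming a linear warm-up shape; the function name `lr_at_step` and the example numbers (800 total epochs, base learning rate 1.5e-4) are illustrative placeholders, not values taken from the paper's Table 8.

```python
import math

def lr_at_step(step, total_steps, warmup_steps, base_lr):
    """Cosine-decay learning-rate schedule with linear warm-up (sketch)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warm-up period.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Example in units of epochs: 40 warm-up epochs out of 800 (placeholder numbers).
print(lr_at_step(step=40, total_steps=800, warmup_steps=40, base_lr=1.5e-4))
```

At the end of warm-up the linear ramp meets the start of the cosine curve at exactly `base_lr`, so the schedule is continuous.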
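
The Dataset Splits row describes the tuning protocol: choose hyperparameters on the validation split, retrain on the combined train+validation data, and report a single test-set result. The sketch below illustrates that protocol assuming a simple grid search; every name here (`select_and_evaluate`, `fit`, `score`) is a hypothetical placeholder rather than the paper's code.

```python
def select_and_evaluate(train, val, test, grid, fit, score):
    """Tune on validation, retrain on train+validation, report test (sketch)."""
    # 1) Pick the configuration with the best validation score.
    best_cfg = max(grid, key=lambda cfg: score(fit(train, cfg), val))
    # 2) Retrain with the chosen configuration on train + validation.
    final_model = fit(train + val, best_cfg)
    # 3) Evaluate once on the held-out test split.
    return best_cfg, score(final_model, test)
```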