Masked Completion via Structured Diffusion with White-Box Transformers
Authors: Druv Pai, Sam Buchanan, Ziyang Wu, Yaodong Yu, Yi Ma
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical evaluations confirm our analytical insights. CRATE-MAE demonstrates highly promising performance on large-scale imagery datasets while using only 30% of the parameters compared to the standard masked autoencoder with the same model configuration. In this section, we conduct experiments to evaluate CRATE-MAE on real-world datasets and both supervised and unsupervised tasks. |
| Researcher Affiliation | Academia | Druv Pai (UC Berkeley), Ziyang Wu (UC Berkeley), Sam Buchanan (TTIC), Yaodong Yu (UC Berkeley), Yi Ma (UC Berkeley & HKU) |
| Pseudocode | Yes | B.2 PYTORCH-LIKE PSEUDOCODE (an illustrative sketch of the masking-and-reconstruction pattern such pseudocode follows appears after this table). |
| Open Source Code | Yes | Code is available on GitHub. |
| Open Datasets | Yes | We consider ImageNet-1K (Deng et al., 2009) as the main experimental setting for our architecture. We fine-tune and linear probe our pre-trained CRATE-MAE on the following target datasets: CIFAR10/CIFAR100 (Krizhevsky et al., 2009), Oxford Flowers-102 (Nilsback & Zisserman, 2008), Oxford-IIIT-Pets (Parkhi et al., 2012). (A torchvision loading sketch appears after this table.) |
| Dataset Splits | No | The paper mentions the use of 'training and validation sets' (Table 4) and refers to 'standard practice' for MAE training, but it does not explicitly state the specific dataset split percentages or sample counts for training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti) or CPU specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and scikit-learn but does not provide specific version numbers for these or for other software dependencies such as PyTorch. |
| Experiment Setup | Yes | We configure the learning rate as 3 × 10⁻⁵, weight decay as 0.1, and batch size as 4,096. We configure the learning rate as 5 × 10⁻⁵, weight decay as 0.01, and batch size as 256. (See the optimizer sketch after this table.) |
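The Pseudocode row points to Appendix B.2's PyTorch-like pseudocode for CRATE-MAE. As context for that row, below is a minimal, illustrative sketch of the generic MAE-style mask-encode-decode pattern such pseudocode follows; it is not the paper's listing, and `encoder`, `decoder`, `embed_dim`, and the default `mask_ratio` are assumptions made here for illustration.

```python
import torch
import torch.nn as nn


class MaskedAutoencoderSketch(nn.Module):
    """Minimal masked-autoencoder skeleton (illustrative only, not CRATE-MAE itself)."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module,
                 embed_dim: int, mask_ratio: float = 0.75):
        super().__init__()
        self.encoder = encoder  # hypothetical stand-in for the CRATE encoder
        self.decoder = decoder  # hypothetical stand-in for the CRATE decoder
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.mask_ratio = mask_ratio

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, embed_dim) patch embeddings.
        B, N, D = tokens.shape
        num_keep = int(N * (1 - self.mask_ratio))

        # Randomly keep a subset of tokens, as in standard MAE-style pre-training.
        noise = torch.rand(B, N, device=tokens.device)
        ids_shuffle = noise.argsort(dim=1)
        ids_keep = ids_shuffle[:, :num_keep]
        visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

        # Encode visible tokens only (encoder assumed shape-preserving),
        # then pad with learned mask tokens before decoding.
        latent = self.encoder(visible)
        mask_tokens = self.mask_token.expand(B, N - num_keep, -1)
        full = torch.cat([latent, mask_tokens], dim=1)

        # Undo the shuffle so decoded tokens line up with the original positions.
        ids_restore = ids_shuffle.argsort(dim=1)
        full = torch.gather(full, 1, ids_restore.unsqueeze(-1).expand(-1, -1, D))
        return self.decoder(full)
```

With `encoder = decoder = nn.Identity()` the sketch runs end to end on random tokens, which makes the masking logic easy to check in isolation.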
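All four transfer datasets named in the Open Datasets row ship with torchvision, so the downloads at least are easy to reproduce. The sketch below loads each one; the `root` path, the splits shown, and the generic 224×224 transform are assumptions, not the paper's preprocessing.

```python
from torchvision import datasets, transforms

# A generic 224x224 evaluation pipeline; the paper's exact augmentations may differ.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

cifar10 = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)
cifar100 = datasets.CIFAR100(root="data", train=True, download=True, transform=transform)
flowers = datasets.Flowers102(root="data", split="train", download=True, transform=transform)
pets = datasets.OxfordIIITPet(root="data", split="trainval", download=True, transform=transform)
```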
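The two hyperparameter settings quoted in the Experiment Setup row map directly onto PyTorch's AdamW, the optimizer the paper names. A minimal sketch follows; the pairing of the first setting with pre-training and the second with fine-tuning is an assumption (consistent with the larger batch size typically used in MAE pre-training), and `model` is a hypothetical stand-in for CRATE-MAE.

```python
import torch

# Hypothetical stand-in; the actual CRATE-MAE model is defined in the authors' repository.
model = torch.nn.Linear(768, 768)

# First quoted setting (assumed pre-training): lr 3e-5, weight decay 0.1, batch size 4,096.
pretrain_opt = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.1)

# Second quoted setting (assumed fine-tuning): lr 5e-5, weight decay 0.01, batch size 256.
finetune_opt = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
```

The paper reports the learning rates, weight decays, and batch sizes; learning-rate schedules and warmup, which such recipes usually also require, are not specified in the quoted text and are omitted here.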