Masked Completion via Structured Diffusion with White-Box Transformers

Authors: Druv Pai, Sam Buchanan, Ziyang Wu, Yaodong Yu, Yi Ma

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical evaluations confirm our analytical insights. CRATE-MAE demonstrates highly promising performance on large-scale imagery datasets while using only 30% of the parameters compared to the standard masked autoencoder with the same model configuration. In this section, we conduct experiments to evaluate CRATE-MAE on real-world datasets and both supervised and unsupervised tasks.
Researcher Affiliation | Academia | Druv Pai (UC Berkeley), Ziyang Wu (UC Berkeley), Sam Buchanan (TTIC), Yaodong Yu (UC Berkeley), Yi Ma (UC Berkeley & HKU)
Pseudocode | Yes | Appendix B.2, "PyTorch-like pseudocode" (see the illustrative sketch after this table)
Open Source Code | Yes | Code is available on GitHub.
Open Datasets | Yes | We consider ImageNet-1K (Deng et al., 2009) as the main experimental setting for our architecture. We fine-tune and linear probe our pre-trained CRATE-MAE on the following target datasets: CIFAR10/CIFAR100 (Krizhevsky et al., 2009), Oxford Flowers-102 (Nilsback & Zisserman, 2008), Oxford-IIIT-Pets (Parkhi et al., 2012).
Dataset Splits | No | The paper refers to training and validation sets (Table 4) and to "standard practice" for MAE training, but it does not state split percentages or sample counts for the training, validation, and test sets.
Hardware Specification | No | The paper does not report the hardware used for its experiments, such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti) or CPU specifications.
Software Dependencies | No | The paper mentions the AdamW optimizer and scikit-learn but does not give version numbers for these or for other dependencies such as PyTorch.
Experiment Setup | Yes | We configure the learning rate as 3 × 10⁻⁵, weight decay as 0.1, and batch size as 4,096. We configure the learning rate as 5 × 10⁻⁵, weight decay as 0.01, and batch size as 256.