SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Authors: Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin P. Murphy, Alexander Hauptmann, Lu Jiang

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our approach is validated through in-context learning experiments with frozen PaLM 2 and GPT-3.5 on a diverse set of image understanding and generation tasks. |
| Researcher Affiliation | Collaboration | Google, Carnegie Mellon University |
| Pseudocode | No | No structured pseudocode or algorithm blocks with explicit labels such as "Pseudocode" or "Algorithm X" are found in the paper. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code or a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | Following the prior work [27], SPAE is trained on the ImageNet ILSVRC2012 [10] dataset. |
| Dataset Splits | Yes | We use FID [16], Inception Score (IS) [33], and LPIPS [48] to compare with the image VQGAN from MaskGIT [7] on the ImageNet validation set, and FVD [36] to compare with the 3D-VQGAN from MAGVIT [45] on the Kinetics-600 validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "Adam [20] optimizer" and "CLIP with a ViT-L/14 [13] vision backbone" but does not specify version numbers for these or other software libraries or frameworks. |
| Experiment Setup | Yes | We train with a batch size of 256 for 450k steps. ... We use the Adam [20] optimizer with loss weights α = 1, β = 0.33, λ = 0.1, η = 0.1, φ = 10⁴ and a learning rate of 10⁻⁴ following a linear warmup/cooldown and square-root decay schedule. |
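The learning-rate schedule quoted in the Experiment Setup row (linear warmup/cooldown around a square-root decay, peak 10⁻⁴, 450k total steps) can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' code: the warmup and cooldown step counts are assumptions, as the excerpt does not specify them.

```python
# Hypothetical sketch of the schedule described in the excerpt:
# linear warmup to a peak learning rate of 1e-4, square-root decay
# afterwards, and a linear cooldown over the final steps.
PEAK_LR = 1e-4
TOTAL_STEPS = 450_000
WARMUP_STEPS = 10_000    # assumed; not stated in the excerpt
COOLDOWN_STEPS = 10_000  # assumed; not stated in the excerpt

def learning_rate(step: int) -> float:
    """Return the learning rate at a given training step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Square-root decay relative to the end of warmup.
    lr = PEAK_LR * (WARMUP_STEPS / step) ** 0.5
    cooldown_start = TOTAL_STEPS - COOLDOWN_STEPS
    if step >= cooldown_start:
        # Linear cooldown to 0 over the final steps.
        lr *= (TOTAL_STEPS - step) / COOLDOWN_STEPS
    return lr
```

For example, the rate rises linearly to 10⁻⁴ at the end of warmup, falls proportionally to 1/√step afterwards, and is ramped down to zero over the final steps.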