Position: Compositional Generative Modeling: A Single Model is Not All You Need

Authors: Yilun Du, Leslie Pack Kaelbling

Venue: ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In Figure 5(a), given only a very limited number of agent episodes in an environment, a factorized model can more accurately simulate trajectory dynamics. In addition, we found that training a single joint generative model also took a substantially larger number of iterations to train than the factorized model as illustrated in Figure 5(b)." |
| Researcher Affiliation | Academia | "Yilun Du¹, Leslie Kaelbling¹ (¹MIT). Correspondence to: Yilun Du <yilundu@mit.edu>." |
| Pseudocode | No | The paper describes sampling procedures and mathematical equations (e.g., for Langevin dynamics) but does not include any clearly labeled pseudocode or algorithm blocks. (An illustrative Langevin sampler sketch follows the table.) |
| Open Source Code | No | The paper makes no explicit statement about releasing source code for the methodology described, nor does it link to a code repository. |
| Open Datasets | Yes | "on the MATH dataset (Hendrycks et al., 2021)" ... "Imagenet" |
| Dataset Splits | No | The paper mentions datasets such as Maze2D, MATH, and ImageNet, and discusses "training data" and "unseen start states" for evaluation, but it does not specify explicit training, validation, or test split percentages or sample counts for any of the datasets needed to reproduce the experiments. |
| Hardware Specification | No | The paper notes general trends in the computational cost of large models ("current models already costing several hundred million dollars to train") but does not give specific details about the hardware (e.g., GPU models, CPU specifications) used for its own experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies, libraries, or tool versions (e.g., Python 3.x, PyTorch x.x) that would be needed to replicate the experiments. |
| Experiment Setup | No | The paper describes some aspects of its experimental process, such as using a "trajectory chunksize 8" for a compositional model and "composing 5 instances of a GPT-3.5 model", but it lacks comprehensive, reproducible setup details such as specific hyperparameter values (e.g., learning rates, batch sizes), optimizers, or other system-level training configurations. (An illustrative composition sketch follows the table.) |
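
The Pseudocode row notes that the paper writes out Langevin-dynamics equations without a labeled algorithm block. As a reading aid only, here is a minimal sketch of the generic technique those equations describe: sampling from a composition (product) of energy-based models by running Langevin dynamics on the sum of the component energies. The function name, step count, and step size below are illustrative assumptions, not values from the paper.

```python
import torch

def compose_and_sample(energy_fns, x_init, n_steps=100, step_size=0.01):
    """Sample from p(x) ∝ exp(-Σ_i E_i(x)), the product of component EBMs.

    Standard (unadjusted) Langevin dynamics:
        x_{k+1} = x_k - (step_size / 2) * ∇_x Σ_i E_i(x_k) + sqrt(step_size) * ε,
    with ε ~ N(0, I). Hyperparameters here are illustrative, not the paper's.
    """
    x = x_init.clone().requires_grad_(True)
    for _ in range(n_steps):
        energy = sum(E(x).sum() for E in energy_fns)  # composed energy
        (grad,) = torch.autograd.grad(energy, x)
        with torch.no_grad():
            x = x - 0.5 * step_size * grad + (step_size ** 0.5) * torch.randn_like(x)
        x.requires_grad_(True)
    return x.detach()


# Toy usage: composing two Gaussian energies concentrates samples between the modes.
mu1, mu2 = torch.zeros(2), torch.ones(2)
E1 = lambda x: 0.5 * ((x - mu1) ** 2).sum(-1)
E2 = lambda x: 0.5 * ((x - mu2) ** 2).sum(-1)
samples = compose_and_sample([E1, E2], x_init=torch.randn(128, 2))
```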
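
The Experiment Setup row quotes "composing 5 instances of a GPT-3.5 model" without saying how the composition is wired up. The sketch below shows one common pattern for composing independent language-model instances, majority voting over their answers; it is a hedged stand-in, not the paper's procedure, and `query_model` is a hypothetical callable representing whatever LLM API a reimplementation would use.

```python
from collections import Counter
from typing import Callable, List


def compose_by_voting(query_model: Callable[[str], str],
                      prompt: str,
                      n_instances: int = 5) -> str:
    """Compose n independent model instances by majority vote over answers.

    `query_model` is a hypothetical stand-in for an LLM API call; the paper
    does not specify its composition operator, so this shows only one
    plausible pattern (self-consistency-style voting).
    """
    answers: List[str] = [query_model(prompt) for _ in range(n_instances)]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner
```

Other composition operators (e.g., product-of-experts scoring or iterative debate among instances) fit the same interface; which operator the experiments actually used is not specified in the paper.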