Composing Ensembles of Pre-trained Models via Iterative Consensus
Authors: Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Igor Mordatch
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed framework for composing pre-trained models on four representative tasks, including image generation, video question answering, grade school math, and robot manipulation. We compare the proposed method with baselines on the above four zero-shot tasks. |
| Researcher Affiliation | Collaboration | Shuang Li (MIT CSAIL, lishuang@mit.edu); Yilun Du (MIT CSAIL, yilundu@mit.edu); Joshua B. Tenenbaum (MIT CSAIL, BCS, CBMM, jbt@mit.edu); Antonio Torralba (MIT CSAIL, torralba@mit.edu); Igor Mordatch (Google Brain, imordatch@google.com) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks (i.e., sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'). |
| Open Source Code | No | The paper does not provide a direct statement or link for open-source code of the described methodology. It only links to the third-party pre-trained models it uses. |
| Open Datasets | Yes | We evaluate the image generation results on ImageNet (Deng et al., 2009)... We evaluate methods for solving VQA tasks on ActivityNet-QA (Yu et al., 2019). GSM8K (Cobbe et al., 2021) is a dataset for grade school math problems... We next evaluate how pre-trained models may be used to manipulate objects in Ravens (Zeng et al., 2020). |
| Dataset Splits | No | The paper names the datasets used for evaluation and a test set for GSM8K, but it does not specify explicit train/validation/test splits (e.g., percentages or counts) for all datasets needed to reproduce the data partitioning, nor does it cite predefined splits for all of them. |
| Hardware Specification | Yes | We use TITAN RTX 24GB GPUs for all the experiments. |
| Software Dependencies | No | The paper mentions using the 'Huggingface library (Wolf et al., 2019)' for CLIP models and provides URLs for specific CLIP model checkpoints. However, it does not provide specific version numbers for the Huggingface library itself or other key software components, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | The guidance scale is set to 3. (for image generation) ... In our experiments, we use 5 steps of gradient descent. The learning rate α is set to 0.3. (for VQA and grade school math) |
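The reported setup (5 gradient-descent steps with learning rate α = 0.3) can be illustrated with a minimal sketch of an iterative refinement loop. This is a hypothetical stand-in, not the paper's implementation: the `refine` function and the toy quadratic scorer are assumptions for illustration, whereas the paper optimizes candidate solutions against pre-trained scorer models.

```python
def refine(candidate, score_grad, steps=5, alpha=0.3):
    """Run `steps` gradient-descent updates on `candidate`,
    using the hyperparameters quoted in the table above."""
    for _ in range(steps):
        candidate = candidate - alpha * score_grad(candidate)
    return candidate

# Toy scorer: gradient of 0.5 * (x - 2)^2, which is minimized at x = 2.
result = refine(10.0, lambda x: x - 2.0)
```

Each step shrinks the distance to the optimum by a factor of (1 - α) = 0.7, so five steps move the candidate most of the way toward the scorer's minimum.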