Listen, Think, and Understand
Authors: Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James R. Glass
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS |
| Researcher Affiliation | Collaboration | MIT CSAIL, MIT-IBM Watson AI Lab |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code, dataset, and pretrained models are available at https://github.com/yuangongnd/ltu. |
| Open Datasets | Yes | we relabel existing public datasets including Audio Set (including a 500K subset of the original 2M weakly-labeled release (Gemmeke et al., 2017) and the 100K subset with temporally-strong labels (Hershey et al., 2021)), VGGSound (Chen et al., 2020a), FSD50K (Fonseca et al., 2021), AudioCaps (Kim et al., 2019), Freesound (Font et al., 2013), Clotho v2 (Lipping et al., 2019), and Sound Bible (soundbible.com, 2006) as our training data. |
| Dataset Splits | Yes | For all these datasets, we only include data marked as training and validation samples and exclude any data marked as test or evaluation. In the evaluation set, each of the 15 acoustic scenes has 108 segments and the total number of evaluation samples is 1,620. |
| Hardware Specification | Yes | The model is trained on 4 RTX A6000 GPUs for about 3 days. |
| Software Dependencies | No | The paper mentions software components like 'LLaMA-7B', 'Vicuna', 'GPT-3.5-Turbo', and 'GPT-4', but does not provide specific version numbers for these or other libraries used for reproducibility. |
| Experiment Setup | Yes | In all training stages, we use a batch size of 256 and linear learning rate decay with warmup. We set the text token cutoff length to 108. Throughout this paper, we use a plain generation setting of Temperature=0.1, Top K=500, and Top P=0.95 with a repetition penalty of 1.1 (Fan et al., 2018; Keskar et al., 2019). |
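The Experiment Setup row quotes fixed decoding hyperparameters. As a rough illustration, the sketch below maps those settings onto Hugging Face transformers generation arguments; the checkpoint path and prompt are hypothetical placeholders, and the paper does not state which library implements its decoding.

```python
# Minimal sketch (not the authors' code) of the decoding settings quoted
# above: Temperature=0.1, Top K=500, Top P=0.95, repetition penalty 1.1.
# Assumes a Hugging Face transformers causal LM.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/ltu-llama-7b"  # hypothetical; see the GitHub repo above
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "Question: What can be heard in this audio clip?"  # illustrative only
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    do_sample=True,           # sampling is required for temperature/top-k/top-p
    temperature=0.1,
    top_k=500,
    top_p=0.95,
    repetition_penalty=1.1,   # Keskar et al., 2019
    max_new_tokens=108,       # illustrative cap; 108 is the paper's training-time text cutoff
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that top-k and top-p here constrain the same sampling step: tokens outside the 500 most probable are discarded first, then the nucleus (cumulative probability 0.95) filter is applied before sampling at temperature 0.1.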