Shape and Material from Sound

Authors: Zhoutong Zhang, Qiujia Li, Zhengjia Huang, Jiajun Wu, Josh Tenenbaum, Bill Freeman

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our models on a range of perception tasks: inferring object shape, material, and initial height from sound. We also collect human responses for each task and compare them with model estimates. Our results indicate that first, humans are quite successful in these tasks; second, our model not only closely matches human successes, but also makes similar errors as humans do. For these quantitative evaluations, we have mostly used synthetic data, where ground truth labels are available. We further evaluate the model on recordings to demonstrate that it also performs well on real-world audios.
Researcher Affiliation | Collaboration | Zhoutong Zhang (MIT); Qiujia Li (University of Cambridge); Zhengjia Huang (ShanghaiTech University); Jiajun Wu (MIT); Joshua B. Tenenbaum (MIT); William T. Freeman (MIT, Google Research)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using Bullet, an open-source physics engine, but does not provide concrete access to the source code for the methodology described in this paper.
Open Datasets | No | Because real audio recordings with rich labels are hard to acquire, we synthesize random audio clips using our physics-based simulation to evaluate our models. Specifically, we focus on a single scenario: shape primitives falling onto the ground. We first construct an audio dataset that includes 14 primitives (some shown in Table 2), each with 10 different specific moduli (defined as Young's modulus over density).
Dataset Splits | No | The paper mentions that the fully supervised model is trained on "200,000 audios" and uses "52 test cases" for human studies, but does not provide the specific dataset-split information (exact percentages, sample counts for train/validation/test, or detailed splitting methodology) needed to reproduce the data partitioning for its machine learning models.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions implementing the framework in Torch7.
Software Dependencies | No | The paper mentions using "Bullet" (physics engine) and "Torch7" (framework), but does not provide specific version numbers for these software components, which are required for a reproducible description of ancillary software.
Experiment Setup | Yes | We use a time step of 1/300 second to ensure simulation accuracy. We perform 80 sweeps of MCMC sampling over all the 7 latent variables; for every sweep, each variable is sampled twice. The spectrogram of the signal is computed using a Tukey window of length 5,000 with a 2,000-sample overlap. For each window, a 10,000-point Fourier transform is applied. We used stochastic gradient descent for training, with a learning rate of 0.001, a momentum of 0.9, and a batch size of 16. Mean Square Error (MSE) loss is used for back-propagation.
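The spectrogram settings quoted above (a Tukey window of length 5,000, a 2,000-sample overlap, and a 10,000-point Fourier transform per window) can be sketched with SciPy. This is a minimal sketch, not the authors' Torch7 implementation: the 44.1 kHz sampling rate and the Tukey shape parameter of 0.25 are assumptions not stated in the paper.

```python
import numpy as np
from scipy import signal

# Assumed sampling rate (not given in the paper).
FS = 44100

def audio_spectrogram(waveform, fs=FS):
    """Return (freqs, times, sxx) using the STFT settings quoted above."""
    freqs, times, sxx = signal.spectrogram(
        waveform,
        fs=fs,
        window=("tukey", 0.25),  # shape parameter 0.25 is an assumption
        nperseg=5000,            # window length of 5,000 samples
        noverlap=2000,           # 2,000-sample overlap (hop of 3,000)
        nfft=10000,              # 10,000-point Fourier transform
    )
    return freqs, times, sxx
```

With a 10,000-point FFT on real-valued audio, each frame yields 5,001 one-sided frequency bins, regardless of the clip length.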