Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Gibbs Sampling with People

Authors: Peter Harrison, Raja Marjieh, Federico Adolfi, Pol van Rijn, Manuel Anglada-Tort, Ofer Tchernichovski, Pauline Larrouy-Maestri, Nori Jacoby

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In an initial study, we show GSP clearly outperforming MCMCP; we then show that GSP provides novel and interpretable results in three other domains, namely musical chords, vocal emotions, and faces. We validate these results through large-scale perceptual rating experiments. The final experiments use GSP to navigate the latent space of a state-of-the-art image synthesis network (StyleGAN), a promising approach for applying GSP to high-dimensional perceptual spaces. We conclude by discussing future cognitive applications and ethical implications. All combined, these 25 experiments represent data from 5,178 human participants. |
| Researcher Affiliation | Academia | Peter M. C. Harrison* (Max Planck Institute for Empirical Aesthetics, Frankfurt); Raja Marjieh* (Max Planck Institute for Empirical Aesthetics, Frankfurt); Federico Adolfi (Max Planck Institute for Empirical Aesthetics, Frankfurt); Pol van Rijn (Max Planck Institute for Empirical Aesthetics, Frankfurt); Manuel Anglada-Tort (Max Planck Institute for Empirical Aesthetics, Frankfurt); Ofer Tchernichovski (Hunter College, CUNY; The CUNY Graduate Center); Pauline Larrouy-Maestri (Max Planck Institute for Empirical Aesthetics, Frankfurt); Nori Jacoby (Max Planck Institute for Empirical Aesthetics, Frankfurt) |
| Pseudocode | No | The paper describes the MCMC and Gibbs sampling algorithms conceptually but does not include structured pseudocode or algorithm blocks with specific labels or formatting. |
| Open Source Code | Yes | Appendices, code, and raw data are hosted at https://doi.org/10.17605/OSF.IO/RZK4S. |
| Open Datasets | Yes | Following [50], we apply this approach to the generative adversarial network StyleGAN [51, 52], pretrained on the FFHQ dataset of faces from Flickr [51], and applying PCA to the intermediate latent code (termed w in the original papers). We began with three sentences from the Harvard sentence corpus [35] recorded by a female speaker [36], chosen to facilitate comparison with previous research; these sentences are phonologically balanced and semantically neutral. |
| Dataset Splits | No | The paper describes "validation experiments" involving new participant groups rating generated samples, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for the data used in the experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions software like StyleGAN but does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | Each method was evaluated using across-participant chains of length 30, with five chains per color category, with each chain's starting location sampled from a uniform distribution over the color space (Exp. 1a, 1b, 1c). We constructed 18 across-participant GSP chains of length 50 with uniformly sampled starting locations and three chains for each adjective (Fig. 4A, Exp. 4a). We used 293 US participants from AMT, aggregating 5 trials per iteration using the arithmetic mean. |
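
Since the paper is reported to contain no pseudocode, the chain structure quoted under Experiment Setup can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' implementation: the function names, the three-dimensional unit-cube stimulus space, the cyclic order in which dimensions are updated, and the Gaussian "participant" are all hypothetical stand-ins. Only the chain length (30), the uniform starting location, and the aggregation of 5 trials per iteration by arithmetic mean echo figures quoted above, and the source quotes those figures for different experiments.

```python
import numpy as np


def simulated_slider_response(state, dim, target, rng, noise=0.1):
    """Hypothetical stand-in for one human slider trial.

    A real participant would see the stimulus defined by `state` and move a
    slider controlling dimension `dim`. This toy version ignores the other
    dimensions and draws a value near an assumed target.
    """
    return float(np.clip(rng.normal(target[dim], noise), 0.0, 1.0))


def run_gsp_chain(target, n_dims=3, chain_length=30,
                  trials_per_iteration=5, seed=0):
    """Simulate one chain: update one dimension per iteration."""
    rng = np.random.default_rng(seed)
    # Starting location sampled uniformly over the (assumed) unit cube.
    state = rng.uniform(0.0, 1.0, size=n_dims)
    history = [state.copy()]
    for iteration in range(chain_length):
        # Assumption: dimensions are visited in a fixed cyclic order.
        dim = iteration % n_dims
        responses = [
            simulated_slider_response(state, dim, target, rng)
            for _ in range(trials_per_iteration)
        ]
        # Aggregate the trials for this iteration with the arithmetic mean.
        state[dim] = np.mean(responses)
        history.append(state.copy())
    return np.array(history)


# Hypothetical target location, e.g. a prototypical color in a 3-D space.
target = np.array([0.2, 0.6, 0.8])
chain = run_gsp_chain(target)
```

Each row of `chain` is one state of the chain, and consecutive rows differ in at most one coordinate, which is the defining property of a Gibbs-style update.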