Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Emergent Graphical Conventions in a Visual Communication Game

Authors: Shuwen Qiu, Sirui Xie, Lifeng Fan, Tao Gao, Jungseock Joo, Song-Chun Zhu, Yixin Zhu

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results under different controls are consistent with the observation in studies of human graphical conventions (Hawkins et al., 2019; Fay et al., 2010). Critically, the empirical results assessed on our metrics align well with our prediction based on the findings of human graphical conventions (Fay et al., 2010; Hawkins et al., 2019), justifying our environment, model, and evaluation.
Researcher Affiliation	Academia	Shuwen Qiu1, Sirui Xie1, Lifeng Fan2, Tao Gao3,4, Jungseock Joo3, Song-Chun Zhu1,2,4,5, Yixin Zhu5. 1 Department of Computer Science, UCLA; 2 Beijing Institute for General Artificial Intelligence (BIGAI); 3 Department of Communication, UCLA; 4 Department of Statistics, UCLA; 5 Institute for Artificial Intelligence, Peking University
Pseudocode	Yes	Algorithm 1: Training Algorithm. Initialization: Initialize neural network parameters θ, ρ, ϕ for π_S, π_R, and v_ϕ, respectively. for game round l = 1, ..., L do
Open Source Code	No	The paper provides a link to a project website 'https://sites.google.com/view/emergent-graphical-conventions' and mentions a 'supplementary video', but it does not explicitly state that the source code for the described methodology is publicly available or provide a direct link to a code repository.
Open Datasets	Yes	Images. We used the Sketchy dataset (Sangkloy et al., 2016) as the image source.
Dataset Splits	Yes	We use 30 categories (10 images per category) for training and held out 10 images per category for the unseen-instance test; another 10 categories are for the unseen-class test. Between iterations, we randomly sample another 10 batches for validation.
Hardware Specification	Yes	We train each model on a single Nvidia RTX A6000; one experiment takes 20 hours.
Software Dependencies	No	The paper mentions components such as a 'pre-trained VGG16' and the 'Adam' optimizer but does not give version numbers for the software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming language used, which are needed to reproduce the software environment.
Experiment Setup	Yes	We train the sender/receiver with batched forward and back-propagation, with a batch size of 64 and maximum roll-out step T = 7. We update using Adam (Kingma and Ba, 2015) with the learning rate 0.0001 for a total of 30k iterations. In all settings, we set M = 3, γ = 0.85.
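The split counts in the Dataset Splits row can be checked with a short sketch. It assumes a hypothetical mapping `images_by_category` from 40 category names to image IDs (at least 20 images each), random selection with a fixed seed, and 10 images per unseen class, since the excerpt does not say how many images the unseen-class test uses. The function name and selection procedure are illustrative only and are not taken from the authors' code.

```python
import random


def split_categories(images_by_category, n_train_classes=30, n_unseen_classes=10,
                     n_per_split=10, seed=0):
    """Hypothetical split mirroring the counts in the Dataset Splits row.

    Returns (train, unseen_instance, unseen_class) as lists of (category, image_id).
    """
    rng = random.Random(seed)
    categories = sorted(images_by_category)
    rng.shuffle(categories)
    train_classes = categories[:n_train_classes]
    unseen_classes = categories[n_train_classes:n_train_classes + n_unseen_classes]

    train, unseen_instance, unseen_class = [], [], []
    for cat in train_classes:
        imgs = list(images_by_category[cat])
        rng.shuffle(imgs)
        # First 10 images are used for training; the next 10 are held out (unseen instances).
        train += [(cat, img) for img in imgs[:n_per_split]]
        unseen_instance += [(cat, img) for img in imgs[n_per_split:2 * n_per_split]]
    for cat in unseen_classes:
        # Assumption: 10 images per unseen class; the excerpt does not give this number.
        imgs = list(images_by_category[cat])
        rng.shuffle(imgs)
        unseen_class += [(cat, img) for img in imgs[:n_per_split]]
    return train, unseen_instance, unseen_class


# Usage on synthetic data: 40 categories with 20 image IDs each.
images = {f"cat{c:02d}": [f"img{c:02d}_{k:02d}" for k in range(20)] for c in range(40)}
train, unseen_instance, unseen_class = split_categories(images)
print(len(train), len(unseen_instance), len(unseen_class))  # 300 300 100
```

With 30 training categories of 10 images each, the training and unseen-instance sets each hold 300 images, and the unseen-class set is disjoint from the training categories by construction.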
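The hyperparameters in the Experiment Setup row can be collected into a small config. The sketch below additionally assumes that γ is the discount factor applied to returns over a roll-out of at most T = 7 steps; the excerpt defines neither γ nor M, and the names `TrainConfig` and `discounted_returns` are hypothetical, not from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 64
    max_rollout_steps: int = 7      # T in the Experiment Setup row
    learning_rate: float = 1e-4     # Adam learning rate
    total_iterations: int = 30_000
    M: int = 3                      # meaning not given in this excerpt
    gamma: float = 0.85             # assumed here to be the discount factor


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over one roll-out."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Usage: a hypothetical sparse reward of 1 at the final step of a T = 7 roll-out.
cfg = TrainConfig()
rewards = [0.0] * (cfg.max_rollout_steps - 1) + [1.0]
print(discounted_returns(rewards, cfg.gamma)[0])  # about 0.377, i.e. 0.85 ** 6
```

Under this assumption, a reward received only at the last step is worth 0.85^6 ≈ 0.377 at the first step, so γ = 0.85 noticeably favors shorter roll-outs.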