Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Pragmatic Image Compression for Human-in-the-Loop Decision-Making

Authors: Sid Reddy, Anca Dragan, Sergey Levine

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our method through experiments with human participants on four tasks: reading handwritten digits, verifying photos of faces, browsing an online shopping catalogue, and playing a car racing video game. The results show that our method learns to match the user's actions with and without compression at lower bitrates than baseline methods
Researcher Affiliation	Academia	Siddharth Reddy, Anca D. Dragan, Sergey Levine, University of California, Berkeley
Pseudocode	Yes	Algorithm 1 Pragmatic Compression (PICO)
Open Source Code	No	"videos are available on the project website [1]. [1] https://sites.google.com/view/pragmatic-compression". The provided link points to a project website, not to a code repository, and the paper does not state that its source code is released.
Open Datasets	Yes	We conduct user studies on Amazon Mechanical Turk, in which we ask human participants to complete three tasks at varying bitrates: reading handwritten digits from the MNIST dataset [36], verifying attributes of faces from the CelebA dataset [37], and browsing a shopping catalogue of cars from the LSUN Car dataset [38]. To study PICO's performance on sequential decision-making problems, we also run an experiment with 12 participants who play the Car Racing video game from OpenAI Gym [39].
Dataset Splits	No	"We train our discriminators and compression model on 1000 negative examples and varying numbers of positive examples, and split PICO into two rounds of batch learning and evaluation (see Appendices A.1 and A.5)." "The plots show user action agreement evaluated on 100 held-out images, with error bars representing standard error." Although the paper mentions training and evaluating on held-out images, it does not give the explicit train/validation/test split percentages or counts needed for full reproducibility.
Hardware Specification	No	The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies	No	The paper mentions several models and frameworks, such as StyleGAN2 [34], NVAE [35], VAE, and OpenAI Gym [39], but does not provide version numbers for any software dependencies.
Experiment Setup	Yes	"We train our discriminators and compression model on 1000 negative examples and varying numbers of positive examples, and split PICO into two rounds of batch learning and evaluation (see Appendices A.1 and A.5)." "We train an action discriminator D_φ(a, x) to predict the likelihood p(T = 1 | a, x), using the standard binary cross-entropy loss and the training data D." "In these experiments, we fix the bitrate to 85 bits per step." "The average lossless PNG file size is 0.3 kB, and each image has dimensions 28×28×1. Each of the five columns in the two groups of compressed images represents a different sample from the stochastic compression model f(x̂ | x) at bitrate 0.011."
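The Experiment Setup row quotes an action discriminator D_φ(a, x) trained with the standard binary cross-entropy loss to predict p(T = 1 | a, x). As a rough illustration of that training objective only, the Python sketch below swaps in a logistic-regression model over a hypothetical joint feature map (the observation vector is routed into a weight block reserved for the chosen action). The helper names, the feature construction, the learning rate, and the toy data are all assumptions and not the paper's architecture. T is treated here as a generic binary label supplied with each (a, x) pair; its exact meaning in PICO is defined in the paper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def joint_features(a, x, num_actions):
    """Hypothetical joint feature map for (action, observation) pairs.

    The observation x is copied into the block reserved for action a, so a
    linear model learns separate weights over x for every action. A trailing
    1.0 acts as a shared bias term.
    """
    feats = [0.0] * (num_actions * len(x))
    for j, xj in enumerate(x):
        feats[a * len(x) + j] = xj
    return feats + [1.0]


def predict(w, a, x, num_actions):
    """Estimated p(T = 1 | a, x) under the (assumed) logistic model."""
    f = joint_features(a, x, num_actions)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)))


def bce(p, t, eps=1e-12):
    """Binary cross-entropy for one prediction p and label t in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))


def mean_bce(w, data, num_actions):
    """Average binary cross-entropy over a list of (a, x, t) triples."""
    return sum(bce(predict(w, a, x, num_actions), t) for a, x, t in data) / len(data)


def train_discriminator(data, num_actions, dim, lr=0.5, epochs=200):
    """Fit the weights by per-example SGD on the binary cross-entropy.

    data: list of (a, x, t) triples, where t in {0, 1} is the label T.
    """
    w = [0.0] * (num_actions * dim + 1)
    for _ in range(epochs):
        for a, x, t in data:
            f = joint_features(a, x, num_actions)
            # For a sigmoid output, d(BCE)/d(logit) = p - t.
            g = predict(w, a, x, num_actions) - t
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
    return w


# Toy data (made up for illustration): T = 1 when the action matches the
# larger coordinate of x, T = 0 for the other action.
points = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.2), (0.2, 0.8), (0.6, 0.1), (0.1, 0.6)]
data = []
for x in points:
    best = 0 if x[0] > x[1] else 1
    data.append((best, list(x), 1))
    data.append((1 - best, list(x), 0))

w = train_discriminator(data, num_actions=2, dim=2)
print(round(mean_bce(w, data, 2), 4))
print(predict(w, 0, [1.0, 0.0], 2), predict(w, 1, [1.0, 0.0], 2))
```

Because the gradient of the cross-entropy with respect to the logit is simply p − t, the SGD update needs no separate derivative of the sigmoid; a neural discriminator would use the same loss with a learned feature map in place of the hand-built one.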
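The Dataset Splits row quotes the evaluation metric: user action agreement on 100 held-out images, with error bars showing standard error. The minimal Python sketch below shows one plausible way to compute such numbers. It assumes that agreement means the fraction of images on which a participant's action for the compressed image equals their action for the original, and that the standard error is the sample standard deviation of the per-image 0/1 match indicators divided by √n. Both definitions, the function name `action_agreement`, and the toy data are assumptions rather than the paper's code.

```python
import math
import statistics


def action_agreement(actions_original, actions_compressed):
    """Return (agreement rate, standard error) over paired per-image actions.

    Assumed definition: an image counts as a match when the user's action on
    the compressed image equals their action on the uncompressed image.
    """
    if len(actions_original) != len(actions_compressed):
        raise ValueError("action lists must be the same length")
    if len(actions_original) < 2:
        raise ValueError("need at least two images to estimate a standard error")
    matches = [1.0 if a == b else 0.0 for a, b in zip(actions_original, actions_compressed)]
    mean = statistics.fmean(matches)
    # Standard error of the mean: sample std (n - 1 denominator) / sqrt(n).
    se = statistics.stdev(matches) / math.sqrt(len(matches))
    return mean, se


# Hypothetical example: 100 held-out digits, 80 of them labelled identically
# with and without compression.
original = [i % 10 for i in range(100)]
compressed = original[:80] + [(a + 1) % 10 for a in original[80:]]
mean, se = action_agreement(original, compressed)
print(f"agreement = {mean:.2f} ± {se:.3f}")
```

For 0/1 indicators, this estimate differs from the binomial formula √(p(1−p)/n) only by a factor of √(n/(n−1)), so the two choices give nearly identical error bars at n = 100.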