Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Generative Social Choice: The Next Generation

Authors: Niclas Boehmer, Sara Fish, Ariel D. Procaccia

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present the Proportional Slate Engine (PROSE) and evaluate it in experiments. [...] We evaluate PROSE on four instances drawn from drug reviews and a deliberation hosted on Polis. [...] In each case, PROSE outperforms four baseline approaches with respect to both user satisfaction and proportionality. We present a quantitative evaluation of the generated slates in Table 1.
Researcher Affiliation | Academia | (1) Hasso Plattner Institute, Germany; (2) Harvard University, USA. Correspondence to: Niclas Boehmer <EMAIL>, Sara Fish <EMAIL>.
Pseudocode | Yes | Algorithm 1 Democratic Process C,f(N, B, r)
Open Source Code | Yes | The code for PROSE and our other experiments is available at github.com/sara-fish/gen-soc-choice-next-gen.
Open Datasets | Yes | First, the publicly available UCI ML Drug Review dataset (Gräßer et al., 2018) [...] Second, the Bowling Green dataset is drawn from a public deliberation hosted on Polis (2023).
Dataset Splits | No | From this dataset, we create three subsampled instances (each with 80 agents): Birth Control (Balanced), which contains reviews of a birth control medication with all ratings appearing equally often; Birth Control (Imbalanced), which includes only birth control reviews with extreme and central ratings, i.e., (1,2,5,9,10); and Obesity, which contains reviews on an obesity medication with all ratings appearing in equal frequency.
Hardware Specification | Yes | with runtimes of 31–65 minutes on a single Intel i7-8565U CPU @ 1.80GHz.
Software Dependencies | Yes | PROSE leverages GPT-4o when answering discriminative or generative queries. [...] We embed each agent using their description via OpenAI's embedding-3-large.
Experiment Setup | Yes | In particular, for the three drug review instances, we use C = [80, 70, 60, 50, 40, 36, 32, 28, 24, 20, 16, 12, 10, 8, 6, 4, 2], while for bowlinggreen, which has a different word budget per agent, we use C = [80, 60, 40, 36, 32, 28, 24, 20, 16, 12, 8, 4]. Approval Levels: We use ℓ = [5.5, 5, 4.5, 4, 3.5, 3, 2, 1, 0] for each of the instances.
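
For anyone re-running the experiments, the values quoted in the Experiment Setup row can be written down as a small configuration and sanity-checked before use. The following is a minimal sketch, not code from the authors' repository: the names `DRUG_REVIEW_C`, `BOWLINGGREEN_C`, `APPROVAL_LEVELS`, `INSTANCES`, `is_strictly_decreasing`, and `validate` are hypothetical, and the excerpt above does not define C or ℓ, so they are treated here as opaque lists. The only property checked is one visible in the quoted values themselves: each list is strictly decreasing.

```python
# Hypothetical configuration sketch. The numbers are copied from the
# "Experiment Setup" row; every identifier is an assumption made for
# illustration and does not come from the PROSE codebase.

DRUG_REVIEW_C = [80, 70, 60, 50, 40, 36, 32, 28, 24, 20, 16, 12, 10, 8, 6, 4, 2]
BOWLINGGREEN_C = [80, 60, 40, 36, 32, 28, 24, 20, 16, 12, 8, 4]
APPROVAL_LEVELS = [5.5, 5, 4.5, 4, 3.5, 3, 2, 1, 0]

# Instance names as they appear in the table; the three drug review
# instances share one list C, while bowlinggreen uses its own.
INSTANCES = {
    "Birth Control (Balanced)": DRUG_REVIEW_C,
    "Birth Control (Imbalanced)": DRUG_REVIEW_C,
    "Obesity": DRUG_REVIEW_C,
    "bowlinggreen": BOWLINGGREEN_C,
}


def is_strictly_decreasing(values):
    """Return True if every element is larger than the one after it."""
    return all(a > b for a, b in zip(values, values[1:]))


def validate(instances, approval_levels):
    """Raise ValueError if a list was mistyped so that it is no longer
    strictly decreasing, as the quoted values are."""
    for name, c_values in instances.items():
        if not is_strictly_decreasing(c_values):
            raise ValueError(f"C for {name!r} is not strictly decreasing")
    if not is_strictly_decreasing(approval_levels):
        raise ValueError("approval levels are not strictly decreasing")


validate(INSTANCES, APPROVAL_LEVELS)
```

Keeping the lists in one place makes transcription errors easier to spot; the monotonicity check is only a guard against typos and says nothing about how PROSE uses these values.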