Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Prices, Bids, Values: One ML-Powered Combinatorial Auction to Rule Them All

Authors: Ermis Soumalias, Jakob Heiss, Jakob Weissteiner, Sven Seuken

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show via experiments that combining both query types results in significantly better learning performance in practice. Building on these insights, we present MLHCA, a new ML-powered auction that uses value and demand queries. MLHCA significantly outperforms the previous SOTA, reducing efficiency loss by up to a factor of 10, with up to 58% fewer queries. Thus, MLHCA achieves large efficiency improvements while also reducing bidders' cognitive load, establishing a new benchmark for both practicability and efficiency.
Researcher Affiliation | Collaboration | 1 Department of Informatics, University of Zurich, Zurich, Switzerland; 2 ETH AI Center, Zurich, Switzerland; 3 Department of Mathematics, ETH Zurich, Zurich, Switzerland; 4 Department of Statistics, University of California, Berkeley, USA; 5 UBS, Zurich, Switzerland. Correspondence to: Ermis Soumalias <EMAIL>, Jakob Heiss <EMAIL>.
Pseudocode | Yes | Algorithm 1: MLHCA(QCCA, QDQ, QVQ, π)... Algorithm 2: NEXTQUERIES(I, R) (Brero et al. 2021)... Algorithm 3: MLCA(Qinit, Qmax, Qround) (Brero et al. 2021)... Algorithm 4: MIXEDTRAINING
Open Source Code | Yes | Our code is available at https://github.com/marketdesignresearch/MLHCA.
Open Datasets | Yes | To generate synthetic CA instances, we use the spectrum auction test suite (SATS) (Weiss et al., 2017), which includes various value models (domains) designed to simulate different auction environments. Following standard practice in this line of research (e.g., Soumalias et al. (2024c); Weissteiner et al. (2023)), we conduct experiments on the GSVM, LSVM, SRVM, and MRVM domains (see Appendix G.1 for details).
Dataset Splits | Yes | For this bidder, we generate three training sets: (1) 40 DQs simulating 40 CCA clock rounds and 20 random VQs, (2) 60 DQs simulating 60 clock rounds with no VQs, and (3) 60 random VQs with no DQs. The models are evaluated on two validation sets: a random bundle set (Vr) with 50,000 uniformly sampled bundles, and a price-driven set (Vp) containing bundles requested under 200 random price vectors. ...The selected configurations are then tested on 10 new bidders, generating hold-out test sets Tr and Tp in the same way as Vr and Vp.
Hardware Specification | Yes | All experiments were conducted on a compute cluster running Debian GNU/Linux 10 with Intel Xeon E5-2650 v4 2.20GHz processors with 24 cores and 128GB RAM and Intel E5 v2 2.80GHz processors with 20 cores and 128GB RAM and Python 3.8.10.
Software Dependencies | Yes | All experiments were conducted on a compute cluster running Debian GNU/Linux 10 with Intel Xeon E5-2650 v4 2.20GHz processors with 24 cores and 128GB RAM and Intel E5 v2 2.80GHz processors with 20 cores and 128GB RAM and Python 3.8.10.
Experiment Setup | Yes | We conduct the following experiment: We perform hyperparameter optimization (HPO) to train an MVNN for the most critical bidder in the most realistic simulation domain (see Appendix G.1 for details on the simulation and Appendix E.3 for results for other domains). For this bidder, we generate three training sets: (1) 40 DQs simulating 40 CCA clock rounds and 20 random VQs, (2) 60 DQs simulating 60 clock rounds with no VQs, and (3) 60 random VQs with no DQs. ...For MLHCA's VQ rounds, we performed HPO separately for each bidder type in each domain, as detailed in Appendix E.2. For the DQ rounds, we adopted the HPO parameters reported by Soumalias et al. (2024c), since our learning algorithm, when restricted to DQs, is equivalent to theirs.
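The paper's central distinction between the two query types that MLHCA combines can be illustrated with a toy bidder. The sketch below is a hypothetical example (the valuation, item set, and prices are invented for illustration and are not one of the paper's SATS domains): a value query (VQ) asks the bidder for the value of one specific bundle, while a demand query (DQ) asks which bundle the bidder would choose at given item prices.

```python
import itertools

# Hypothetical 3-item valuation: additive values plus one synergy bonus.
ITEM_VALUES = {0: 4.0, 1: 3.0, 2: 5.0}
SYNERGY = 2.0  # extra value when items 0 and 2 are won together

def value(bundle):
    """Value query (VQ): the bidder reports v(bundle) for a given bundle."""
    v = sum(ITEM_VALUES[i] for i in bundle)
    if 0 in bundle and 2 in bundle:
        v += SYNERGY
    return v

def demand(prices):
    """Demand query (DQ): at item prices p, the bidder reports the
    utility-maximizing bundle argmax_b v(b) - sum(p[i] for i in b)."""
    items = list(ITEM_VALUES)
    bundles = []
    for r in range(len(items) + 1):
        bundles.extend(itertools.combinations(items, r))
    return max(bundles, key=lambda b: value(b) - sum(prices[i] for i in b))

# A VQ reveals the value of one chosen bundle: v({0,2}) = 4 + 5 + 2 = 11.
print(value((0, 2)))                      # -> 11.0
# A DQ instead reveals the preferred bundle at the given prices.
print(demand({0: 3.0, 1: 3.5, 2: 4.0}))  # -> (0, 2)
```

Each query type exposes different information about the valuation (a point value vs. a preference at prices), which is why the paper finds that training on both yields better learning performance than either alone.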