Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Value of Prediction in Identifying the Worst-Off

Authors: Unai Fischer-Abaigar, Christoph Kern, Juan Carlos Perdomo

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable: Result (LLM response excerpt)

Research Type: Experimental
Through mathematical models and a real-world case study on long-term unemployment amongst German residents, we develop a comprehensive understanding of the relative effectiveness of prediction in surfacing the worst-off. Our findings provide clear analytical frameworks and practical, data-driven tools that empower policymakers to make principled decisions when designing these systems. (...) We complement our theoretical discussion by presenting a methodology for policymakers to evaluate the prediction-access ratio in practice. Using a real-world administrative dataset on hundreds of thousands of jobseekers in Germany, we show that our theoretical findings generalize to a more complex, real-world context (...) We train a CatBoost model (see Appendix B.2 for details), achieving an R² of 0.15 on the test set.

Researcher Affiliation: Academia
1 Department of Statistics, University of Munich (LMU), Munich, Germany; 2 Munich Center for Machine Learning, Germany; 3 Harvard University, Boston, MA, US. Correspondence to: Unai Fischer-Abaigar <Unai.Fischer EMAIL>.

Pseudocode: No
The paper describes mathematical models and theoretical concepts in text and equations, and outlines an experimental setup, but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.

Open Source Code: No
The paper states: "We use CatBoost (https://catboost.ai) for model training." This refers to a third-party tool used by the authors, not their own source code for the methodology described in the paper. There is no explicit statement or link indicating that the authors' implementation code is open-sourced.

Open Datasets: Yes
We secured access to a dataset on German jobseekers derived from German administrative labor market records that cover a large portion of the German labor force. It merges multiple administrative data sources, containing a wide spectrum of individual labor market information including records on employment histories, received benefits, unemployment periods, participation in job training programs and demographic information. Such administrative records are the primary data source used by PES to build algorithmic profiling models (Bach et al., 2023). (...) The dataset is provided via a Scientific Use File by the Research Data Centre (FDZ) of the German Federal Employment Agency (BA) at the Institute for Employment Research (IAB) (Schmucker & vom Berge, 2023a;b).

Dataset Splits: Yes
To avoid the impact of significant labor market reforms in Germany and to ensure full observation of unemployment durations up to 24 months, we restrict our analysis to unemployment episodes that began between 2010 and 2015. We use records from 2010 and 2011 to build the training dataset, records from 2012 for validation, and evaluate test performance on data from 2015 (see Figure 9).
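
The temporal split quoted in this entry can be sketched on a toy table; the column name `start_year` and the data values below are hypothetical placeholders, since the actual IAB Scientific Use File is restricted-access.

```python
# Sketch of the year-based train/validation/test split described in the paper,
# on a hypothetical DataFrame of unemployment episodes.
import pandas as pd

episodes = pd.DataFrame({
    "start_year": [2010, 2011, 2012, 2013, 2015, 2015],  # hypothetical
    "duration_months": [6, 24, 12, 3, 18, 9],            # hypothetical
})

# Restrict to episodes beginning 2010-2015, then split by start year:
# 2010-2011 for training, 2012 for validation, 2015 for testing.
episodes = episodes[episodes["start_year"].between(2010, 2015)]
train = episodes[episodes["start_year"].isin([2010, 2011])]
val = episodes[episodes["start_year"] == 2012]
test = episodes[episodes["start_year"] == 2015]
```

Note that the split leaves a gap (2013-2014 episodes are unused), consistent with the quoted description.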

Hardware Specification: No
The paper does not provide specific details about the hardware used for running experiments, such as particular GPU or CPU models, or cloud computing instance types. It only mentions general computational infrastructure.

Software Dependencies: No
The paper mentions: "We use CatBoost (https://catboost.ai) for model training." and "Additionally, we train a shallow Decision Tree (max depth = 4) using the scikit-learn package." However, it does not specify version numbers for CatBoost or scikit-learn, which are necessary for reproducible dependency information.

Experiment Setup: Yes
We use CatBoost (https://catboost.ai) for model training. The model was trained for a maximum of 5,000 iterations with an early stopping criterion (early stopping rounds = 20) based on validation performance. Additionally, we train a shallow Decision Tree (max depth = 4) using the scikit-learn package. All hyperparameters are kept at their default settings unless otherwise specified.
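
A minimal sketch of the shallow-tree part of this setup, using scikit-learn as the paper does; the features and target here are synthetic placeholders, and the CatBoost configuration (5,000 max iterations, early stopping rounds = 20 against a validation set) is shown only as a comment, not run.

```python
# Sketch of the reported shallow Decision Tree setup (max depth = 4)
# on synthetic placeholder data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))             # placeholder features
y = 2.0 * X[:, 0] + rng.normal(size=500)   # placeholder target

tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X, y)
assert tree.get_depth() <= 4

# The analogous CatBoost call (assuming the catboost package) would be roughly:
#   from catboost import CatBoostRegressor
#   model = CatBoostRegressor(iterations=5000)
#   model.fit(X_train, y_train, eval_set=(X_val, y_val),
#             early_stopping_rounds=20)
```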