Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks

Authors: Luise Ge, Michael Lanier, Anindya Sarkar, Bengisu Guresti, Chongjie Zhang, Yevgeniy Vorobeychik

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on MuJoCo and Meta-World show that the proposed approach outperforms state-of-the-art multi-task, meta-, and task clustering baselines in training, generalization, and few-shot learning, often by a large margin.
Researcher Affiliation	Academia	Department of Computer Science & Engineering, Washington University in St. Louis. Correspondence to: Luise Ge <EMAIL>.
Pseudocode	Yes	The full pseudocode for the Greedy Intersection Algorithm (GIA) is provided as Algorithm 1. Algorithm 1: Greedy Intersection. Input: T = {θ_i}_{i=1}^N, ϵ > 0, K ≥ 1. Output: Parameter cover C.
Open Source Code	Yes	Our code is available at https://github.com/CERL-WUSTL/PACMAN/.
Open Datasets	Yes	Our experiments on MuJoCo and Meta-World show that the proposed approach outperforms state-of-the-art multi-task, meta-, and task clustering baselines in training, generalization, and few-shot learning, often by a large margin.
Dataset Splits	Yes	MuJoCo: We selected two commonly used MuJoCo environments... use 100 tasks for training and another 100 for testing (in both zero-shot and few-shot settings)... Meta-World: We focus on the set of robotic manipulation tasks in MT50, of which we use 30 for training and 20 for testing.
Hardware Specification	Yes	To illustrate, our Meta World experiments show that training a single policy for 1 million steps necessitates approximately 40 hours using an A40 GPU.
Software Dependencies	No	The text references a specific LLM model, Phi-3 Mini-128k Instruct (Microsoft, 2024), but does not specify programming languages, libraries, or frameworks with version numbers used for the implementation of the proposed method.
Experiment Setup	Yes	For clustering, we use K = 3, ϵ = .6, and use the gradient-based approach initialized with the result of the Greedy Intersection algorithm. For few-shot learning, we fine-tune all methods for 100 epochs. Meta-World ... We use K = 3 and ϵ = .7. Performance is a moving average success rate for the last 2000 evaluation episodes over 3 seeds.
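
The Experiment Setup entry reports performance as a moving-average success rate over the last 2000 evaluation episodes across 3 seeds. The sketch below shows one way such a number could be computed. It assumes that each episode yields a binary success outcome and that the per-seed rates are averaged with equal weight; the excerpt does not state the exact aggregation, and the function names are hypothetical rather than taken from the authors' code.

```python
from statistics import mean


def trailing_success_rate(outcomes, window=2000):
    """Fraction of successes among the last `window` episodes.

    `outcomes` is a list of 0/1 values, one per evaluation episode.
    If fewer than `window` episodes exist, all of them are used.
    """
    if not outcomes:
        raise ValueError("no evaluation episodes recorded")
    recent = outcomes[-window:]
    return sum(recent) / len(recent)


def success_rate_over_seeds(outcomes_by_seed, window=2000):
    """Equal-weight average of the per-seed trailing success rates.

    Assumption: seeds are averaged after computing each seed's rate;
    pooling all episodes across seeds would be another reasonable choice.
    """
    return mean(trailing_success_rate(o, window) for o in outcomes_by_seed)
```

With three seeds, `success_rate_over_seeds([seed0, seed1, seed2])` returns a single number in [0, 1], where each argument is that seed's list of episode outcomes.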