Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks
Authors: Luise Ge, Michael Lanier, Anindya Sarkar, Bengisu Guresti, Chongjie Zhang, Yevgeniy Vorobeychik
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on MuJoCo and Meta-World show that the proposed approach outperforms state-of-the-art multi-task, meta-, and task clustering baselines in training, generalization, and few-shot learning, often by a large margin. |
| Researcher Affiliation | Academia | ¹Department of Computer Science & Engineering, Washington University in St. Louis. Correspondence to: Luise Ge <EMAIL>. |
| Pseudocode | Yes | The full pseudocode for the Greedy Intersection Algorithm (GIA) algorithm is provided as Algorithm 1. Algorithm 1: Greedy Intersection. Input: $T = \{\theta_i\}_{i=1}^{N}$, $\epsilon > 0$, $K \geq 1$. Output: Parameter cover $C$. |
| Open Source Code | Yes | Our code is available at https://github.com/CERL-WUSTL/PACMAN/. |
| Open Datasets | Yes | Our experiments on MuJoCo and Meta-World show that the proposed approach outperforms state-of-the-art multi-task, meta-, and task clustering baselines in training, generalization, and few-shot learning, often by a large margin. |
| Dataset Splits | Yes | MuJoCo: We selected two commonly used MuJoCo environments... use 100 tasks for training and another 100 for testing (in both zero-shot and few-shot settings)... Meta-World: We focus on the set of robotic manipulation tasks in MT50, of which we use 30 for training and 20 for testing. |
| Hardware Specification | Yes | To illustrate, our Meta World experiments show that training a single policy for 1 million steps necessitates approximately 40 hours using an A40 GPU. |
| Software Dependencies | No | The text references a specific LLM model, Phi-3 Mini-128k Instruct (Microsoft, 2024), but does not specify programming languages, libraries, or frameworks with version numbers used for the implementation of the proposed method. |
| Experiment Setup | Yes | For clustering, we use K = 3, ϵ = .6, and use the gradient-based approach initialized with the result of the Greedy Intersection algorithm. For few-shot learning, we fine-tune all methods for 100 epochs. Meta-World ... We use K = 3 and ϵ = .7. Performance is a moving average success rate for the last 2000 evaluation episodes over 3 seeds. |
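
The Experiment Setup row reports performance as a moving-average success rate over the last 2000 evaluation episodes, across 3 seeds. The following is a minimal sketch of one way such a metric could be computed. It is not taken from the paper's code: the function names are hypothetical, and averaging the per-seed rates with a plain mean is an assumption about how the seeds are combined.

```python
from statistics import mean


def trailing_success_rate(outcomes, window=2000):
    """Success rate over the last `window` evaluation episodes.

    `outcomes` is a sequence of 0/1 flags (1 = episode succeeded).
    If fewer than `window` episodes exist, all of them are used.
    """
    if not outcomes:
        raise ValueError("no evaluation episodes recorded")
    tail = outcomes[-window:]
    return sum(tail) / len(tail)


def aggregate_over_seeds(per_seed_outcomes, window=2000):
    """Average the trailing success rate across seeds.

    Assumption: seeds are combined with an unweighted mean of the
    per-seed rates; the source does not state the aggregation rule.
    """
    rates = [trailing_success_rate(o, window) for o in per_seed_outcomes]
    return mean(rates)
```

With the reported settings, `per_seed_outcomes` would hold three lists (one per seed) and `window` would stay at its default of 2000.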