Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
COOPERA: Continual Open-Ended Human-Robot Assistance
Authors: Chenyang Ma, Kai Lu, Ruta Desai, Xavier Puig, Andrew Markham, Niki Trigoni
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments validate the extent to which our simulated humans reflect realistic human behaviors and demonstrate the value of inferring and personalizing to human intents for open-ended and long-term HRC. |
| Researcher Affiliation | Academia | University of Oxford |
| Pseudocode | No | The paper describes methods and pipelines but does not present any clearly labeled pseudocode or algorithm blocks. Figure 3 and Figure 4 are diagrams of pipelines and approaches, and Appendices F and G provide prompt details for LLMs, which are input formats rather than structured pseudocode for an algorithm. |
| Open Source Code | No | We did not provide code with the submission because of internal regulations within the authors' organizations but will release it after acceptance. |
| Open Datasets | Yes | We use Habitat 3.0 [52] as the robot simulation platform and HSSD [28] as the 3D environment... For modeling unique humans, we use the SPC: Synthetic-Persona-Chat Dataset [26]... We use Motion-X [31] and AMASS [41] as the human motion dataset. |
| Dataset Splits | Yes | Two 10-way BERT-large-uncased classifiers [11] are finetuned, one for intentions (10 epochs) and one for tasks (20 epochs), with train-test split 0.8:0.2, learning rate 5e-6, and tested on an unseen scene. |
| Hardware Specification | Yes | We train on 3 NVIDIA A10 GPUs (24GB RAM). |
| Software Dependencies | Yes | For simulating humans, we use Llama-3.1-8B [13] with temperature 0.7. For search and memory retrieval, we use MiniLM-L6-v2 [66]... For the assistive agent, we use Llama-3.2-11B [13] as the robot-VLM. Classifiers are finetuned on Mistral-7B-Instruct-v0.2 [27] using LoRA [22]. |
| Experiment Setup | Yes | For simulating humans, we use Llama-3.1-8B [13] with temperature 0.7. For search and memory retrieval, we use MiniLM-L6-v2 [66] with a decay factor λ = 0.95, retrieving the top 3 intentions and top 5 tasks... Classifiers are finetuned on Mistral-7B-Instruct-v0.2 [27] using LoRA [22] (rank 8, dropout 0.2, alpha 16; targets: q, k, v, o) in an instructional format to output binary yes/no. We train for 5 epochs using AdamW [37] (lr 1e-5, weight decay 0.01), with batch size 1 and gradient accumulation of 4 steps, across 3 NVIDIA A10 GPUs (24GB RAM). |
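
The classifier fine-tuning hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which makes it easier to see what a reproduction would need to set. The following is only a sketch: the class and field names (`LoraSetup`, `TrainSetup`, `effective_batch_size`) are hypothetical and do not come from the paper, and the assumption that the 3 GPUs are used in a data-parallel fashion is not confirmed by the quoted text. The numeric values are those reported in the table.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LoraSetup:
    """LoRA settings as quoted in the Experiment Setup row."""
    rank: int = 8
    alpha: int = 16
    dropout: float = 0.2
    # The row lists targets as "q, k, v, o"; exact module names are not given.
    targets: tuple = ("q", "k", "v", "o")

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.alpha / self.rank


@dataclass(frozen=True)
class TrainSetup:
    """Optimizer and batching settings as quoted in the Experiment Setup row."""
    base_model: str = "Mistral-7B-Instruct-v0.2"
    epochs: int = 5
    learning_rate: float = 1e-5
    weight_decay: float = 0.01
    per_device_batch_size: int = 1
    grad_accum_steps: int = 4
    num_gpus: int = 3
    lora: LoraSetup = field(default_factory=LoraSetup)

    def effective_batch_size(self, data_parallel: bool = True) -> int:
        """Samples per optimizer step.

        data_parallel=True is an ASSUMPTION: the source says training runs
        "across 3 NVIDIA A10 GPUs" but does not state the parallelism scheme.
        """
        replicas = self.num_gpus if data_parallel else 1
        return self.per_device_batch_size * self.grad_accum_steps * replicas


setup = TrainSetup()
print(setup.lora.scaling)                  # 2.0
print(setup.effective_batch_size())        # 12 (if data parallel)
print(setup.effective_batch_size(False))   # 4 (if the model is sharded instead)
```

The two effective batch sizes differ by a factor of three, so a reproduction attempt would need the released code, or confirmation from the authors, to know which one applies.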