Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning “Partner-Aware” Collaborators in Multi-Party Collaboration

Authors: Abhijnan Nath, Nikhil Krishnaswamy

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on multiple collaborative task environments show that ICR, on average, is more capable of promoting successful CG convergence and exploring more diverse solutions in such tasks.
Researcher Affiliation | Academia | Situated Grounding and Natural Language (SIGNAL) Lab, Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA, EMAIL
Pseudocode | Yes | Algorithm 1: Expert Data Collection and ICR Agent Training
Open Source Code | Yes | Our code is available at https://github.com/csu-signal/ICR
Open Datasets | Yes | On challenging collaborative tasks such as the DeliData Wason Card Selection task [Karadzhov et al., 2023] and the Weights Task [Khebour et al., 2024a], our approach yields substantial gains in both task performance and common ground convergence across multi-party settings.
Dataset Splits | Yes | Training/evaluation splits for both datasets are consistent with prior work [Nath et al., 2024].
Hardware Specification | Yes | All models requiring an in-memory reference policy in full-press experiments were trained on two NVIDIA A100 GPUs. We use a single A100 GPU for no-press experiments. The OPT-1.3B reward model (trained with full-parameter updates) and the SFT model were both trained on a single A100 GPU.
Software Dependencies | No | We use LoRA with α = 16, dropout = 0.05, rank R = 8 via PEFT and SFTTrainer from TRL, with 4-bit quantization via bitsandbytes. We optimize with AdamW [Loshchilov and Hutter, 2017, Dettmers et al., 2024], a cosine scheduler, weight decay of 0.05, and 100 warm-up steps. The text names software such as PEFT, TRL, and bitsandbytes but gives no version numbers for them; AdamW is an optimizer, not a software library with a version number.
Experiment Setup | Yes | We use LoRA with α = 16, dropout = 0.05, rank R = 8 via PEFT and SFTTrainer from TRL, with 4-bit quantization via bitsandbytes. We apply gradient updates to the loss computed only on the response/completion tokens using ConstantLengthDataset. We optimize with AdamW [Loshchilov and Hutter, 2017, Dettmers et al., 2024], a cosine scheduler, weight decay of 0.05, and 100 warm-up steps.
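The LoRA settings in the Experiment Setup row (α = 16, rank R = 8, dropout = 0.05) imply a scaling factor of α / R = 2 on the low-rank update. The following is a minimal pure-Python sketch of a LoRA-adapted linear layer, assuming the common formulation h = Wx + (α / R)·B(A·dropout(x)) with W frozen and only A and B trained. The function names `matvec` and `lora_forward` are hypothetical; this is an illustration, not the PEFT implementation.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha=16, r=8, dropout=0.05, training=False, rng=None):
    """Hypothetical LoRA layer: h = W x + (alpha / r) * B (A dropout(x)).

    W: frozen d_out x d_in weight; A: r x d_in; B: d_out x r (A and B are trainable).
    """
    if training and dropout > 0:
        # Inverted dropout on the adapter input only; the frozen path sees x unchanged.
        rng = rng or random.Random(0)
        keep = 1.0 - dropout
        x_lora = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    else:
        x_lora = list(x)
    base = matvec(W, x)                      # frozen pretrained projection
    delta = matvec(B, matvec(A, x_lora))     # low-rank update through rank r
    scale = alpha / r                        # 16 / 8 = 2.0 for the reported settings
    return [b + scale * d for b, d in zip(base, delta)]
```

Initializing B to zeros, as PEFT does by default, makes the adapted layer reproduce the frozen layer exactly at the start of fine-tuning; only the r·(d_in + d_out) adapter entries receive gradient updates.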
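The Experiment Setup row mentions TRL's ConstantLengthDataset. As a rough illustration of the packing idea behind it (not TRL's code), tokenized examples can be concatenated with an end-of-sequence separator and sliced into fixed-length training sequences, dropping any incomplete tail. The function name `pack_constant_length` and the toy token IDs are hypothetical.

```python
def pack_constant_length(tokenized_examples, seq_length, eos_id):
    """Hypothetical packer: join examples with eos_id, then cut into seq_length chunks."""
    buffer = []
    for ids in tokenized_examples:
        # Append a separator after every example so boundaries remain visible to the model.
        buffer.extend(list(ids) + [eos_id])
    chunks = []
    for i in range(0, len(buffer), seq_length):
        chunk = buffer[i:i + seq_length]
        if len(chunk) == seq_length:  # drop the final partial chunk instead of padding it
            chunks.append(chunk)
    return chunks


# Usage with toy token IDs (0 stands in for the EOS token).
print(pack_constant_length([[1, 2, 3], [4, 5], [6]], seq_length=4, eos_id=0))
```

Packing avoids wasting compute on padding; the trade-off is that a single training sequence can span the boundary between two examples.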
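The same row states that gradient updates use a loss computed only on the response/completion tokens. A common way to do this, assumed here, is to copy the input IDs into a label sequence, overwrite prompt positions with -100 (the ignore index PyTorch's cross-entropy skips by default), and average the cross-entropy over the remaining positions. This pure-Python sketch assumes the causal shift between logits and labels has already been applied; `mask_prompt` and `completion_only_loss` are hypothetical names.

```python
import math

IGNORE_INDEX = -100  # label value skipped by convention (PyTorch's default ignore_index)


def mask_prompt(input_ids, prompt_len):
    """Build labels: prompt positions become IGNORE_INDEX, completion tokens are kept."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


def completion_only_loss(logits, labels):
    """Mean cross-entropy over positions whose label is not IGNORE_INDEX.

    Assumes logits[t] already scores labels[t], i.e. the causal shift is done.
    """
    total, count = 0.0, 0
    for row, y in zip(logits, labels):
        if y == IGNORE_INDEX:
            continue  # prompt token: contributes no loss and therefore no gradient
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))  # stable log-sum-exp
        total += log_z - row[y]  # equals -log softmax(row)[y]
        count += 1
    return total / count if count else 0.0
```

Because prompt positions are skipped entirely, even confidently wrong predictions on the prompt leave the loss unchanged; only the completion tokens shape the update.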
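The optimizer settings (AdamW with weight decay 0.05) rely on decoupled weight decay [Loshchilov and Hutter, 2017]: each parameter is shrunk directly by a factor of (1 − lr · weight_decay) rather than having the decay folded into the gradient. Below is a single-scalar sketch of one AdamW step following the update rule documented for PyTorch's `torch.optim.AdamW`; the function name `adamw_step`, the learning rate, and the betas/eps defaults are illustrative assumptions, not values reported in the paper.

```python
import math


def adamw_step(param, grad, m, v, t, lr=1e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.05):
    """Hypothetical scalar AdamW step; t is the 1-indexed step count.

    Returns the updated (param, m, v).
    """
    beta1, beta2 = betas
    param = param * (1.0 - lr * weight_decay)       # decoupled weight decay
    m = beta1 * m + (1.0 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                  # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

With a zero gradient, the step reduces to pure shrinkage by (1 − lr · weight_decay), which is what distinguishes AdamW from adding an L2 penalty to the loss under Adam.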
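One reading of "cosine scheduler ... and 100 warm-up steps" is a linear ramp from 0 to the base learning rate over the first 100 steps, followed by cosine decay toward 0 by the final step. The sketch below follows the formula of Hugging Face's `get_cosine_schedule_with_warmup` with its default half cycle; the function name is hypothetical, and the base learning rate and total step count in the usage line are placeholders, since neither is given in this excerpt.

```python
import math


def cosine_lr_with_warmup(step, base_lr, total_steps, warmup_steps=100):
    """Learning rate at `step` under linear warm-up followed by half-cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warm-up phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warm-up phase completed; clipped here so the rate stays at 0 afterwards.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Usage with a placeholder base_lr of 2e-4 and 1,100 total steps.
for s in (0, 50, 100, 600, 1100):
    print(s, cosine_lr_with_warmup(s, 2e-4, 1100))
```

With these placeholder values, the rate peaks at step 100, is halfway down at step 600, and reaches 0 at step 1,100.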