Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Ad Hoc Teamwork via Offline Goal-Based Decision Transformers

Authors: Xinzhi Zhang, Hohei Chan, Deheng Ye, Yi Cai, Mengchen Zhao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results show that TAGET significantly outperforms existing solutions to AHT in the offline setting. (...) 5. Experiments
Researcher Affiliation | Collaboration | ¹School of Software Engineering, South China University of Technology, Guangzhou, China; ²Tencent, Shenzhen, China. Correspondence to: Mengchen Zhao <EMAIL>.
Pseudocode | Yes | D. Pseudocode of Algorithm: Algorithm 1 demonstrates our trajectory mirroring strategy for pre-processing the offline dataset. Algorithm 2 demonstrates the offline training process of TAGET. Algorithm 3 illustrates the online testing process of TAGET.
Open Source Code | No | The paper does not provide explicit statements or links regarding the availability of open-source code for the described methodology.
Open Datasets | No | To train our model in an offline setting, we utilize precollected interaction trajectories. To ensure the model's adaptability to diverse teammate strategies, we adopt the Soft-Value Diversity (SVD) method proposed in CSP (Ding et al., 2023) to collect data.
Dataset Splits | Yes | We trained four distinct populations of multi-agent reinforcement learning (MARL) policies for each environment. From these, one population was randomly sampled as the testing teammate set, while the remaining three were used to collect interaction trajectories for the offline dataset.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using the AdamW optimizer but does not provide specific software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Our model adopts the Decision Transformer (DT) backbone, with the following configurations: an embedding dimension of 64, context window length K = 30, 2 transformer layers with 1 attention head each, ReLU activation, and a dropout rate of 0.3. The network is optimized using AdamW with a learning rate of 0.01, a batch size of 2048, and a weight decay of 0.0001. Several task-specific coefficients balance the different learning objectives in our training loss; specifically, we set the weighting parameters as follows: α = 0.0001, β = 100, γ = 100, and σ = 0.001.
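The Dataset Splits row describes sampling one of four trained MARL populations as the held-out testing teammate set and using the remaining three for offline data collection. A minimal sketch of that split, assuming nothing beyond the quoted description (the function name and population labels are hypothetical; the paper does not give the actual sampling code):

```python
import random

def split_populations(populations, seed=0):
    """Hold out one randomly chosen population as the testing teammate set;
    the remaining populations supply trajectories for the offline dataset.
    Illustrative sketch only, per the paper's description."""
    rng = random.Random(seed)
    pops = list(populations)
    test_pop = pops.pop(rng.randrange(len(pops)))  # 1 of 4 held out
    return test_pop, pops  # (testing set, 3 training populations)

# Hypothetical population labels for the four trained MARL policy populations
test_set, train_sets = split_populations(["pop1", "pop2", "pop3", "pop4"])
```

The held-out population never contributes trajectories to the offline dataset, which is what makes the evaluation teammates unseen at training time.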
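For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration. The dictionary keys below are illustrative names, not a schema from the paper; the values are exactly those reported:

```python
# TAGET Decision Transformer hyperparameters as reported in the paper.
# Key names are hypothetical; only the values come from the text.
TAGET_CONFIG = {
    "embed_dim": 64,          # embedding dimension
    "context_length": 30,     # context window length K
    "n_layers": 2,            # transformer layers
    "n_heads": 1,             # attention heads per layer
    "activation": "relu",
    "dropout": 0.3,
    "optimizer": "AdamW",
    "learning_rate": 0.01,
    "batch_size": 2048,
    "weight_decay": 0.0001,
    # Coefficients balancing the learning objectives in the training loss
    "alpha": 0.0001,
    "beta": 100,
    "gamma": 100,
    "sigma": 0.001,
}
```

Having every reported value in one place makes it easy to spot what is still missing for replication, e.g. library versions and hardware, which the report marks as unspecified.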