Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization

Authors: Ziqi Wang, Jiashun Liu, Ling Pan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We identify two diversity-critical domains, namely multi-goal achieving and generative RL, to demonstrate the advantages of multimodal policies and our method, particularly in terms of few-shot robustness. In conventional MuJoCo benchmarks, our algorithm also shows competitive performance.
Researcher Affiliation | Academia | Ziqi Wang, Jiashun Liu, Ling Pan; Hong Kong University of Science and Technology. Correspondence to: Ling Pan (EMAIL)
Pseudocode | Yes | A.1 Pseudo code of DrAC: Algorithm 1 details DrAC.
Open Source Code | Yes | Our code is available at https://github.com/PneuC/DrAC
Open Datasets | Yes | We build a multi-goal version of the Point Maze environment in D4RL [10] to conduct a case study of multi-goal achieving.
Dataset Splits | No | The paper mentions evaluating performance (e.g., "five-episode success rate"), training with "five seeds", and using specific environments like "Point Maze environment in D4RL [10]" and a "game level generation benchmark in [50]". However, it does not explicitly describe dataset splits in terms of percentages (e.g., 80% train, 10% validation, 10% test) or specific sample counts for different phases of the experiments.
Hardware Specification | Yes | Regarding compute resources, all of our experiments are conducted with a Linux server with 8 NVIDIA RTX 3090 GPUs and an Intel Xeon Platinum 8375C CPU.
Software Dependencies | Yes | All algorithms are implemented with PyTorch 2.5.1, and the CUDA version is 12.5.
Experiment Setup | Yes | The hyperparameters are listed in Table 2. The temperature hyperparameters listed in Table 2 are used in the multi-goal Point Maze environment and the game content generation environment. For MuJoCo, we use a lower temperature for DrAC. For baseline algorithms, we use the default MuJoCo temperatures provided in their original papers. Temperature hyperparameters for all algorithms in MuJoCo are listed in Table 3.
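
The Software Dependencies entry above reports PyTorch 2.5.1. As a minimal sketch (not part of the paper or its repository), the check below compares a locally installed package against that reported version using only the Python standard library. The names REPORTED, same_release, and check_environment are hypothetical helpers introduced for illustration; the only assumption about the package itself is that PyTorch is distributed under the name "torch".

```python
from importlib import metadata

# Version taken from the Software Dependencies entry above.
# The dictionary name and layout are illustrative assumptions.
REPORTED = {"torch": "2.5.1"}


def same_release(installed: str, reported: str) -> bool:
    """Compare release numbers, ignoring a local build tag such as '+cu124'."""
    return installed.split("+", 1)[0] == reported


def check_environment(reported=REPORTED, get_version=metadata.version):
    """Return {package: (installed_version_or_None, matches_reported)}."""
    report = {}
    for name, wanted in reported.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            installed = None
        matches = installed is not None and same_release(installed, wanted)
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        print(f"{name}: installed={installed} reported={REPORTED[name]} match={ok}")
```

Once PyTorch is importable, the CUDA version it was built against can additionally be read from torch.version.cuda and compared by hand with the reported 12.5; that step is left out of the sketch so that it runs without PyTorch installed.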