Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization
Authors: Ziqi Wang, Jiashun Liu, Ling Pan
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We identify two diversity-critical domains, namely multi-goal achieving and generative RL, to demonstrate the advantages of multimodal policies and our method, particularly in terms of few-shot robustness. In conventional MuJoCo benchmarks, our algorithm also shows competitive performance. |
| Researcher Affiliation | Academia | Ziqi Wang Jiashun Liu Ling Pan Hong Kong University of Science and Technology Correspondence to: Ling Pan (EMAIL) |
| Pseudocode | Yes | A.1 Pseudo code of DrAC. Algorithm 1 details DrAC. |
| Open Source Code | Yes | Our code is available at https://github.com/PneuC/DrAC |
| Open Datasets | Yes | We build a multi-goal version of the Point Maze environment in D4RL [10] to conduct a case study of multi-goal achieving. |
| Dataset Splits | No | The paper mentions evaluating performance (e.g., "five-episode success rate"), training with "five seeds", and using specific environments like "Point Maze environment in D4RL [10]" and a "game level generation benchmark in [50]". However, it does not explicitly describe dataset splits in terms of percentages (e.g., 80% train, 10% validation, 10% test) or specific sample counts for different phases of the experiments. |
| Hardware Specification | Yes | Regarding compute resources, all of our experiments are conducted with a Linux server with 8 NVIDIA RTX 3090 GPUs and an Intel Xeon Platinum 8375C CPU. |
| Software Dependencies | Yes | All algorithms are implemented with PyTorch 2.5.1, and the CUDA version is 12.5. |
| Experiment Setup | Yes | The hyperparameters are listed in Table 2. The temperature hyperparameters listed in Table 2 are used in the multi-goal Point Maze environment and the game content generation environment. While for MuJoCo, we use a lower temperature for DrAC. For baseline algorithms, we use the default MuJoCo temperatures provided in their original papers. Temperature hyperparameters for all algorithms in MuJoCo are listed in Table 3. |
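
The Software Dependencies row reports PyTorch 2.5.1 and CUDA 12.5. A minimal sketch of how a reader might compare a local environment against those reported versions follows. The helper names `find_mismatches` and `installed_versions` are hypothetical and not part of the paper's repository. The paper does not state whether "12.5" refers to the system toolkit or to the CUDA build of the PyTorch wheel, so a CUDA mismatch here is only a hint, not evidence of an incompatible setup.

```python
# Versions quoted in the Software Dependencies row of the table above.
REPORTED = {"torch": "2.5.1", "cuda": "12.5"}


def find_mismatches(found, reported=REPORTED):
    """Return one message per reported dependency that is missing or differs."""
    problems = []
    for name, want in reported.items():
        have = found.get(name)
        # Drop local build tags such as "+cu124" before comparing.
        base = have.split("+")[0] if have is not None else None
        if base != want:
            problems.append(f"{name}: reported {want}, found {have}")
    return problems


def installed_versions():
    """Collect local versions; returns an empty dict if PyTorch is not installed."""
    found = {}
    try:
        import torch
    except ImportError:
        return found
    found["torch"] = str(torch.__version__)
    # torch.version.cuda is the CUDA version the wheel was built against
    # (None for CPU-only builds); it may differ from the system toolkit.
    if torch.version.cuda is not None:
        found["cuda"] = torch.version.cuda
    return found


if __name__ == "__main__":
    issues = find_mismatches(installed_versions())
    for line in issues or ["environment matches the reported versions"]:
        print(line)
```

The comparison is kept separate from the version lookup so that it can be exercised without PyTorch installed.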