Wasserstein Unsupervised Reinforcement Learning

Authors: Shuncheng He, Yuhang Jiang, Hongchang Zhang, Jianzhun Shao, Xiangyang Ji

AAAI 2022, pp. 6884-6892 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show policies learned by our approach outperform MI-based methods on the metric of Wasserstein distance while keeping high discriminability. Furthermore, the agents trained by WURL can sufficiently explore the state space in mazes and MuJoCo tasks, and the pre-trained policies can be applied to downstream tasks by hierarchical learning.
Researcher Affiliation | Academia | Shuncheng He, Yuhang Jiang, Hongchang Zhang, Jianzhun Shao, Xiangyang Ji (Tsinghua University; hesc16@mails.tsinghua.edu.cn)
Pseudocode | Yes | Algorithm 1: Naive WURL (test function), Algorithm 2: Naive WURL (final reward), Algorithm 3: Amortized Reward Crediting (illustrative sketches of the first and third follow the table)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We also examine our algorithms on several MuJoCo tasks, including three classical locomotion environments and two customized point-mass environments where a simplified ball agent wanders in different maps with various landscapes and movable objects (see Appendix for demonstrations).
Dataset Splits | No | The paper does not explicitly provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) for training, validation, or testing. For environments like MuJoCo, data is typically generated through environment interaction rather than drawn from pre-split datasets.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC) as its backend RL algorithm but does not provide specific version numbers for any software dependencies, libraries, or programming languages.
Experiment Setup | No | The paper discusses algorithmic details and training procedures (e.g., 'train πθl with SAC', 'Train test functions'), but it does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size, number of epochs), optimizer settings, or model initialization details in the main text.
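
Since no source code is released (see the "Open Source Code" row), the following is a minimal sketch of the test-function idea behind Algorithm 1: estimating the 1-Wasserstein distance between state samples of two policies via the Kantorovich-Rubinstein dual. All names (TestFn, estimate_w1), the network sizes, and the WGAN-style weight clipping used to approximate the 1-Lipschitz constraint are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch, not the paper's code: dual estimate of the
# 1-Wasserstein distance between state samples from two policies.
import torch
import torch.nn as nn

class TestFn(nn.Module):
    """Test function f; 1-Lipschitz enforced crudely by weight clipping."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s)

def estimate_w1(f, opt, states_i, states_j, steps=200, clip=0.1):
    """Maximize E[f(s_i)] - E[f(s_j)]; over 1-Lipschitz f this supremum
    equals W1 by the Kantorovich-Rubinstein duality."""
    for _ in range(steps):
        loss = -(f(states_i).mean() - f(states_j).mean())
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():  # crude Lipschitz constraint, WGAN-style
            for p in f.parameters():
                p.clamp_(-clip, clip)
    with torch.no_grad():
        return (f(states_i).mean() - f(states_j).mean()).item()

if __name__ == "__main__":
    torch.manual_seed(0)
    s_i = torch.randn(256, 2)        # stand-in state samples from policy i
    s_j = torch.randn(256, 2) + 2.0  # policy j visits a shifted region
    f = TestFn(state_dim=2)
    w1 = estimate_w1(f, torch.optim.Adam(f.parameters(), lr=1e-3), s_i, s_j)
    print(f"dual W1 estimate: {w1:.3f}")
```

A gradient penalty would be the more common way to enforce the Lipschitz constraint; clipping is used here only to keep the sketch short.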
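Algorithm 3 ("Amortized Reward Crediting") replaces a terminal distance payout with per-step pseudo-rewards. The sketch below is a hedged guess at the shape of such a scheme, assuming one trained test function per policy pair: each state visited by policy i is credited with its smallest dual score against any rival policy, so the nearest competitor dominates the signal. The min-over-rivals rule and every name here (pseudo_rewards, test_fns) are illustrative, not the paper's exact crediting scheme.

```python
# Hypothetical sketch, not the paper's code: per-step pseudo-rewards
# derived from pairwise test functions (e.g., trained TestFn networks
# from the previous sketch).
import torch

def pseudo_rewards(states, policy_id, test_fns, n_policies):
    """Credit each state of policy i with min_j f_ij(s) over rivals j,
    rewarding i for staying far from every other policy at once."""
    with torch.no_grad():
        scores = [
            test_fns[(policy_id, j)](states).squeeze(-1)
            for j in range(n_policies) if j != policy_id
        ]
        return torch.stack(scores).min(dim=0).values

if __name__ == "__main__":
    # Stand-in test functions; in practice these would be trained networks.
    fns = {(0, 1): lambda s: s.sum(-1, keepdim=True),
           (0, 2): lambda s: -s.sum(-1, keepdim=True)}
    r = pseudo_rewards(torch.randn(10, 2), 0, fns, n_policies=3)
    print(r.shape)  # torch.Size([10]): one pseudo-reward per visited state
```

Pseudo-rewards of this kind would simply stand in for the environment reward in the SAC updates the paper names as its backend, which is consistent with the alternation the "Experiment Setup" row quotes ('Train test functions', 'train πθl with SAC'); any off-the-shelf SAC implementation could consume them unchanged.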