Wasserstein Unsupervised Reinforcement Learning

Authors: Shuncheng He, Yuhang Jiang, Hongchang Zhang, Jianzhun Shao, Xiangyang Ji

AAAI 2022, pp. 6884-6892 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show policies learned by our approach outperform MI-based methods on the metric of Wasserstein distance while keeping high discriminability. Furthermore, the agents trained by WURL can sufficiently explore the state space in mazes and MuJoCo tasks, and the pre-trained policies can be applied to downstream tasks by hierarchical learning.
Researcher Affiliation | Academia | Shuncheng He, Yuhang Jiang, Hongchang Zhang, Jianzhun Shao, Xiangyang Ji (Tsinghua University; hesc16@mails.tsinghua.edu.cn)
Pseudocode | Yes | Algorithm 1: Naive WURL (test function), Algorithm 2: Naive WURL (final reward), Algorithm 3: Amortized Reward Crediting (illustrative sketches of the first and third follow the table)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We also examine our algorithms on several MuJoCo tasks, including three classical locomotion environments and two customized point-mass environments where a simplified ball agent wanders in different maps with various landscapes and movable objects (see Appendix for demonstrations).
Dataset Splits | No | The paper does not explicitly provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) for training, validation, or testing. For environments like MuJoCo, data is typically generated through environment interaction rather than drawn from pre-split datasets.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC) as its backend RL algorithm but does not provide specific version numbers for any software dependencies, libraries, or programming languages.
Experiment Setup | No | The paper discusses algorithmic details and training procedures (e.g., 'train πθl with SAC', 'Train test functions'), but it does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size, number of epochs), optimizer settings, or model initialization details in the main text.
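
Since no source code is released (see the "Open Source Code" row), the following is a minimal sketch of the test-function idea behind Algorithm 1: estimating the 1-Wasserstein distance between state samples of two policies via the Kantorovich-Rubinstein dual. All names (TestFn, estimate_w1), the network sizes, and the WGAN-style weight clipping used to approximate the 1-Lipschitz constraint are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch, not the paper's code: dual estimate of the
# 1-Wasserstein distance between state samples from two policies.
import torch
import torch.nn as nn

class TestFn(nn.Module):
    """Test function f; 1-Lipschitz enforced crudely by weight clipping."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s)

def estimate_w1(f, opt, states_i, states_j, steps=200, clip=0.1):
    """Maximize E[f(s_i)] - E[f(s_j)]; over 1-Lipschitz f this supremum
    equals W1 by the Kantorovich-Rubinstein duality."""
    for _ in range(steps):
        loss = -(f(states_i).mean() - f(states_j).mean())
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():  # crude Lipschitz constraint, WGAN-style
            for p in f.parameters():
                p.clamp_(-clip, clip)
    with torch.no_grad():
        return (f(states_i).mean() - f(states_j).mean()).item()

if __name__ == "__main__":
    torch.manual_seed(0)
    s_i = torch.randn(256, 2)        # stand-in state samples from policy i
    s_j = torch.randn(256, 2) + 2.0  # policy j visits a shifted region
    f = TestFn(state_dim=2)
    w1 = estimate_w1(f, torch.optim.Adam(f.parameters(), lr=1e-3), s_i, s_j)
    print(f"dual W1 estimate: {w1:.3f}")
```

A gradient penalty would be the more common way to enforce the Lipschitz constraint; clipping is used here only to keep the sketch short.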
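Algorithm 3 ("Amortized Reward Crediting") replaces a terminal distance payout with per-step pseudo-rewards. The sketch below is a hedged guess at the shape of such a scheme, assuming one trained test function per policy pair: each state visited by policy i is credited with its smallest dual score against any rival policy, so the nearest competitor dominates the signal. The min-over-rivals rule and every name here (pseudo_rewards, test_fns) are illustrative, not the paper's exact crediting scheme.

```python
# Hypothetical sketch, not the paper's code: per-step pseudo-rewards
# derived from pairwise test functions (e.g., trained TestFn networks
# from the previous sketch).
import torch

def pseudo_rewards(states, policy_id, test_fns, n_policies):
    """Credit each state of policy i with min_j f_ij(s) over rivals j,
    rewarding i for staying far from every other policy at once."""
    with torch.no_grad():
        scores = [
            test_fns[(policy_id, j)](states).squeeze(-1)
            for j in range(n_policies) if j != policy_id
        ]
        return torch.stack(scores).min(dim=0).values

if __name__ == "__main__":
    # Stand-in test functions; in practice these would be trained networks.
    fns = {(0, 1): lambda s: s.sum(-1, keepdim=True),
           (0, 2): lambda s: -s.sum(-1, keepdim=True)}
    r = pseudo_rewards(torch.randn(10, 2), 0, fns, n_policies=3)
    print(r.shape)  # torch.Size([10]): one pseudo-reward per visited state
```

Pseudo-rewards of this kind would simply stand in for the environment reward in the SAC updates the paper names as its backend, which is consistent with the alternation the "Experiment Setup" row quotes ('Train test functions', 'train πθl with SAC'); any off-the-shelf SAC implementation could consume them unchanged.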