SimSR: Simple Distance-Based State Representations for Deep Reinforcement Learning

Authors: Hongyu Zang, Xin Li, Mingzhong Wang

Pages: 8997-9005

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimented to investigate the following questions: (1) In comparison with state-of-the-art algorithms, does SimSR have a better performance in terms of sample efficiency? (2) Can SimSR learn robust state representation? (3) How is the generalization performance of the learned representation? Accordingly, we first evaluated our method in several standard control tasks from the DeepMind control (DMC) suite (Tassa et al. 2018). We then evaluated the robustness of our method to test if it can handle more realistic and complicated scenarios, where we replaced the background of the environment with natural video as distractors. Finally, we tested the generalization performance of the state representation in unseen tasks.
Researcher Affiliation | Academia | Hongyu Zang (1), Xin Li* (1), Mingzhong Wang (2); (1) Beijing Institute of Technology, (2) University of the Sunshine Coast; {zanghyu,xinli}@bit.edu.cn, mwang@usc.edu.au
Pseudocode | Yes | Algorithm 1: SimSR algorithm
Open Source Code | Yes | Our code is available at https://github.com/bit1029public/SimSR
Open Datasets | Yes | Accordingly, we first evaluated our method in several standard control tasks from the DeepMind control (DMC) suite (Tassa et al. 2018). We then evaluated the robustness of our method to test if it can handle more realistic and complicated scenarios, where we replaced the background of the environment with natural video as distractors. Finally, we tested the generalization performance of the state representation in unseen tasks.
Dataset Splits | No | The paper does not explicitly provide specific train/validation/test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments.
Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers.
Experiment Setup | Yes | We keep all hyper-parameters of the algorithm fixed throughout experiments except the action repeat which follows the convention to ensure a fair comparison. The settings of all hyper-parameters and architectures are also provided in Appendix.
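
The experiment setup quoted above (one shared set of hyper-parameters, with only the action repeat varying by task) could be sketched as a frozen base configuration plus a per-task override. This is a minimal illustration, not the authors' code: `BaseConfig`, its fields and values, `ACTION_REPEAT_BY_TASK`, and the helper functions are all hypothetical placeholders. The actual settings are given in the paper's Appendix.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class BaseConfig:
    # Placeholder values for illustration only; NOT taken from the paper.
    batch_size: int = 128
    discount: float = 0.99
    encoder_feature_dim: int = 50
    action_repeat: int = 1


# Hypothetical per-task action repeats (assumed values, not from the paper).
ACTION_REPEAT_BY_TASK = {
    "cartpole-swingup": 8,
    "cheetah-run": 4,
    "walker-walk": 2,
}


def config_for(task: str, base: BaseConfig = BaseConfig()) -> BaseConfig:
    """Return the shared config with only action_repeat overridden for `task`."""
    return replace(base, action_repeat=ACTION_REPEAT_BY_TASK[task])


def differing_fields(a: BaseConfig, b: BaseConfig) -> list:
    """List the names of the fields whose values differ between two configs."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if da[k] != db[k])


if __name__ == "__main__":
    base = BaseConfig()
    for task in ACTION_REPEAT_BY_TASK:
        cfg = config_for(task, base)
        # Only action_repeat should differ from the shared base config.
        print(task, cfg.action_repeat, differing_fields(base, cfg))
```

Freezing the dataclass makes accidental per-task edits to the other hyper-parameters raise an error, so a check like `differing_fields` can confirm that runs differ only in the action repeat.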