Domain Adaptation In Reinforcement Learning Via Latent Unified State Representation

Authors: Jinwei Xing, Takashi Nagata, Kexin Chen, Xinyun Zou, Emre Neftci, Jeffrey L. Krichmar

AAAI 2021, pp. 10452-10459 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results show that this approach can achieve state-of-the-art domain adaptation performance in related RL tasks and outperforms prior approaches based on latent-representation-based RL and image-to-image translation.
Researcher Affiliation | Academia | Jinwei Xing1, Takashi Nagata2, Kexin Chen1, Xinyun Zou2, Emre Neftci1,2, Jeffrey L. Krichmar1,2; 1Department of Cognitive Sciences, 2Department of Computer Science, University of California, Irvine, USA; {jinweix1, takashin, kexinc3, xinyunz5, eneftci, jkrichma}@uci.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | In this work, we use Ray RLlib (Liang et al. 2018) and RLCodebase (Xing 2020) for the PPO implementation. ... Xing, J. 2020. RLCodebase: PyTorch Codebase for Deep Reinforcement Learning Algorithms. https://github.com/KarlXing/RLCodebase.
Open Datasets | No | The paper describes collecting images from the CarRacing and CARLA simulators for its experiments, but does not provide concrete access information (link, DOI, repository, or formal citation) that would make these specific datasets publicly available.
Dataset Splits | No | The paper discusses training and testing, and evaluation on 'seen target domains' and 'unseen target domains', but does not specify explicit training/validation/test splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions software such as Ray RLlib, RLCodebase, and a gym wrapper for CARLA, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | After that, we train the RL agent in the source domain with LUSR for 10 million steps via the Proximal Policy Optimization (PPO) algorithm (Schulman et al. 2017). ... we set the number of PPO training steps in this experiment as 50k. ... The action space is composed of two continuous values for driving control (throttle and steering). At each step, the driving control is applied on the vehicle for 0.1 simulation second. ... Each episode terminates if the vehicle collides, runs out of the lane, reaches the destination, or reaches the maximum episode timesteps (800 in this experiment).
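
The Open Source Code and Experiment Setup rows indicate that PPO was trained with Ray RLlib (and RLCodebase) for 10 million source-domain steps. The sketch below shows only a generic RLlib PPO training loop under the pre-2.0 API (ray.rllib.agents.ppo); the paper does not pin library versions, the environment name and num_workers value are assumptions, and the LUSR latent encoder that the paper's agent actually consumes is omitted.

```python
# Minimal sketch of a Ray RLlib PPO training loop in the spirit of the paper's setup.
# Assumes a pre-2.0 RLlib API (ray.rllib.agents.ppo); exact config keys may differ
# across versions. This is NOT the paper's released code and omits the LUSR encoder.
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {
    "env": "CarRacing-v0",   # assumption: Gym CarRacing; the paper also uses a CARLA gym wrapper
    "framework": "torch",
    "num_workers": 4,        # assumption: worker count is not reported in the paper
}

trainer = PPOTrainer(config=config)

# The paper reports 10 million environment steps of PPO training in the source domain.
TARGET_STEPS = 10_000_000
while True:
    result = trainer.train()
    if result["timesteps_total"] >= TARGET_STEPS:
        break

trainer.save("checkpoints/lusr_ppo")  # hypothetical checkpoint directory
ray.shutdown()
```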
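
The Experiment Setup row also describes the CARLA driving task: two continuous controls (throttle and steering), each applied for 0.1 simulation second, with episodes ending on collision, lane departure, arrival at the destination, or 800 timesteps. The hypothetical gym-style wrapper below only illustrates that episode logic; the paper mentions a gym wrapper for CARLA but does not release its code, so the class and helper names here are assumptions and the CARLA client calls are replaced by placeholder stubs.

```python
# Hypothetical gym-style CARLA wrapper sketching the episode logic quoted above.
# All names (CarlaDrivingEnv, the _* helpers) are assumptions; real CARLA client
# calls are replaced by placeholder stubs so the file runs without a simulator.
import gym
import numpy as np
from gym import spaces


class CarlaDrivingEnv(gym.Env):
    MAX_EPISODE_STEPS = 800   # maximum episode timesteps reported in the paper
    CONTROL_DT = 0.1          # each control is applied for 0.1 simulation second

    def __init__(self):
        # Two continuous driving controls: throttle and steering.
        self.action_space = spaces.Box(
            low=np.array([0.0, -1.0], dtype=np.float32),
            high=np.array([1.0, 1.0], dtype=np.float32),
        )
        # Camera-image observations; the exact resolution is an assumption.
        self.observation_space = spaces.Box(0, 255, shape=(128, 128, 3), dtype=np.uint8)
        self._steps = 0

    def reset(self):
        self._steps = 0
        return self._camera_image()

    def step(self, action):
        throttle, steering = float(action[0]), float(action[1])
        self._apply_control(throttle, steering, duration=self.CONTROL_DT)
        self._steps += 1
        # Termination conditions from the paper: collision, leaving the lane,
        # reaching the destination, or hitting the 800-step limit.
        done = (
            self._collided()
            or self._out_of_lane()
            or self._reached_destination()
            or self._steps >= self.MAX_EPISODE_STEPS
        )
        return self._camera_image(), self._reward(), done, {}

    # --- Placeholder stubs standing in for real CARLA client calls ---
    def _apply_control(self, throttle, steering, duration):
        pass

    def _camera_image(self):
        return np.zeros(self.observation_space.shape, dtype=np.uint8)

    def _collided(self):
        return False

    def _out_of_lane(self):
        return False

    def _reached_destination(self):
        return False

    def _reward(self):
        return 0.0
```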