State Chrono Representation for Enhancing Generalization in Reinforcement Learning

Authors: Jianda Chen, Wen Zheng Terence Ng, Zichen Chen, Sinno Pan, Tianwei Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted in DeepMind Control and Meta-World environments demonstrate that SCR achieves better performance compared to other recent metric-based methods on demanding generalization tasks.
Researcher Affiliation | Collaboration | Nanyang Technological University; Continental Automotive Singapore; University of California, Santa Barbara; The Chinese University of Hong Kong
Pseudocode | Yes | Algorithm 1: A learning step in jointly learning SCR and SAC. (A hedged sketch of such a step appears after this table.)
Open Source Code | Yes | The code of SCR is available at https://github.com/jianda-chen/SCR.
Open Datasets | Yes | DeepMind Control Suite. The primary objective of our proposed SCR is to develop a robust and generalizable representation for deep RL when dealing with high-dimensional observations. To evaluate its effectiveness, we conduct experiments using the DeepMind Control Suite (DM_Control) environment, which involves rendered pixel observations [37], and a distraction setting called the Distracting Control Suite [34]. ... We present experimental investigations conducted in Meta-World [44], a comprehensive simulated benchmark that includes distinct robotic manipulation tasks.
Dataset Splits | Yes | Specifically, the training environment samples videos from the DAVIS2017 train set, while the evaluation environment uses videos from the validation set.
Hardware Specification | Yes | The experiments are run on servers with a 128-core CPU, 512 GB of RAM, and NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions building on SAC [13] and PPO [31] and using IQE [38], but it does not specify version numbers for software dependencies such as Python, PyTorch, or TensorFlow. For example: 'Our method, SCR, can be seamlessly integrated with a wide range of deep RL algorithms. In our implementation, we specifically employ Soft Actor-Critic (SAC) [13] as our foundation RL algorithm.'
Experiment Setup | Yes | Table 4 (Hyperparameters): Stack frames 3; Observation shape (3 × 3, 84, 84); Action repeat; Convolutional layers 4; Convolutional kernel size 3 × 3; Convolutional strides [2, 1, 1, 1]; Convolutional channels 32; ϕ dimension 256; ψ dimension 256; Learning rate 1e-4; Q function EMA αQ 0.01; Encoder ϕ EMA αϕ 0.05; Initial steps 1000; Replay buffer size 500K; Target update frequency 2; Batch size 128; Discount factor γ.
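
Drawing on Algorithm 1 ("a learning step in jointly learning SCR and SAC") and the Table 4 hyperparameters quoted above, the following is a minimal PyTorch sketch of what one such joint update step could look like. It is not the authors' implementation: the SCR-specific objective is reduced to a hypothetical placeholder `scr_representation_loss`, the policy is replaced by random actions, and the action dimension, discount value, and critic head sizes are assumptions, since the report does not quote them.

```python
# Minimal sketch (NOT the authors' code) of one joint representation + SAC
# learning step, using the hyperparameters quoted from Table 4.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters quoted from Table 4
OBS_SHAPE = (9, 84, 84)   # 3 stacked RGB frames (3 x 3 channels, 84 x 84 pixels)
PHI_DIM = 256             # encoder phi output dimension
LR = 1e-4
BATCH_SIZE = 128
ALPHA_Q = 0.01            # EMA rate for the target Q network
ALPHA_PHI = 0.05          # EMA rate for the target encoder phi
GAMMA = 0.99              # assumed; the report lists "Discount factor" without a value
ACTION_DIM = 6            # assumed; depends on the DMControl task


class Encoder(nn.Module):
    """Pixel encoder phi: 4 conv layers, 3x3 kernels, strides [2, 1, 1, 1], 32 channels."""
    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(OBS_SHAPE[0], 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
        )
        with torch.no_grad():
            n_flat = self.convs(torch.zeros(1, *OBS_SHAPE)).flatten(1).shape[1]
        self.fc = nn.Linear(n_flat, PHI_DIM)

    def forward(self, obs):
        return torch.tanh(self.fc(self.convs(obs).flatten(1)))


class Critic(nn.Module):
    """Q(phi(s), a) head on top of the shared encoder (hidden size assumed)."""
    def __init__(self):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(PHI_DIM + ACTION_DIM, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, z, a):
        return self.q(torch.cat([z, a], dim=-1))


def scr_representation_loss(z, z_next):
    # Placeholder for the SCR objective (temporal/metric losses on phi and psi);
    # the actual losses are defined in the paper, not in this report.
    return F.mse_loss(z, z_next.detach())


def ema_update(target, online, rate):
    # Exponential moving average of the online parameters into the target network.
    for pt, po in zip(target.parameters(), online.parameters()):
        pt.data.mul_(1 - rate).add_(rate * po.data)


encoder, critic = Encoder(), Critic()
target_encoder, target_critic = copy.deepcopy(encoder), copy.deepcopy(critic)
opt = torch.optim.Adam(list(encoder.parameters()) + list(critic.parameters()), lr=LR)

# One learning step on a synthetic batch standing in for replay-buffer samples.
obs = torch.rand(BATCH_SIZE, *OBS_SHAPE)
action = torch.rand(BATCH_SIZE, ACTION_DIM)
reward = torch.rand(BATCH_SIZE, 1)
next_obs = torch.rand(BATCH_SIZE, *OBS_SHAPE)

z, z_next = encoder(obs), encoder(next_obs)
with torch.no_grad():
    # A full SAC step would sample next_action from the policy and include an
    # entropy term; random actions stand in here to keep the sketch short.
    next_action = torch.rand(BATCH_SIZE, ACTION_DIM)
    target_q = reward + GAMMA * target_critic(target_encoder(next_obs), next_action)

loss = F.mse_loss(critic(z, action), target_q) + scr_representation_loss(z, z_next)
opt.zero_grad()
loss.backward()
opt.step()

# Separate EMA rates for the target critic and target encoder, matching the
# "Q function EMA 0.01" and "Encoder phi EMA 0.05" entries in Table 4.
ema_update(target_critic, critic, ALPHA_Q)
ema_update(target_encoder, encoder, ALPHA_PHI)
```

The two distinct EMA coefficients are kept deliberately, since Table 4 reports different rates for the target Q function and the target encoder; everything else in the sketch that is not quoted in this report should be read as an assumption.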