State Chrono Representation for Enhancing Generalization in Reinforcement Learning
Authors: Jianda Chen, Wen Zheng Terence Ng, Zichen Chen, Sinno Pan, Tianwei Zhang
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted in DeepMind Control and Meta-World environments demonstrate that SCR achieves better performance compared to other recent metric-based methods in demanding generalization tasks. |
| Researcher Affiliation | Collaboration | 1Nanyang Technological University 2Continental Automotive Singapore 3University of California, Santa Barbara 4The Chinese University of Hong Kong |
| Pseudocode | Yes | Algorithm 1: A learning step in jointly learning SCR and SAC. (A hedged structural sketch of such a step is given below the table.) |
| Open Source Code | Yes | The codes of SCR are available in https://github.com/jianda-chen/SCR. |
| Open Datasets | Yes | DeepMind Control Suite. The primary objective of our proposed SCR is to develop a robust and generalizable representation for deep RL when dealing with high-dimensional observations. To evaluate its effectiveness, we conduct experiments using the DeepMind Control Suite (DM_Control) environment, which involves rendered pixel observations [37] and a distraction setting called Distracting Control Suite [34]. ... We present experimental investigations conducted in Meta-World [44], a comprehensive simulated benchmark that includes distinct robotic manipulation tasks. |
| Dataset Splits | Yes | Specifically, the training environment samples videos from the DAVIS2017 train set, while the evaluation environment uses videos from the validation set. |
| Hardware Specification | Yes | The experiment is done on servers with a 128-core CPU, 512 GB RAM, and NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions implementing in SAC [13] and PPO [31] and using IQE [38], but it does not specify version numbers for general software dependencies like Python, PyTorch, or TensorFlow. For example, 'Our method, SCR, can be seamlessly integrated with a wide range of deep RL algorithms. In our implementation, we specifically employ Soft Actor-Critic (SAC) [13] as our foundation RL algorithm.' |
| Experiment Setup | Yes | Table 4: Hyperparameters: Stack frames 3, Observation shape (3 × 3, 84, 84), Action repeat, Convolutional layers 4, Convolutional kernel size 3 × 3, Convolutional strides [2, 1, 1, 1], Convolutional channels 32, ϕ dimension 256, ψ dimension 256, Learning rate 1e-4, Q function EMA α_Q 0.01, Encoder ϕ EMA α_ϕ 0.05, Initial steps 1000, Replay buffer size 500K, Target update freq 2, Batch size 128, Discount factor γ. (These values are collected into a configuration sketch below the table.) |
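
The hyperparameters reported in the last row map naturally onto a flat configuration object. Below is a minimal Python sketch of how such a configuration might be written down for a re-implementation; the `SCRConfig` name, the field names, and the `None` placeholders for the two values truncated in the extracted row (action repeat and discount factor) are illustrative assumptions, not the released code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SCRConfig:
    """Hypothetical container for the SCR hyperparameters listed in Table 4 of the paper."""
    stack_frames: int = 3                  # number of stacked frames
    obs_shape: tuple = (3 * 3, 84, 84)     # 3 stacked RGB frames -> 9 channels, 84x84 pixels
    action_repeat: Optional[int] = None    # truncated in the extracted row
    conv_layers: int = 4
    conv_kernel_size: int = 3              # 3x3 kernels
    conv_strides: List[int] = field(default_factory=lambda: [2, 1, 1, 1])
    conv_channels: int = 32
    phi_dim: int = 256                     # dimension of the phi encoder output
    psi_dim: int = 256                     # dimension of the psi encoder output
    learning_rate: float = 1e-4
    q_ema_alpha: float = 0.01              # EMA coefficient alpha_Q for the target Q function
    encoder_ema_alpha: float = 0.05        # EMA coefficient alpha_phi for the target phi encoder
    initial_steps: int = 1000              # warm-up steps before learning begins
    replay_buffer_size: int = 500_000
    target_update_freq: int = 2
    batch_size: int = 128
    discount: Optional[float] = None       # gamma; truncated in the extracted row


config = SCRConfig()
```

Grouping the values this way also makes it easy to see which entries the extracted table leaves unspecified and would need to be recovered from the released repository.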
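
The pseudocode row points to Algorithm 1, a single learning step that jointly optimizes the SCR representation and SAC. As a rough orientation, here is a minimal, self-contained PyTorch sketch of that overall structure; the stand-in linear networks, the placeholder representation loss, and the simplified SAC losses (no entropy term, one shared optimizer) are all assumptions for illustration, not the paper's actual objectives or the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in networks: the paper's phi/psi are convolutional encoders with 256-d outputs;
# simple linear layers are used here so the sketch runs on dummy data.
phi = nn.Linear(16, 256)              # stand-in for the pixel encoder phi
psi = nn.Linear(512, 256)             # stand-in for the pairwise / chrono encoder psi
actor = nn.Linear(256, 4)             # stand-in SAC policy head
critic = nn.Linear(256 + 4, 1)        # stand-in SAC Q function
critic_target = nn.Linear(256 + 4, 1)
critic_target.load_state_dict(critic.state_dict())

optimizer = torch.optim.Adam(
    list(phi.parameters()) + list(psi.parameters())
    + list(actor.parameters()) + list(critic.parameters()),
    lr=1e-4,  # learning rate from Table 4
)


def learning_step(obs, action, reward, next_obs, gamma=0.99, alpha_q=0.01):
    """One joint representation + SAC update (structural sketch only).

    The representation loss below is a generic placeholder; Algorithm 1 in the paper
    uses SCR's own metric and chrono objectives instead.
    """
    z, z_next = phi(obs), phi(next_obs)

    # Placeholder representation loss: push the pairwise embedding psi([z, z_next])
    # to reflect the observed one-step reward. This only marks where SCR's losses go.
    pair = psi(torch.cat([z, z_next], dim=-1))
    repr_loss = F.mse_loss(pair.mean(dim=-1), reward)

    # Simplified critic loss (deterministic target, no entropy bonus, for brevity).
    with torch.no_grad():
        next_action = torch.tanh(actor(z_next))
        target_q = reward + gamma * critic_target(
            torch.cat([z_next, next_action], dim=-1)).squeeze(-1)
    q = critic(torch.cat([z, action], dim=-1)).squeeze(-1)
    critic_loss = F.mse_loss(q, target_q)

    # Simplified actor loss; a full SAC implementation would also stop gradients
    # from this term into the critic parameters.
    pi = torch.tanh(actor(z.detach()))
    actor_loss = -critic(torch.cat([z.detach(), pi], dim=-1)).mean()

    loss = repr_loss + critic_loss + actor_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # EMA (Polyak) update of the target critic, matching the alpha_Q rate in Table 4.
    with torch.no_grad():
        for p, p_t in zip(critic.parameters(), critic_target.parameters()):
            p_t.mul_(1.0 - alpha_q).add_(alpha_q * p)
    return loss.item()


# Dummy batch standing in for a replay-buffer sample (batch size 128, as in Table 4).
obs, next_obs = torch.randn(128, 16), torch.randn(128, 16)
action, reward = torch.randn(128, 4), torch.randn(128)
learning_step(obs, action, reward, next_obs)
```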