State Chrono Representation for Enhancing Generalization in Reinforcement Learning

Authors: Jianda Chen, Wen Zheng Terence Ng, Zichen Chen, Sinno Pan, Tianwei Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted in DeepMind Control and Meta-World environments demonstrate that SCR achieves better performance compared to other recent metric-based methods on demanding generalization tasks.
Researcher Affiliation | Collaboration | Nanyang Technological University; Continental Automotive Singapore; University of California, Santa Barbara; The Chinese University of Hong Kong
Pseudocode | Yes | Algorithm 1: A learning step in jointly learning SCR and SAC. (A hedged sketch of such a step appears after this table.)
Open Source Code | Yes | The code of SCR is available at https://github.com/jianda-chen/SCR.
Open Datasets | Yes | DeepMind Control Suite. The primary objective of our proposed SCR is to develop a robust and generalizable representation for deep RL when dealing with high-dimensional observations. To evaluate its effectiveness, we conduct experiments using the DeepMind Control Suite (DM_Control) environment, which involves rendered pixel observations [37], and a distraction setting called the Distracting Control Suite [34]. ... We present experimental investigations conducted in Meta-World [44], a comprehensive simulated benchmark that includes distinct robotic manipulation tasks.
Dataset Splits | Yes | Specifically, the training environment samples videos from the DAVIS2017 train set, while the evaluation environment uses videos from the validation set.
Hardware Specification | Yes | The experiments are run on servers with a 128-core CPU, 512 GB of RAM, and NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions building on SAC [13] and PPO [31] and using IQE [38], but it does not specify version numbers for software dependencies such as Python, PyTorch, or TensorFlow. For example: 'Our method, SCR, can be seamlessly integrated with a wide range of deep RL algorithms. In our implementation, we specifically employ Soft Actor-Critic (SAC) [13] as our foundation RL algorithm.'
Experiment Setup | Yes | Table 4 (Hyperparameters): Stack frames 3; Observation shape (3 × 3, 84, 84); Action repeat; Convolutional layers 4; Convolutional kernel size 3 × 3; Convolutional strides [2, 1, 1, 1]; Convolutional channels 32; ϕ dimension 256; ψ dimension 256; Learning rate 1e-4; Q function EMA αQ 0.01; Encoder ϕ EMA αϕ 0.05; Initial steps 1000; Replay buffer size 500K; Target update frequency 2; Batch size 128; Discount factor γ.
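
Drawing on Algorithm 1 ("a learning step in jointly learning SCR and SAC") and the Table 4 hyperparameters quoted above, the following is a minimal PyTorch sketch of what one such joint update step could look like. It is not the authors' implementation: the SCR-specific objective is reduced to a hypothetical placeholder `scr_representation_loss`, the policy is replaced by random actions, and the action dimension, discount value, and critic head sizes are assumptions, since the report does not quote them.

```python
# Minimal sketch (NOT the authors' code) of one joint representation + SAC
# learning step, using the hyperparameters quoted from Table 4.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters quoted from Table 4
OBS_SHAPE = (9, 84, 84)   # 3 stacked RGB frames (3 x 3 channels, 84 x 84 pixels)
PHI_DIM = 256             # encoder phi output dimension
LR = 1e-4
BATCH_SIZE = 128
ALPHA_Q = 0.01            # EMA rate for the target Q network
ALPHA_PHI = 0.05          # EMA rate for the target encoder phi
GAMMA = 0.99              # assumed; the report lists "Discount factor" without a value
ACTION_DIM = 6            # assumed; depends on the DMControl task


class Encoder(nn.Module):
    """Pixel encoder phi: 4 conv layers, 3x3 kernels, strides [2, 1, 1, 1], 32 channels."""
    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(OBS_SHAPE[0], 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
        )
        with torch.no_grad():
            n_flat = self.convs(torch.zeros(1, *OBS_SHAPE)).flatten(1).shape[1]
        self.fc = nn.Linear(n_flat, PHI_DIM)

    def forward(self, obs):
        return torch.tanh(self.fc(self.convs(obs).flatten(1)))


class Critic(nn.Module):
    """Q(phi(s), a) head on top of the shared encoder (hidden size assumed)."""
    def __init__(self):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(PHI_DIM + ACTION_DIM, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, z, a):
        return self.q(torch.cat([z, a], dim=-1))


def scr_representation_loss(z, z_next):
    # Placeholder for the SCR objective (temporal/metric losses on phi and psi);
    # the actual losses are defined in the paper, not in this report.
    return F.mse_loss(z, z_next.detach())


def ema_update(target, online, rate):
    # Exponential moving average of the online parameters into the target network.
    for pt, po in zip(target.parameters(), online.parameters()):
        pt.data.mul_(1 - rate).add_(rate * po.data)


encoder, critic = Encoder(), Critic()
target_encoder, target_critic = copy.deepcopy(encoder), copy.deepcopy(critic)
opt = torch.optim.Adam(list(encoder.parameters()) + list(critic.parameters()), lr=LR)

# One learning step on a synthetic batch standing in for replay-buffer samples.
obs = torch.rand(BATCH_SIZE, *OBS_SHAPE)
action = torch.rand(BATCH_SIZE, ACTION_DIM)
reward = torch.rand(BATCH_SIZE, 1)
next_obs = torch.rand(BATCH_SIZE, *OBS_SHAPE)

z, z_next = encoder(obs), encoder(next_obs)
with torch.no_grad():
    # A full SAC step would sample next_action from the policy and include an
    # entropy term; random actions stand in here to keep the sketch short.
    next_action = torch.rand(BATCH_SIZE, ACTION_DIM)
    target_q = reward + GAMMA * target_critic(target_encoder(next_obs), next_action)

loss = F.mse_loss(critic(z, action), target_q) + scr_representation_loss(z, z_next)
opt.zero_grad()
loss.backward()
opt.step()

# Separate EMA rates for the target critic and target encoder, matching the
# "Q function EMA 0.01" and "Encoder phi EMA 0.05" entries in Table 4.
ema_update(target_critic, critic, ALPHA_Q)
ema_update(target_encoder, encoder, ALPHA_PHI)
```

The two distinct EMA coefficients are kept deliberately, since Table 4 reports different rates for the target Q function and the target encoder; everything else in the sketch that is not quoted in this report should be read as an assumption.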