Spectrum Random Masking for Generalization in Image-based Reinforcement Learning

Authors: Yangru Huang, Peixi Peng, Yifan Zhao, Guangyao Chen, Yonghong Tian

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on DMControl Generalization Benchmark demonstrate the proposed SRM achieves the state-of-the-art performance with strong generalization potentials.
Researcher Affiliation | Academia | (1) School of Computer Science, Peking University, Beijing, China; (2) Peng Cheng Laboratory, Shenzhen, China
Pseudocode | Yes | Algorithm 1: Spectrum Random Masking
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material.
Open Datasets | Yes | We conduct our experiment on 5 tasks from Deep Mind Control Suite (DMControl) [34]
Dataset Splits | No | No explicit mention of validation dataset splits or percentages (e.g., 'X% for validation') was found. The paper mentions training on DMControl and testing on the DMControl Generalization Benchmark, implying a train/test split but not a validation split.
Hardware Specification | No | The paper states 'See supplemental material' for hardware specifications, but these details are not present in the provided main paper text.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries, frameworks, or programming languages (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | For a fair comparison, we implement all methods following [13], where the same hyperparameters and network architecture are adopted. We use an 11-layer feed-forward convolution network as the shared encoder, which is followed by independent linear projections for the actor and critic. During training, the masking ratio and position of SRM are randomly chosen, and the ranges of r1 and r are set as [0, 0.5] and [0, 0.05] for each batch of observations, respectively.
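
The quoted setup applies Spectrum Random Masking (Algorithm 1) with a randomly chosen masking ratio and position for each batch of observations. The snippet below is a minimal NumPy sketch of a frequency-domain masking augmentation in that spirit; it is not the authors' implementation. The function name, the block-shaped mask, and the use of the [0, 0.5] ratio range are assumptions made for illustration, and the second quoted range, [0, 0.05], is not modeled because its exact role is not specified in the excerpt.

```python
import numpy as np


def spectrum_random_mask(obs, ratio_range=(0.0, 0.5), rng=None):
    """Zero a random rectangular block of the Fourier spectrum of `obs`.

    `obs` is a single observation shaped (H, W) or (C, H, W). The block's
    area fraction is drawn from `ratio_range` and its position is drawn
    uniformly, loosely mirroring the random ratio/position described in
    the quoted setup. Illustrative sketch only, not the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(obs, dtype=np.float32)
    squeeze = x.ndim == 2
    if squeeze:                      # promote (H, W) to (1, H, W)
        x = x[None]
    c, h, w = x.shape

    # Draw one ratio and one position, shared by all channels of this frame.
    ratio = rng.uniform(*ratio_range)
    mh = max(1, int(round(h * np.sqrt(ratio))))
    mw = max(1, int(round(w * np.sqrt(ratio))))
    top = rng.integers(0, h - mh + 1)
    left = rng.integers(0, w - mw + 1)

    out = np.empty_like(x)
    for i in range(c):
        spec = np.fft.fftshift(np.fft.fft2(x[i]))       # centred spectrum
        spec[top:top + mh, left:left + mw] = 0.0        # mask a random block
        out[i] = np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
    return out[0] if squeeze else out
```

In a training loop, such an augmentation would be applied to the observations of each sampled batch before they are fed to the shared encoder, with the ratio and position re-drawn as described in the quoted experiment setup.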