Spectrum Random Masking for Generalization in Image-based Reinforcement Learning
Authors: Yangru Huang, Peixi Peng, Yifan Zhao, Guangyao Chen, Yonghong Tian
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on DMControl Generalization Benchmark demonstrate the proposed SRM achieves the state-of-the-art performance with strong generalization potentials. |
| Researcher Affiliation | Academia | ¹School of Computer Science, Peking University, Beijing, China; ²Peng Cheng Laboratory, Shenzhen, China |
| Pseudocode | Yes | Algorithm 1 Spectrum Random Masking |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material. |
| Open Datasets | Yes | We conduct our experiment on 5 tasks from DeepMind Control Suite (DMControl) [34] |
| Dataset Splits | No | No explicit mention of validation dataset splits or percentages (e.g., 'X% for validation') was found. The paper mentions training on DMControl and testing on DMControl Generalization Benchmark, implying a train/test split but not a validation split. |
| Hardware Specification | No | The paper states 'See supplemental material' for hardware specifications, but these details are not present in the provided main paper text. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries, frameworks, or programming languages (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | For a fair comparison, we implement all methods following [13], where the same hyperparameters and network architecture are adopted. We use an 11-layer feed-forward convolution network as the shared encoder, which is followed by independent linear projections for the actor and critic. During training, the masking ratio and position of SRM are randomly chosen, and the ranges of r1 and r are set as [0, 0.5] and [0, 0.05] for each batch of observations, respectively. |
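
The Experiment Setup row says the masking position and ratio are drawn once per batch, with r1 in [0, 0.5] and r in [0, 0.05]. The sketch below illustrates one plausible reading of that setup and is not the authors' Algorithm 1, which is not reproduced on this page. It assumes that r1 is the inner radius of a ring in the centred 2D Fourier spectrum, that r is the ring's width, that both are measured in normalised frequency (cycles per pixel), and that sampling is uniform. The function names `sample_srm_params`, `spectrum_band_mask`, and `apply_srm` are hypothetical.

```python
import numpy as np


def sample_srm_params(rng, r1_range=(0.0, 0.5), r_range=(0.0, 0.05)):
    """Draw one (r1, r) pair for a whole batch.

    The ranges are the ones quoted in the table; uniform sampling is an assumption.
    """
    r1 = rng.uniform(*r1_range)
    r = rng.uniform(*r_range)
    return r1, r


def spectrum_band_mask(height, width, r1, r):
    """Boolean keep-mask over a centred spectrum: False inside the ring [r1, r1 + r).

    Assumption: radii are in normalised frequency units (cycles per pixel).
    """
    fy = np.fft.fftshift(np.fft.fftfreq(height))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(width))[None, :]
    radius = np.sqrt(fy**2 + fx**2)
    return ~((radius >= r1) & (radius < r1 + r))


def apply_srm(batch, r1, r):
    """Zero out one frequency ring of every image in a (B, C, H, W) float batch."""
    height, width = batch.shape[-2:]
    keep = spectrum_band_mask(height, width, r1, r)
    # Forward FFT over the spatial axes, with the zero frequency moved to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(batch, axes=(-2, -1)), axes=(-2, -1))
    masked = spectrum * keep
    # Undo the shift and invert; the ring is symmetric, so the result is real
    # up to floating-point error and the imaginary part can be dropped.
    restored = np.fft.ifft2(np.fft.ifftshift(masked, axes=(-2, -1)), axes=(-2, -1))
    return restored.real


rng = np.random.default_rng(0)
observations = rng.random((8, 9, 84, 84))  # hypothetical batch of stacked frames
r1, r = sample_srm_params(rng)
augmented = apply_srm(observations, r1, r)
```

Under these assumptions, `r = 0` leaves the batch unchanged and a ring starting at `r1 = 0` removes the per-channel mean, which makes the behaviour easy to sanity-check before plugging the augmentation into a training loop.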