RRL: Resnet as representation for Reinforcement Learning
Authors: Rutav M Shah, Vikash Kumar
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a simulated dexterous manipulation benchmark where state-of-the-art methods fail to make significant progress, RRL delivers contact-rich behaviors. Our experimental evaluation aims to address the following questions: (1) Do pre-trained representations acquired from large real-world image datasets allow RRL to learn complex tasks directly from raw sensory inputs (camera images and joint encoders)? (2) How do RRL's performance and efficiency compare against other state-of-the-art methods? (3) How do various representational choices influence the generality and versatility of the resulting behaviors? (4) What are the effects of various design decisions on RRL? (5) Are commonly used benchmarks for studying image-based continuous control methods effective? |
| Researcher Affiliation | Academia | ¹Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India; ²Department of Computer Science, University of Washington, Seattle, USA. |
| Pseudocode | Yes | Algorithm 1 RRL (a hedged sketch of the algorithm's core rollout loop appears below the table). |
| Open Source Code | No | The paper does not explicitly state that its own source code is released, nor does it provide a link to it. It cites other projects' code repositories (e.g., 'Yarats, D. and Kostrikov, I. Soft actor-critic (sac) implementation in pytorch. https://github.com/denisyarats/pytorch_sac, 2020.' and 'Subramanian, A. PyTorch-VAE. https://github.com/AntixK/PyTorch-VAE, 2020.'). |
| Open Datasets | Yes | We use the standard ResNet-34 model as RRL's feature extractor. The model is pre-trained on the ImageNet classification task, which covers 1000 classes and 1.28 million training images. (A loading sketch for this frozen extractor appears below the table.) |
| Dataset Splits | No | The paper uses a ResNet model pre-trained on ImageNet but does not state the train/validation/test splits for its own experiments on the ADROIT or DMControl suites. It reports 'samples (M)' and 'Robot Hours' as performance measures, not dataset splits that could be reproduced. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU or CPU models, memory, or cloud instances). It mentions 'Robot Hours' as a measure of compute time but does not describe the underlying hardware. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or other libraries used in the implementation. |
| Experiment Setup | Yes | All the hyperparameters used for training are summarized in the Appendix (Table 2). |
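
For readers who want to reproduce the feature extractor described in the Open Datasets row, here is a minimal sketch assuming torchvision's ImageNet-pretrained ResNet-34 and standard ImageNet preprocessing. The paper does not release reference code, so the `encode` helper and the preprocessing pipeline below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Load ResNet-34 pre-trained on the ImageNet classification task
# (1000 classes, ~1.28M training images), as described in the paper.
resnet = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)

# Drop the final classification layer so the network outputs
# 512-dimensional pooled features instead of class logits.
resnet.fc = torch.nn.Identity()

# RRL keeps the encoder frozen; disable gradients and set eval mode.
resnet.eval()
for p in resnet.parameters():
    p.requires_grad = False

# Standard ImageNet preprocessing for camera frames (an assumption;
# the paper's exact image pipeline may differ).
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def encode(frame):
    """Map a PIL camera frame to a 512-d ResNet-34 feature vector."""
    return resnet(preprocess(frame).unsqueeze(0)).squeeze(0)
```

Because the encoder is never fine-tuned, features can be computed entirely under `torch.no_grad()`, with no optimizer state kept for the encoder.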
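
Algorithm 1 itself is only named in the Pseudocode row. As a reading aid, the sketch below conveys its core structure under stated assumptions: each camera frame is embedded by the frozen encoder above, concatenated with proprioceptive readings, and the concatenation is treated as the state for a standard off-the-shelf RL learner. The gym-style `env` interface, the observation layout, `policy`, and `horizon` are hypothetical names, not the paper's API.

```python
import torch

def rrl_rollout(env, policy, encode, horizon=200):
    """Collect one trajectory with RRL-style states: the frozen ResNet
    feature of the current camera frame concatenated with the robot's
    proprioceptive reading. Any standard RL algorithm can consume `traj`.
    """
    traj = []
    proprio = env.reset()                  # assumed: joint-encoder readings
    for _ in range(horizon):
        feat = encode(env.render())        # 512-d features, no gradients
        state = torch.cat(
            [feat, torch.as_tensor(proprio, dtype=torch.float32)]
        )
        action = policy(state)             # any standard policy network
        proprio, reward, done, _ = env.step(action.detach().numpy())
        traj.append((state, action, reward))
        if done:
            break
    return traj
```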