$\mathrm{SO}(2)$-Equivariant Reinforcement Learning
Authors: Dian Wang, Robin Walters, Robert Platt
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments that demonstrate that our equivariant versions of DQN and SAC can be significantly more sample efficient than competing algorithms on an important class of robotic manipulation problems. |
| Researcher Affiliation | Academia | Dian Wang, Robin Walters, and Robert Platt Khoury College of Computer Sciences Northeastern University Boston, MA 02115, USA {wang.dian, r.walters, r.platt}@northeastern.edu |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Supplementary video and code are available at https://pointw.github.io/equi_rl_page/. |
| Open Datasets | No | The paper describes experimental environments implemented in the PyBullet simulator and mentions pre-populating replay buffers with expert demonstrations, but it does not provide a public link, DOI, or formal citation for a static, downloadable dataset used for training or for the expert demonstrations. |
| Dataset Splits | No | The paper details training and evaluation procedures but does not explicitly mention distinct validation dataset splits or methodology (e.g., specific percentages or counts for a validation set) beyond general training/testing cycles. |
| Hardware Specification | No | The paper mentions using the PyBullet simulator and running 5 parallel environments but does not specify any particular hardware components such as GPU or CPU models, or memory. |
| Software Dependencies | No | The paper mentions using the 'E2CNN library with PyTorch' and the 'PyBullet simulator' but does not provide specific version numbers for these or other software dependencies necessary for replication. |
| Experiment Setup | Yes | In the DQN experiments, we use the Adam optimizer (Kingma & Ba, 2014) with a learning rate of $10^{-4}$. We use Huber loss (Huber, 1964) for calculating the TD loss. We use a discount factor $\gamma = 0.95$. The batch size is 32. The buffer has a capacity of 100,000 transitions. In the SAC (and SACfD) experiments, we use the Adam optimizer with a learning rate of $10^{-3}$. The entropy temperature $\alpha$ is initialized at $10^{-2}$. The target entropy is $-5$. The discount factor $\gamma = 0.99$. The batch size is 64. The buffer has a capacity of 100,000 transitions. |
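
The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration. The sketch below is illustrative only: the `DQNConfig`/`SACConfig` class names and the choice of `torch.optim.Adam` with `torch.nn.SmoothL1Loss` as the Huber loss are assumptions for clarity, not code from the authors' release.

```python
# Illustrative sketch of the reported hyperparameters; not taken from the
# authors' released code. Class names and PyTorch calls are assumptions.
from dataclasses import dataclass

import torch


@dataclass
class DQNConfig:
    lr: float = 1e-4            # Adam learning rate
    gamma: float = 0.95         # discount factor
    batch_size: int = 32
    buffer_capacity: int = 100_000


@dataclass
class SACConfig:
    lr: float = 1e-3            # Adam learning rate (SAC and SACfD)
    alpha_init: float = 1e-2    # initial entropy temperature
    target_entropy: float = -5.0
    gamma: float = 0.99         # discount factor
    batch_size: int = 64
    buffer_capacity: int = 100_000


def make_dqn_optim(q_network: torch.nn.Module, cfg: DQNConfig = DQNConfig()):
    """Adam optimizer and Huber (smooth L1) loss for the TD error, as reported."""
    optimizer = torch.optim.Adam(q_network.parameters(), lr=cfg.lr)
    td_loss = torch.nn.SmoothL1Loss()  # PyTorch's Huber-style loss
    return optimizer, td_loss
```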