Maximum Entropy RL (Provably) Solves Some Robust RL Problems
Authors: Benjamin Eysenbach, Sergey Levine
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation shows that, in line with our theoretical findings, simple Max Ent RL algorithms perform competitively with (and sometimes better than) recently proposed adversarial robust RL methods on benchmarks proposed by those works. |
| Researcher Affiliation | Collaboration | Benjamin Eysenbach (Carnegie Mellon University, Google Brain) beysenba@cs.cmu.edu; Sergey Levine (UC Berkeley, Google Brain) |
| Pseudocode | No | The paper contains mathematical derivations and proofs but does not include any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm' with structured steps. |
| Open Source Code | No | The paper mentions using and modifying existing open-source implementations (e.g., 'SAC implementation from TF Agents', 'modifying the open source code released by Tessler et al. (2019)') but does not state that the authors' own code for the methodology described in the paper is open-source or available. |
| Open Datasets | Yes | We used the standard Pusher-v2 task from Open AI Gym (Brockman et al., 2016). We used the Sawyer Button Press Env environment from Metaworld (Yu et al., 2020), using a maximum episode length of 151. ...four continuous control tasks from the standard Open AI Gym (Brockman et al., 2016) benchmark. (A hedged evaluation sketch using these environments follows the table.) |
| Dataset Splits | No | The paper describes evaluation metrics and environmental conditions for experiments but does not provide specific training/validation/test dataset splits (e.g., percentages or exact counts) or reference standard splits from the cited datasets. |
| Hardware Specification | No | The paper describes the software environments and implementations used (e.g., 'SAC implementation from TF Agents'), but it does not provide any specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions several software components and frameworks, such as 'TF Agents (Guadarrama et al., 2018)', 'Open AI Gym (Brockman et al., 2016)', and 'Metaworld (Yu et al., 2020)', but it consistently omits specific version numbers for these dependencies. |
| Experiment Setup | Yes | We used a fixed entropy coefficient of 1e-2 for the Max Ent RL results. For the standard RL results, we used the exact same codebase to avoid introducing any confounding factors, simply setting the entropy coefficient to a very small value 1e-5. ...We used 100 episodes of length 100 for evaluating each method. ...We used an entropy coefficient of 1e1 for Max Ent RL and 1e-100 for standard RL. (A hedged sketch of where this coefficient enters a SAC-style loss follows the table.) |
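The Open Datasets and Experiment Setup rows above quote a fixed evaluation protocol (100 episodes of length 100) on standard benchmark environments such as Pusher-v2. The sketch below is a minimal illustration of such an evaluation loop, assuming the classic `gym` 0.x API (`reset()` returns an observation; `step()` returns `(obs, reward, done, info)`) and a hypothetical `policy(obs) -> action` callable; it is not the authors' released code.

```python
# Minimal evaluation sketch (not the authors' code). Assumes the classic
# gym 0.x API and a hypothetical `policy(obs) -> action` callable standing
# in for a trained agent.
import gym
import numpy as np

def evaluate(policy, env_name="Pusher-v2", num_episodes=100, max_steps=100):
    """Average undiscounted return over a fixed number of truncated episodes."""
    env = gym.make(env_name)
    returns = []
    for _ in range(num_episodes):
        obs = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            obs, reward, done, _ = env.step(policy(obs))
            total_reward += reward
            if done:
                break
        returns.append(total_reward)
    return float(np.mean(returns))

# Usage with a random-action stand-in for the policy:
# env = gym.make("Pusher-v2")
# print(evaluate(lambda obs: env.action_space.sample()))
```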
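The Experiment Setup row also quotes fixed entropy coefficients (e.g., 1e-2 for Max Ent RL versus 1e-5 to approximate standard RL within the same codebase). The snippet below is a hedged, library-agnostic sketch of where such a coefficient enters a SAC-style actor objective; the actor and critic are toy placeholders, not the TF Agents SAC implementation the paper used.

```python
# Hedged sketch (not the authors' TF Agents code): where a fixed entropy
# coefficient alpha enters a SAC-style actor objective,
#   E[ alpha * log pi(a|s) - Q(s, a) ].
# alpha = 1e-2 matches the Max Ent RL setting quoted above; alpha = 1e-5
# turns the same objective into (approximately) standard RL.
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Toy stand-in for a policy network; not the paper's architecture."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Linear(obs_dim, 2 * act_dim)
        self.act_dim = act_dim

    def forward(self, states):
        mean, log_std = self.net(states).split(self.act_dim, dim=-1)
        return torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())

def actor_loss(actor, critic, states, alpha=1e-2):
    dist = actor(states)
    actions = dist.rsample()                    # reparameterized sample
    log_probs = dist.log_prob(actions).sum(-1)  # joint log-density over action dims
    q_values = critic(states, actions)          # assumed shape: [batch]
    return (alpha * log_probs - q_values).mean()

# Usage with a placeholder Q-function and a batch of 4 random states:
actor = GaussianActor(obs_dim=3, act_dim=2)
critic = lambda s, a: torch.zeros(s.shape[0])
loss = actor_loss(actor, critic, torch.randn(4, 3), alpha=1e-2)
```

Setting `alpha` near zero recovers the standard RL baseline described in the quoted setup without changing any other part of the code path, which matches the paper's stated strategy of varying only the entropy coefficient within one codebase.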