Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Authors: Ilya Kostrikov, Rob Fergus, Jonathan Tompson, Ofir Nachum
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On standard offline RL benchmarks, Fisher-BRC achieves both improved performance and faster convergence over existing state-of-the-art methods. We then present an extensive evaluation of Fisher-BRC on standard offline RL benchmarks. |
| Researcher Affiliation | Collaboration | ¹New York University, USA; ²Google Research, USA; ³Google DeepMind, USA. |
| Pseudocode | Yes | Algorithm 1 Fisher-BRC [Sketch]. (A hedged code sketch of the critic update follows the table below.) |
| Open Source Code | Yes | Code to reproduce our results is available at https://github.com/google-research/google-research/tree/master/fisher_brc. |
| Open Datasets | Yes | We compare our method to prior work on the OpenAI Gym MuJoCo tasks using D4RL datasets (Fu et al., 2020). |
| Dataset Splits | No | The paper mentions using D4RL datasets and evaluating performance over 5 seeds, but it does not explicitly provide training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits). |
| Hardware Specification | Yes | These experiments were carried out on a Google cloud instance containing an AMD EPYC 7B12 CPU at 2.25GHz (using 8 of 64 available cores) and 32GB of RAM. |
| Software Dependencies | No | The paper mentions using a 'standard SAC implementation' and 'Adam' optimizer, but it does not specify version numbers for any programming languages, libraries, or other software components. |
| Experiment Setup | Yes | Unless otherwise noted, we set λ = 0.1 as the regularization coefficient. Our implementation for Fisher-BRC follows the standard SAC implementation, only that we use a 3-layer network as in CQL. For every seed we run evaluation for 10 episodes. (See the evaluation sketch below the table.) |
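
Since Algorithm 1 is only quoted above as a sketch, here is a minimal PyTorch-style sketch of the Fisher-BRC critic update described in the paper: the critic is parameterized as an offset on top of a behavior log-density, Q(s, a) = O(s, a) + log μ(a|s), and the Fisher-divergence term is approximated by a gradient penalty on O at policy actions with coefficient λ = 0.1. The network sizes, the zero-valued `log_mu` stub, the toy batch, and the omission of SAC entropy terms are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions and batch size, for illustration only.
STATE_DIM, ACTION_DIM, HIDDEN, BATCH = 17, 6, 256, 32
LAMBDA, GAMMA = 0.1, 0.99  # lambda = 0.1 is the regularization coefficient from the paper

def mlp(in_dim, out_dim):
    # 3-layer network, mirroring the CQL-style architecture mentioned in the setup row.
    return nn.Sequential(
        nn.Linear(in_dim, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, out_dim),
    )

offset = mlp(STATE_DIM + ACTION_DIM, 1)         # O(s, a): the offset critic
offset_target = mlp(STATE_DIM + ACTION_DIM, 1)  # target network for bootstrapping
offset_target.load_state_dict(offset.state_dict())
optimizer = torch.optim.Adam(offset.parameters(), lr=3e-4)

def log_mu(states, actions):
    # Stub for a pretrained behavior density model log mu(a|s); Fisher-BRC fits
    # this separately (e.g. a mixture-of-Gaussians behavioral-cloning policy).
    return torch.zeros(states.shape[0], 1)

def q_value(offset_net, states, actions):
    # Fisher-BRC critic parameterization: Q(s, a) = O(s, a) + log mu(a|s).
    return offset_net(torch.cat([states, actions], dim=-1)) + log_mu(states, actions)

def critic_loss(states, actions, rewards, next_states, dones,
                policy_actions, next_policy_actions):
    # Standard TD error on the composite critic (entropy terms omitted for brevity).
    with torch.no_grad():
        target = rewards + GAMMA * (1.0 - dones) * q_value(
            offset_target, next_states, next_policy_actions)
    td_loss = ((q_value(offset, states, actions) - target) ** 2).mean()

    # Fisher-divergence surrogate: gradient penalty on O at current-policy actions.
    pa = policy_actions.detach().requires_grad_(True)
    o = offset(torch.cat([states, pa], dim=-1))
    grad = torch.autograd.grad(o.sum(), pa, create_graph=True)[0]
    grad_penalty = (grad ** 2).sum(dim=-1).mean()

    return td_loss + LAMBDA * grad_penalty

# One toy gradient step to show the update runs end to end.
s, a = torch.randn(BATCH, STATE_DIM), torch.randn(BATCH, ACTION_DIM)
r, s2, d = torch.randn(BATCH, 1), torch.randn(BATCH, STATE_DIM), torch.zeros(BATCH, 1)
loss = critic_loss(s, a, r, s2, d,
                   torch.randn(BATCH, ACTION_DIM), torch.randn(BATCH, ACTION_DIM))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```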
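
The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows (5 seeds, 10 evaluation episodes per seed, D4RL benchmark tasks) can be sketched as below. The environment name and the `policy` callable are placeholders; the sketch assumes the `d4rl` package and a Gym version with the pre-0.26 `reset`/`step` API.

```python
import gym
import d4rl  # registers the D4RL offline RL environments with Gym
import numpy as np

def evaluate(policy, env_name="halfcheetah-medium-v2", seeds=range(5), episodes_per_seed=10):
    """Average D4RL-normalized return over 5 seeds x 10 evaluation episodes.

    `policy` is a hypothetical callable mapping an observation to an action.
    """
    scores = []
    for seed in seeds:
        env = gym.make(env_name)
        env.seed(seed)
        for _ in range(episodes_per_seed):
            obs, done, ep_return = env.reset(), False, 0.0
            while not done:
                obs, reward, done, _ = env.step(policy(obs))
                ep_return += reward
            # D4RL convention: report 100 * normalized score.
            scores.append(100 * env.get_normalized_score(ep_return))
    return float(np.mean(scores))
```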