Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs

Authors: Tianwei Ni, Benjamin Eysenbach, Ruslan Salakhutdinov

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments aim to answer two questions. First, how does a well-tuned implementation of recurrent model-free RL compare to specialized POMDP methods? To give these prior methods the strongest possible footing, we perform the comparison on the benchmarks used by these prior methods. Second, which design decisions are essential for recurrent model-free RL? We put the environment details in App. D.
Researcher Affiliation | Academia | Tianwei Ni 1, Benjamin Eysenbach 2, Ruslan Salakhutdinov 2. 1 Université de Montréal & Mila Québec AI Institute, 2 Carnegie Mellon University. Correspondence to: Tianwei Ni <EMAIL>, Benjamin Eysenbach <EMAIL>.
Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We also release a simple and efficient implementation of recurrent model-free RL for future work to use as a baseline for POMDPs.
Open Datasets | Yes | We adopt the occlusion benchmark proposed by VRM and replace the deprecated roboschool with PyBullet (Coumans & Bai, 2016), as suggested by the official GitHub repository. We follow the practice in VRM (Han et al., 2020) in the other aspects of environment design, i.e. we remove all the position/angle-related entries in the observation space for -V environments and velocity-related entries for -P environments, to transform the original MDP into POMDP.
Dataset Splits | No | The paper mentions training and testing tasks and environments but does not provide explicit details on data splitting (e.g., percentages or counts for training, validation, and test sets).
Hardware Specification | Yes | The computer system we used during the experiments includes a GeForce RTX 2080 Ti graphics card (with 11GB memory) and an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (with 250GB RAM and 80 cores).
Software Dependencies | No | The paper mentions software like PyBullet and Stable-Baselines3, but it does not specify version numbers for these or other core software dependencies (e.g., Python, PyTorch/TensorFlow) that are crucial for reproducibility.
Experiment Setup | Yes | Table 5: Hyperparameter summary in our implementation of model-free recurrent RL. For each benchmark, we report the hidden layer size of each module, along with RL and training hyperparameters. For meta-RL, we take the model on Cheetah-Vel as an example, which follows the architecture design of off-policy variBAD (Dorfman et al., 2020). The hidden size of the observation-action embedder is the sum of that of the observation embedder, previous-action embedder (if it exists), and reward embedder (if it exists).
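The MDP-to-POMDP transformation quoted under "Open Datasets" (dropping position/angle entries for -V environments and velocity entries for -P environments) can be sketched with a small observation-masking helper. This is an illustrative sketch, not the authors' released code; the observation dimension and the index set of velocity entries are hypothetical placeholders for a given environment.

```python
import numpy as np

def occlude(obs: np.ndarray, hidden_idx: np.ndarray) -> np.ndarray:
    """Return the observation with the hidden entries removed,
    turning a fully observed MDP state into a partial observation."""
    keep = np.setdiff1d(np.arange(obs.shape[-1]), hidden_idx)
    return obs[..., keep]

# Hypothetical example: a 6-dim observation where entries 3-5 are velocities.
# Hiding them mimics a "-P" (position-only) environment.
obs = np.arange(6.0)
partial = occlude(obs, np.array([3, 4, 5]))  # -> array([0., 1., 2.])
```

In practice this masking would be wrapped around the environment's step/reset outputs (e.g. via a gym observation wrapper), so the agent never sees the occluded entries.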
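The embedder-sizing rule quoted from Table 5 (combined observation-action embedding size equals the sum of the per-stream embedder sizes) amounts to embedding each input stream separately and concatenating the results before the recurrent core. The NumPy sketch below illustrates only that sizing rule; it is not the authors' implementation, and all dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden sizes for the three embedders.
OBS_HID, ACT_HID, REW_HID = 32, 16, 16

def make_linear(in_dim: int, out_dim: int) -> np.ndarray:
    """Random linear-layer weights standing in for a learned embedder."""
    return rng.standard_normal((in_dim, out_dim)) * 0.01

W_obs = make_linear(17, OBS_HID)  # observation embedder
W_act = make_linear(6, ACT_HID)   # previous-action embedder
W_rew = make_linear(1, REW_HID)   # reward embedder

def embed(obs, prev_act, prev_rew):
    """Embed each stream separately, then concatenate: the combined
    embedding size is OBS_HID + ACT_HID + REW_HID."""
    parts = [obs @ W_obs, prev_act @ W_act, prev_rew @ W_rew]
    return np.concatenate(parts, axis=-1)

z = embed(np.zeros(17), np.zeros(6), np.zeros(1))  # z.shape[-1] == 64
```

If the action or reward embedder is absent for a benchmark, its term simply drops out of the concatenation, which is why the combined size is reported as a sum.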