reproducibilityindex.ai

ODE-based Recurrent Model-free Reinforcement Learning for POMDPs

Authors: Xuanle Zhao, Duzhen Zhang, Han Liyuan, Tielin Zhang, Bo Xu

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks.
Researcher Affiliation	Academia	1 Institute of Automation, Chinese Academy of Sciences, Beijing, China; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 3 Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
Pseudocode	Yes	Algorithm 1 The GRU-ODE algorithm. Input: Observations and time difference between observations {(x_t, dt)}_{t=1..T}. h_0 = 0; for t in 1, 2, ..., T do: h_t = GRUCell(h_{t-1}, x_t) {Update hidden state}; h_t = ODESolve(f_θ, h_t, dt) {Solve ODE}; end for. z_t = MLP(h_t) for all t = 1..T. Return: {z_t}_{t=1..T}; h_t
Open Source Code	No	The paper does not provide an explicit statement or link to the open-source code for the methodology described in this paper.
Open Datasets	Yes	In regular observation domains, we consider conventional partially observable control and meta-RL tasks by employing MuJoCo [Todorov et al., 2012] and PyBullet [Greff et al., 2022] environments.
Dataset Splits	No	The paper uses standard reinforcement learning environments (MuJoCo, PyBullet) for training and evaluation. It does not provide specific dataset splits for training, validation, and testing with percentages or sample counts, as is common for fixed datasets.
Hardware Specification	Yes	We train these methods on a server with NVIDIA TITAN Xp and Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz as GPU and CPU respectively.
Software Dependencies	No	The paper states 'We use the PyTorch framework for our experiments,' but it does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup	Yes	We use the PyTorch framework for our experiments. Some basic hyperparameters about the network architectures are listed below [...] Table 2: Hyperparameters [...] Table 3: Hyperparameters of SAC and TD3
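
The control flow of Algorithm 1, quoted in the Pseudocode row above, can be illustrated with a minimal sketch. This is not the authors' implementation (the page reports that no code was released). The sketch makes several assumptions that do not come from the paper: it uses only the Python standard library instead of PyTorch, a fixed-step Euler integrator stands in for ODESolve, f_θ is a single tanh layer, the MLP has one hidden layer, biases are omitted, and weights are random. The class name GRUODE and all method and parameter names are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUODE:
    """Illustrative GRU-ODE encoder; all names and sizes here are assumptions."""

    def __init__(self, obs_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        in_dim = obs_dim + hidden_dim
        self.hidden_dim = hidden_dim
        self.W_z = rand_matrix(hidden_dim, in_dim, rng)      # update gate
        self.W_r = rand_matrix(hidden_dim, in_dim, rng)      # reset gate
        self.W_n = rand_matrix(hidden_dim, in_dim, rng)      # candidate state
        self.W_f = rand_matrix(hidden_dim, hidden_dim, rng)  # ODE dynamics f_theta
        self.W_1 = rand_matrix(hidden_dim, hidden_dim, rng)  # MLP hidden layer
        self.W_2 = rand_matrix(out_dim, hidden_dim, rng)     # MLP output layer

    def gru_cell(self, h, x):
        """Simplified GRU update (no biases): h_t = GRUCell(h_{t-1}, x_t)."""
        xh = x + h  # list concatenation of observation and hidden state
        z = [sigmoid(a) for a in matvec(self.W_z, xh)]
        r = [sigmoid(a) for a in matvec(self.W_r, xh)]
        x_rh = x + [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a) for a in matvec(self.W_n, x_rh)]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def f_theta(self, h):
        """Time derivative of the hidden state, dh/dt = f_theta(h)."""
        return [math.tanh(a) for a in matvec(self.W_f, h)]

    def ode_solve(self, h, dt, n_steps=4):
        """Fixed-step Euler integration of dh/dt over an interval of length dt."""
        step = dt / n_steps
        for _ in range(n_steps):
            dh = self.f_theta(h)
            h = [hi + step * di for hi, di in zip(h, dh)]
        return h

    def mlp(self, h):
        hidden = [math.tanh(a) for a in matvec(self.W_1, h)]
        return matvec(self.W_2, hidden)

    def forward(self, xs, dts):
        """Run Algorithm 1 over observations xs and time differences dts."""
        h = [0.0] * self.hidden_dim          # h_0 = 0
        zs = []
        for x, dt in zip(xs, dts):
            h = self.gru_cell(h, x)          # update hidden state
            h = self.ode_solve(h, dt)        # evolve hidden state across the gap dt
            zs.append(self.mlp(h))           # z_t = MLP(h_t)
        return zs, h


if __name__ == "__main__":
    model = GRUODE(obs_dim=3, hidden_dim=4, out_dim=2)
    observations = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5], [1.0, 0.0, 0.0]]
    time_gaps = [0.1, 0.5, 0.2]  # irregular gaps between observations
    outputs, last_hidden = model.forward(observations, time_gaps)
    print(len(outputs), len(outputs[0]), len(last_hidden))
```

Computing z_t inside the loop is equivalent to the algorithm's separate final pass, since each z_t depends only on h_t. A practical version would presumably swap the Euler loop for an adaptive ODE solver and train the weights, neither of which is shown here.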