ODE-based Recurrent Model-free Reinforcement Learning for POMDPs

Authors: Xuanle Zhao, Duzhen Zhang, Han Liyuan, Tielin Zhang, Bo Xu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks.
Researcher Affiliation | Academia | (1) Institute of Automation, Chinese Academy of Sciences, Beijing, China; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; (3) Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
Pseudocode | Yes | Algorithm 1: The GRU-ODE algorithm. Input: observations and time differences between observations, (x_t, d_t) for t = 1..T. Initialize h_0 = 0. For t in 1, 2, ..., T do: h_t = GRUCell(h_{t-1}, x_t) {update hidden state}; h_t = ODESolve(f_θ, h_t, d_t) {solve ODE}. Then z_t = MLP(h_t) for all t = 1..T. Return: {z_t} for t = 1..T; h_t
Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described in this paper.
Open Datasets | Yes | In regular observation domains, we consider conventional partially observable control and meta-RL tasks by employing MuJoCo [Todorov et al., 2012] and PyBullet [Greff et al., 2022] environments.
Dataset Splits | No | The paper uses standard reinforcement learning environments (MuJoCo, PyBullet) for training and evaluation. It does not provide specific dataset splits for training, validation, and testing with percentages or sample counts, as is common for fixed datasets.
Hardware Specification | Yes | We train these methods on a server with NVIDIA TITAN Xp and Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz as GPU and CPU respectively.
Software Dependencies | No | The paper states 'We use the PyTorch framework for our experiments,' but it does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | We use the PyTorch framework for our experiments. Some basic hyperparameters about the network architectures are listed below [...] Table 2: Hyperparameters [...] Table 3: Hyperparameters of SAC and TD3
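
The control flow of Algorithm 1, quoted in the Pseudocode row above, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (no code was released): the recurrent cell, the dynamics function f_θ, and the readout are passed in as callables, a fixed-step Euler integrator stands in for ODESolve, and the names `euler_solve`, `gru_ode_forward`, `toy_cell`, `toy_dynamics`, and `toy_readout` are hypothetical. In a PyTorch setting the cell would presumably be a trained module such as `torch.nn.GRUCell`.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def euler_solve(f: Callable[[Vector], Vector], h: Vector, dt: float,
                n_steps: int = 8) -> Vector:
    """Integrate dh/dt = f(h) over an interval of length dt.

    Assumption: fixed-step Euler is only a stand-in for ODESolve; the
    solver used in the paper is not stated in this summary.
    """
    step = dt / n_steps
    for _ in range(n_steps):
        dh = f(h)
        h = [hi + step * di for hi, di in zip(h, dh)]
    return h


def gru_ode_forward(cell: Callable[[Vector, Vector], Vector],
                    f_theta: Callable[[Vector], Vector],
                    readout: Callable[[Vector], float],
                    xs: Sequence[Vector],
                    dts: Sequence[float],
                    hidden_size: int) -> Tuple[List[float], Vector]:
    """Follow the loop of Algorithm 1 for one sequence of (x_t, d_t) pairs."""
    h = [0.0] * hidden_size                # h_0 = 0
    hidden_states = []
    for x_t, d_t in zip(xs, dts):
        h = cell(h, x_t)                   # h_t = GRUCell(h_{t-1}, x_t)
        h = euler_solve(f_theta, h, d_t)   # h_t = ODESolve(f_theta, h_t, d_t)
        hidden_states.append(h)
    zs = [readout(h_t) for h_t in hidden_states]   # z_t = MLP(h_t)
    return zs, h


# Toy stand-ins (hypothetical) so the sketch runs without any trained model.
def toy_cell(h: Vector, x: Vector) -> Vector:
    drive = sum(x)
    return [math.tanh(0.5 * hi + drive) for hi in h]


def toy_dynamics(h: Vector) -> Vector:
    return [-hi for hi in h]               # simple exponential decay


def toy_readout(h: Vector) -> float:
    return sum(h)


xs = [[0.1], [0.4], [-0.2]]                # three observations
dts = [0.1, 0.5, 0.2]                      # irregular time gaps between them
zs, h_last = gru_ode_forward(toy_cell, toy_dynamics, toy_readout,
                             xs, dts, hidden_size=2)
```

Passing the time gap d_t into the solver is what lets the hidden state evolve between irregularly spaced observations; with d_t = 0 the ODE step leaves the state unchanged and the loop reduces to an ordinary recurrent update.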