ODE-based Recurrent Model-free Reinforcement Learning for POMDPs
Authors: Xuanle Zhao, Duzhen Zhang, Han Liyuan, Tielin Zhang, Bo Xu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks. |
| Researcher Affiliation | Academia | (1) Institute of Automation, Chinese Academy of Sciences, Beijing, China; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; (3) Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China |
| Pseudocode | Yes | Algorithm 1: The GRU-ODE algorithm. Input: observations and time differences between observations $(x_t, d_t)_{t=1..T}$. $h_0 = 0$. For $t$ in $1, 2, \ldots, T$ do: $h_t = \mathrm{GRUCell}(h_{t-1}, x_t)$ {update hidden state}; $h_t = \mathrm{ODESolve}(f_\theta, h_t, d_t)$ {solve ODE}. $z_t = \mathrm{MLP}(h_t)$ for all $t = 1..T$. Return: $\{z_t\}_{t=1..T}$; $h_t$ |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described in this paper. |
| Open Datasets | Yes | In regular observation domains, we consider conventional partially observable control and meta-RL tasks by employing MuJoCo [Todorov et al., 2012] and PyBullet [Greff et al., 2022] environments. |
| Dataset Splits | No | The paper uses standard reinforcement learning environments (MuJoCo, PyBullet) for training and evaluation. It does not provide specific dataset splits for training, validation, and testing with percentages or sample counts, as is common for fixed datasets. |
| Hardware Specification | Yes | We train these methods on a server with NVIDIA TITAN Xp and Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz as GPU and CPU respectively. |
| Software Dependencies | No | The paper states 'We use the PyTorch framework for our experiments,' but it does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use the PyTorch framework for our experiments. Some basic hyperparameters about the network architectures are listed below [...] Table 2: Hyperparameters [...] Table 3: Hyperparameters of SAC and TD3 |
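
The Pseudocode row quotes Algorithm 1 only in flattened form, so a minimal sketch of the same loop may help when reading it. This is not the authors' implementation, since the paper releases no code. It is written in dependency-free Python rather than PyTorch, and several pieces are assumptions made for illustration: the weights are random rather than trained, the GRU cell omits bias terms, `f_theta` is a single tanh layer, the readout `mlp` is one linear layer, and `euler_solve` is a fixed-step Euler integrator standing in for whatever ODE solver the paper uses. All function and variable names are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; these stand in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class GRUCell:
    """A standard GRU update without bias terms (a simplification)."""

    def __init__(self, input_size, hidden_size, rng):
        # One input matrix W and one recurrent matrix U per gate:
        # "z" = update gate, "r" = reset gate, "n" = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def _pre(self, gate, x, h):
        wx, uh = matvec(self.W[gate], x), matvec(self.U[gate], h)
        return [a + b for a, b in zip(wx, uh)]

    def __call__(self, h, x):
        z = [sigmoid(a) for a in self._pre("z", x, h)]
        r = [sigmoid(a) for a in self._pre("r", x, h)]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a) for a in self._pre("n", x, rh)]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def euler_solve(f, h, dt, n_steps=4):
    """Integrate dh/dt = f(h) over an interval of length dt (fixed-step Euler)."""
    step = dt / n_steps
    for _ in range(n_steps):
        h = [hi + step * fi for hi, fi in zip(h, f(h))]
    return h


def gru_ode(xs, dts, cell, f_theta, mlp, hidden_size):
    """Follow the quoted loop: GRU update, then ODE solve, for each observation."""
    h = [0.0] * hidden_size                # h_0 = 0
    hs = []
    for x, dt in zip(xs, dts):
        h = cell(h, x)                     # update hidden state
        h = euler_solve(f_theta, h, dt)    # solve ODE over the time gap dt
        hs.append(h)
    zs = [mlp(h_t) for h_t in hs]          # z_t = MLP(h_t) for all t
    return zs, h


rng = random.Random(0)
obs_size, hidden_size, out_size = 3, 4, 2
cell = GRUCell(obs_size, hidden_size, rng)
A = rand_matrix(hidden_size, hidden_size, rng)
W_out = rand_matrix(out_size, hidden_size, rng)


def f_theta(h):
    # Assumed form of the ODE dynamics: one tanh layer.
    return [math.tanh(a) for a in matvec(A, h)]


def mlp(h):
    # Assumed readout: a single linear layer standing in for the MLP.
    return matvec(W_out, h)


xs = [[rng.uniform(-1, 1) for _ in range(obs_size)] for _ in range(5)]
dts = [0.1, 0.5, 0.1, 1.0, 0.2]            # irregular gaps between observations
zs, h_last = gru_ode(xs, dts, cell, f_theta, mlp, hidden_size)
print(len(zs), len(zs[0]), len(h_last))
```

The time gaps in `dts` are deliberately irregular: the ODE step is what lets the hidden state evolve for a different duration between each pair of observations, while a gap of zero leaves the state exactly as the GRU update produced it.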