Variational Recurrent Models for Solving Partially Observable Control Tasks
Authors: Dongqi Han, Kenji Doya, Jun Tani
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed algorithm was tested in two types of PO robotic control tasks, those in which either coordinates or velocities were not observable and those that require long-term memorization. Our experiments show that the proposed algorithm achieved better data efficiency and/or learned more optimal policy than other alternative approaches in tasks in which unobserved states cannot be inferred from raw observations in a simple manner. |
| Researcher Affiliation | Academia | Dongqi Han, Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan (dongqi.han@oist.jp); Kenji Doya, Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa, Japan (doya@oist.jp); Jun Tani, Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan (jun.tani@oist.jp) |
| Pseudocode | Yes | Algorithm 1 Variational Recurrent Models with Soft Actor Critic |
| Open Source Code | Yes | Codes are available at https://github.com/oist-cnru/Variational-Recurrent-Models. |
| Open Datasets | Yes | For the robotic control tasks and the Pendulum task, we used environments (and modified them for PO versions) from OpenAI Gym (Brockman et al., 2016). The CartPole environment with a continuous action space was from Danforth (2018), and the codes for the sequential target reaching tasks were provided by the authors (Han et al., 2019). |
| Dataset Splits | No | The paper describes training procedures, batch sizes, and update intervals but does not specify validation dataset splits or percentages. |
| Hardware Specification | Yes | The working environment was a desktop computer using Intel i7-6850K CPU and the task is Velocities-only Roboschool Hopper. |
| Software Dependencies | No | The paper mentions 'Adam' as an optimizer and environments like 'OpenAI Gym' and 'Roboschool' but does not provide specific version numbers for software libraries, frameworks, or environments used. |
| Experiment Setup | Yes | Table 1 and Table 2 provide specific hyperparameters such as 'Discount factor 0.99', 'lr actor 0.0003', 'lr model 0.0008', 'seq len 64', and 'batch size 4'. |
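
For anyone scripting a reproduction, the hyperparameters quoted in the Experiment Setup row could be collected into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `ExperimentSetup` and the field names are hypothetical (chosen to mirror the quoted labels), and only the five values quoted above are included. The remaining entries of Table 1 and Table 2 are not reproduced here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the hyperparameters quoted above.

    Field names are illustrative assumptions; the values are the ones
    quoted from Table 1 and Table 2 in the Experiment Setup row.
    """

    discount_factor: float = 0.99  # 'Discount factor 0.99'
    lr_actor: float = 0.0003       # 'lr actor 0.0003'
    lr_model: float = 0.0008       # 'lr model 0.0008'
    seq_len: int = 64              # 'seq len 64'
    batch_size: int = 4            # 'batch size 4'


if __name__ == "__main__":
    # Print the configuration as a plain dict, e.g. for logging a run.
    print(asdict(ExperimentSetup()))
```

Freezing the dataclass keeps the values immutable during a run, which makes it harder to drift from the reported settings by accident.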