Variational Recurrent Models for Solving Partially Observable Control Tasks

Authors: Dongqi Han, Kenji Doya, Jun Tani

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed algorithm was tested in two types of PO robotic control tasks, those in which either coordinates or velocities were not observable and those that require long-term memorization. Our experiments show that the proposed algorithm achieved better data efficiency and/or learned more optimal policy than other alternative approaches in tasks in which unobserved states cannot be inferred from raw observations in a simple manner.
Researcher Affiliation | Academia | Dongqi Han, Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan (dongqi.han@oist.jp); Kenji Doya, Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa, Japan (doya@oist.jp); Jun Tani, Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan (jun.tani@oist.jp)
Pseudocode | Yes | Algorithm 1: Variational Recurrent Models with Soft Actor Critic
Open Source Code | Yes | Codes are available at https://github.com/oist-cnru/Variational-Recurrent-Models.
Open Datasets | Yes | For the robotic control tasks and the Pendulum task, we used environments (and modified them for PO versions) from OpenAI Gym (Brockman et al., 2016). The CartPole environment with a continuous action space was from Danforth (2018), and the codes for the sequential target reaching tasks were provided by the authors (Han et al., 2019).
Dataset Splits | No | The paper describes training procedures, batch sizes, and update intervals but does not specify validation dataset splits or percentages.
Hardware Specification | Yes | The working environment was a desktop computer using Intel i7-6850K CPU and the task is Velocities-only Roboschool Hopper.
Software Dependencies | No | The paper mentions 'Adam' as an optimizer and environments like 'OpenAI Gym' and 'Roboschool' but does not provide specific version numbers for software libraries, frameworks, or environments used.
Experiment Setup | Yes | Table 1 and Table 2 provide specific hyperparameters such as 'Discount factor 0.99', 'lr actor 0.0003', 'lr model 0.0008', 'seq len 64', and 'batch size 4'.
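
For readers who want to record the quoted hyperparameters in a reproducible form, the following is a minimal sketch of a configuration object. The class name and field names are hypothetical (they are not taken from the authors' repository); only the five values are those quoted in the Experiment Setup row, and the remaining entries of Table 1 and Table 2 are not reproduced here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class QuotedHyperparameters:
    """Hypothetical container; field names are illustrative assumptions."""

    discount_factor: float = 0.99  # 'Discount factor 0.99'
    lr_actor: float = 0.0003       # 'lr actor 0.0003'
    lr_model: float = 0.0008       # 'lr model 0.0008'
    seq_len: int = 64              # 'seq len 64'
    batch_size: int = 4            # 'batch size 4'


# Instantiate with the quoted defaults and dump as a plain dict,
# e.g. for logging alongside experiment results.
config = QuotedHyperparameters()
config_dict = asdict(config)
print(config_dict)
```

Freezing the dataclass keeps the recorded values from being changed by accident during a run; any other settings from the paper's tables would need to be added by hand.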