Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Variational Recurrent Models for Solving Partially Observable Control Tasks
Authors: Dongqi Han, Kenji Doya, Jun Tani
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed algorithm was tested in two types of PO robotic control tasks, those in which either coordinates or velocities were not observable and those that require long-term memorization. Our experiments show that the proposed algorithm achieved better data efficiency and/or learned more optimal policy than other alternative approaches in tasks in which unobserved states cannot be inferred from raw observations in a simple manner. |
| Researcher Affiliation | Academia | Dongqi Han, Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan; Kenji Doya, Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa, Japan; Jun Tani, Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan |
| Pseudocode | Yes | Algorithm 1 Variational Recurrent Models with Soft Actor Critic |
| Open Source Code | Yes | Codes are available at https://github.com/oist-cnru/Variational-Recurrent-Models. |
| Open Datasets | Yes | For the robotic control tasks and the Pendulum task, we used environments (and modified them for PO versions) from OpenAI Gym (Brockman et al., 2016). The CartPole environment with a continuous action space was from Danforth (2018), and the codes for the sequential target reaching tasks were provided by the authors (Han et al., 2019). |
| Dataset Splits | No | The paper describes training procedures, batch sizes, and update intervals but does not specify validation dataset splits or percentages. |
| Hardware Specification | Yes | The working environment was a desktop computer using Intel i7-6850K CPU and the task is Velocities-only Roboschool Hopper. |
| Software Dependencies | No | The paper mentions 'Adam' as an optimizer and environments such as 'OpenAI Gym' and 'Roboschool', but it does not give version numbers for the software libraries, frameworks, or environments used. |
| Experiment Setup | Yes | Table 1 and Table 2 provide specific hyperparameters such as 'Discount factor 0.99', 'lr actor 0.0003', 'lr model 0.0008', 'seq len 64', and 'batch size 4'. |
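
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object for a reproduction attempt. The sketch below is illustrative only: the class name `ReportedHyperparameters` and its field names are hypothetical and are not taken from the authors' repository, and only the five values quoted above are included. The remaining settings would have to be read from Table 1 and Table 2 of the paper.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReportedHyperparameters:
    """Hypothetical container for the values quoted in the Experiment Setup row.

    Field names are assumptions made for illustration; they do not mirror
    the authors' code.
    """
    discount_factor: float = 0.99  # 'Discount factor 0.99'
    lr_actor: float = 0.0003       # 'lr actor 0.0003'
    lr_model: float = 0.0008       # 'lr model 0.0008'
    seq_len: int = 64              # 'seq len 64'
    batch_size: int = 4            # 'batch size 4'


config = ReportedHyperparameters()

# Print the configuration as a plain dictionary, e.g. for logging a run.
for name, value in asdict(config).items():
    print(f"{name}: {value}")
```

Freezing the dataclass prevents the values from being changed accidentally during a run, which makes it easier to confirm afterwards that a reproduction used the reported settings.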