reproducibilityindex.ai

Generalized Policy Iteration using Tensor Approximation for Hybrid Control

Authors: Suhan Shetty, Teng Xue, Sylvain Calinon

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the superiority of our approach over previous baselines for some benchmark problems with hybrid action spaces. Additionally, the robustness and generalization of the policy for hybrid systems are showcased through a real-world robotics experiment involving a non-prehensile manipulation task.
Researcher Affiliation	Academia	Suhan Shetty1,2, Teng Xue1,2, Sylvain Calinon1,2 1 Idiap Research Institute, Martigny, Switzerland 2 École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Pseudocode	Yes	Algorithm 1 VI Algorithm; Algorithm 2 TTPI: Generalized Policy Iteration using Tensor Train
Open Source Code	Yes	A PyTorch-based GPU-accelerated implementation of these algorithms is provided along with the supplementary material at https://sites.google.com/view/ttpi4control.
Open Datasets	Yes	We evaluated our algorithm on two benchmark problems involving systems with hybrid action spaces: the Catch-Point (CP) Problem and the Hard-Move (HM) problem, as proposed by Li et al. (2022).
Dataset Splits	No	The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It mentions discretizing variables and randomly selecting initialization points but not formal train/validation/test splits.
Hardware Specification	Yes	In our experiments, we utilized an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory.
Software Dependencies	No	The paper mentions a "PyTorch-based GPU-accelerated implementation" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup	Yes	For the applications considered, we discretized each continuous variable with 100 points using uniform discretization. To approximate the value and advantage functions in TT format using TT-Cross, an accuracy of ϵ = 10⁻³ proved sufficient. We set r_max to a large value of 100. The discount factor was chosen in the range of 0.99 to 0.9999, depending on the time step Δt, which ranged from 0.01 to 0.001.
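
The setup reported in the last row can be collected into a small configuration, sketched below in plain Python. This is only an illustration and not the authors' code: the class name `TTPISetup`, its field names, the helper `uniform_grid`, and the variable bounds of -1 to 1 are all hypothetical, and the TT-Cross accuracy is taken here as 1e-3. The discount factor and time step are set to one end of their reported ranges, since the row does not say how the two are paired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTPISetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    points_per_dim: int = 100         # discretization points per continuous variable
    tt_cross_accuracy: float = 1e-3   # accuracy used for TT-Cross (taken here as 1e-3)
    max_tt_rank: int = 100            # upper bound on the TT rank (r_max)
    discount: float = 0.99            # reported range: 0.99 to 0.9999
    time_step: float = 0.01           # reported range: 0.01 to 0.001


def uniform_grid(low: float, high: float, n: int) -> list[float]:
    """Return n evenly spaced points covering [low, high], endpoints included."""
    if n < 2:
        raise ValueError("need at least two points to span an interval")
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


setup = TTPISetup()
# The bounds -1.0 and 1.0 are assumed for illustration; the row does not state them.
grid = uniform_grid(-1.0, 1.0, setup.points_per_dim)
```

Each continuous state or action variable would get its own grid of this kind; the value and advantage functions are then approximated over the resulting product grid.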