Generalized Policy Iteration using Tensor Approximation for Hybrid Control

Authors: Suhan Shetty, Teng Xue, Sylvain Calinon

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the superiority of our approach over previous baselines for some benchmark problems with hybrid action spaces. Additionally, the robustness and generalization of the policy for hybrid systems are showcased through a real-world robotics experiment involving a non-prehensile manipulation task.
Researcher Affiliation | Academia | Suhan Shetty (1,2), Teng Xue (1,2), Sylvain Calinon (1,2); 1: Idiap Research Institute, Martigny, Switzerland; 2: École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Pseudocode | Yes | Algorithm 1: VI Algorithm; Algorithm 2: TTPI, Generalized Policy Iteration using Tensor Train (a generic value-iteration sketch is given below the table)
Open Source Code | Yes | A PyTorch-based GPU-accelerated implementation of these algorithms is provided along with the supplementary material at https://sites.google.com/view/ttpi4control.
Open Datasets | Yes | We evaluated our algorithm on two benchmark problems involving systems with hybrid action spaces: the Catch-Point (CP) problem and the Hard-Move (HM) problem, as proposed by Li et al. (2022).
Dataset Splits | No | The paper does not provide the dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) needed to reproduce the data partitioning. It mentions discretizing variables and randomly selecting initialization points, but not formal train/validation/test splits.
Hardware Specification | Yes | In our experiments, we utilized an NVIDIA GeForce RTX 3090 GPU with 24GB of memory.
Software Dependencies | No | The paper mentions a "PyTorch-based GPU-accelerated implementation" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | For the applications considered, we discretized each continuous variable with 100 points using uniform discretization. To approximate the value and advantage functions in TT format using TT-Cross, an accuracy of ϵ = 10⁻³ proved sufficient. We set r_max to a large value of 100. The discount factor was chosen in the range of 0.99 to 0.9999, depending on the time step, which ranged from 0.01 to 0.001. (A configuration sketch collecting these hyperparameters is given below the table.)
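
For orientation, the "VI Algorithm" named in the Pseudocode row is standard value iteration. Below is a minimal tabular sketch of that loop in PyTorch. It is only illustrative: `P` (transition tensor) and `R` (reward table) are hypothetical inputs we introduce here, and this is not the paper's TT-based variant, which instead stores the value and advantage functions in tensor-train format and updates them via TT-Cross.

```python
import torch

def value_iteration(P, R, gamma=0.99, tol=1e-3, max_iters=1000):
    """Tabular value iteration over a discretized state-action space.

    P: (A, S, S) transition probabilities, R: (S, A) rewards. These are
    hypothetical inputs for illustration; the paper's TTPI keeps the
    value/advantage functions in tensor-train (TT) format instead.
    """
    V = torch.zeros(R.shape[0])
    for _ in range(max_iters):
        # Bellman backup: Q(s, a) = R(s, a) + gamma * E_{s'}[V(s')]
        Q = R + gamma * torch.einsum("ast,t->sa", P, V)
        V_new = Q.max(dim=1).values
        if (V_new - V).abs().max() < tol:  # stop once the update stabilizes
            V = V_new
            break
        V = V_new
    return V, Q.argmax(dim=1)  # value function and greedy policy
```

Roughly speaking, the TT-based algorithms avoid sweeping the full discretized grid as this loop does; TT-Cross builds the approximation from function evaluations at a small set of sampled indices.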
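
Likewise, a hedged sketch of how the hyperparameters reported in the Experiment Setup row might be collected in one place; the class and field names are our own, not taken from the released code:

```python
from dataclasses import dataclass

@dataclass
class TTPIConfig:
    # Values as reported in the paper; all names are hypothetical.
    grid_points: int = 100          # uniform discretization per continuous variable
    tt_cross_accuracy: float = 1e-3  # accuracy ϵ for TT-Cross approximation
    r_max: int = 100                # upper bound on TT ranks
    gamma: float = 0.99             # discount factor (paper: 0.99 to 0.9999)
    dt: float = 0.01                # time step (paper: 0.01 to 0.001)
```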