Generalized Policy Iteration using Tensor Approximation for Hybrid Control
Authors: Suhan Shetty, Teng Xue, Sylvain Calinon
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the superiority of our approach over previous baselines for some benchmark problems with hybrid action spaces. Additionally, the robustness and generalization of the policy for hybrid systems are showcased through a real-world robotics experiment involving a non-prehensile manipulation task. |
| Researcher Affiliation | Academia | Suhan Shetty¹,², Teng Xue¹,², Sylvain Calinon¹,²; ¹ Idiap Research Institute, Martigny, Switzerland; ² École Polytechnique Fédérale de Lausanne (EPFL), Switzerland |
| Pseudocode | Yes | Algorithm 1 VI Algorithm; Algorithm 2 TTPI: Generalized Policy Iteration using Tensor Train |
| Open Source Code | Yes | A PyTorch-based GPU-accelerated implementation of these algorithms is provided along with the supplementary material at https://sites.google.com/view/ttpi4control. |
| Open Datasets | Yes | We evaluated our algorithm on two benchmark problems involving systems with hybrid action spaces: the Catch-Point (CP) Problem and the Hard-Move (HM) problem, as proposed by Li et al. (2022). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It mentions discretizing variables and randomly selecting initialization points but not formal train/validation/test splits. |
| Hardware Specification | Yes | In our experiments, we utilized an NVIDIA GeForce RTX 3090 GPU with 24GB of memory. |
| Software Dependencies | No | The paper mentions a "PyTorch-based GPU-accelerated implementation" but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For the applications considered, we discretized each continuous variable with 100 points using uniform discretization. To approximate the value and advantage functions in TT format using TT-Cross, an accuracy of ϵ = 10⁻³ proved sufficient. We set r_max to a large value of 100. The discount factor was chosen in the range of 0.99 to 0.9999, depending on the time step Δt, which ranged from 0.01 to 0.001. |
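
The numbers in the Experiment Setup row can be put in perspective with a small sketch. It is not taken from the authors' code: the function names, the 6-dimensional example, and the pairing of a discount factor with a particular time step are illustrative assumptions. It shows a uniform 100-point grid, the entry count of a full tensor versus a loose storage bound for a tensor train with ranks capped at r_max = 100, and the rough horizon implied by a discount factor.

```python
# Illustrative sketch only; all names here are hypothetical, not the paper's API.

def uniform_grid(low, high, n=100):
    """Return n evenly spaced points covering [low, high], endpoints included."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def full_tensor_entries(d, n=100):
    """Entries in a dense d-dimensional tensor with n points per dimension."""
    return n ** d


def tt_max_entries(d, n=100, r_max=100):
    """Loose upper bound on tensor-train storage.

    Cores have shape (r_prev, n, r_next) with boundary ranks equal to 1 and
    every interior rank capped at r_max.
    """
    if d == 1:
        return n
    return 2 * n * r_max + (d - 2) * n * r_max ** 2


def effective_horizon_steps(gamma):
    """Rough number of steps over which rewards matter: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    grid = uniform_grid(-1.0, 1.0)  # the bounds are an assumption for the demo
    print(len(grid), grid[0], grid[-1])

    d = 6  # assumed combined state-action dimension, for illustration
    print(full_tensor_entries(d), tt_max_entries(d))

    # Assumed pairing: a smaller time step goes with a discount closer to 1.
    for gamma, dt in [(0.99, 0.01), (0.9999, 0.001)]:
        print(gamma, dt, effective_horizon_steps(gamma) * dt)
```

Under these assumptions, a 6-dimensional dense tensor would need 10^12 entries, while the rank-capped tensor train needs at most about 4 million, which is why the cap of 100 can be called large yet still tractable.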