Task Planning for Visual Room Rearrangement under Partial Observability

Authors: Karan Mirakhor, Sourav Ghosh, Dipanjan Das, Brojeshwar Bhowmick

ICLR 2024

Reproducibility assessment. Each entry lists the reproducibility variable, the result, and the supporting LLM response.
Research Type: Experimental
LLM Response: "Experimental results show that our method significantly outperforms the state-of-the-art rearrangement methods Weihs et al. (2021); Gadre et al. (2022); Sarch et al. (2022); Ghosh et al. (2022)." (Abstract) "In this section, we describe the datasets, metrics, and detailed results of our proposed method and its modules, in addressing the room rearrangement problem." (Section 3)
Researcher Affiliation: Industry
LLM Response: "Karan Mirakhor, Sourav Ghosh, Dipanjan Das & Brojeshwar Bhowmick, Visual Computing and Embodied Intelligence Lab, TCS Research, Kolkata, India {karan.mirakhor, g.sourav10, dipanjan.da, b.bhowmick}@tcs.com"
Pseudocode: Yes
LLM Response: "Algorithm 1: Training Proxy Reward Network" (Section 2.4.2); "Algorithm 2: Task planner" (Appendix E)
Open Source Code: No
LLM Response: The paper states, "We plan to openly release the dataset to enable further research in this domain" (Contribution 7), which refers to the dataset, not the source code for the methodology. No explicit statement or link is provided for the code.
Open Datasets: Yes
LLM Response: "Search Network Dataset: The AMT dataset in Kant et al. (2022) contains 268 object categories in 12 different rooms and 32 receptacle types." (Section 3.1)
Dataset Splits: Yes
LLM Response: "The entire dataset is split into train, val, and test sets with a ratio of 55 : 15 : 30." (Appendix H.1)
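To illustrate the reported 55:15:30 split, here is a minimal Python sketch; the shuffling, random seed, and function name are assumptions for illustration, not details taken from the paper.

```python
import random

def split_dataset(samples, ratios=(0.55, 0.15, 0.30), seed=0):
    """Shuffle and split a list of samples into train/val/test by the given ratios."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)
    samples = samples[:]              # copy so the caller's list is untouched
    rng.shuffle(samples)
    n = len(samples)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Example: a 55/15/30 split of 1000 placeholder episode IDs
train, val, test = split_dataset(list(range(1000)))
print(len(train), len(val), len(test))  # 550 150 300
```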
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running its experiments.
Software Dependencies: No
LLM Response: "We use PyTorch to train our models." (Appendix H.3) The paper mentions software such as PyTorch and RoBERTa, but does not provide specific version numbers for any software dependencies.
Experiment Setup: Yes
LLM Response: "The training details of our Search network, Graph-based state Representation Network, Deep RL planner, and proxy reward network are available in the Appendix." (Section 3.3) Appendices G and H detail various aspects of the experimental setup, including network architectures (e.g., FC layers, ReLU, softmax, dropout rates of 0.2, 0.25, and 0.5), learning rates (e.g., αSR = 0.001, αSC = 0.001, αGRN = 0.01), the Adam optimizer, batch size (512), replay buffer size (1,000,000,000), Polyak averaging rates for the target networks (0.0075 and 0.00085), and ε = 0.025 for ε-greedy exploration.
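As a sketch of how these reported hyperparameters might be gathered into a training configuration, the snippet below uses the numeric values quoted above; the dictionary structure, key names, helper functions, and use of PyTorch's Adam and soft target updates are illustrative assumptions, not the paper's implementation.

```python
import torch

# Hyperparameters reported in Appendices G and H (values from the report;
# the grouping and key names are illustrative assumptions).
CONFIG = {
    "lr_state_representation": 1e-3,    # αSR
    "lr_search_classifier": 1e-3,       # αSC
    "lr_graph_network": 1e-2,           # αGRN
    "dropout_rates": (0.2, 0.25, 0.5),
    "batch_size": 512,
    "replay_buffer_size": 1_000_000_000,
    "polyak_rates": (0.0075, 0.00085),  # soft target-network update rates
    "epsilon_greedy": 0.025,
}

def make_optimizer(model: torch.nn.Module, lr: float) -> torch.optim.Adam:
    """Adam optimizer, as named in the report."""
    return torch.optim.Adam(model.parameters(), lr=lr)

def polyak_update(target: torch.nn.Module, online: torch.nn.Module, tau: float) -> None:
    """Soft (Polyak) update of a target network: θ_target ← τ·θ_online + (1 − τ)·θ_target."""
    with torch.no_grad():
        for t_param, o_param in zip(target.parameters(), online.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * o_param)
```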