Interactive Object Placement with Reinforcement Learning
Authors: Shengping Zhang, Quanling Meng, Qinglin Liu, Liqiang Nie, Bineng Zhong, Xiaopeng Fan, Rongrong Ji
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the OPA dataset demonstrate that the proposed method achieves state-of-the-art performance in terms of plausibility and diversity. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Technology, Harbin Institute of Technology, Weihai, China; (2) School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; (3) Guangxi Key Laboratory of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, China; (4) School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; (5) Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen, China. |
| Pseudocode | Yes | Algorithm 1 Training procedure of IOPRE |
| Open Source Code | No | The paper does not include a statement about releasing the source code or a link to a repository for the described methodology. |
| Open Datasets | Yes | We conduct all experiments on the Object Placement Assessment (OPA) dataset (Liu et al., 2021a) |
| Dataset Splits | No | Specifically, these composite images are divided into 62,074 training images and 11,396 test images. The paper explicitly provides train and test splits, but no separate validation split. |
| Hardware Specification | No | No specific hardware (GPU, CPU models, etc.) used for running experiments is mentioned in the paper. |
| Software Dependencies | No | The paper mentions software components like Swin Transformer and AdamW, but no specific version numbers for any software dependencies are provided. |
| Experiment Setup | Yes | We set the learning rate as 2e-4 and use AdamW (Loshchilov & Hutter, 2017) to train IOPRE with batch size 64 for 15 epochs. For the assessment reward, we set the hyper-parameter λ_s as 0.01. To train IOPRE, we set the discount factor γ as 0.99, the weight of the entropy loss β as 0.08, and the maximum number of steps t_max as 20. In the training phase, t_0 is obtained by random initialization. In the test phase, the maximum number of steps is set as 100. All images are resized to 256 × 256 before being fed into all networks. (See the configuration sketch below the table.) |
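
The hyper-parameters quoted in the Experiment Setup row translate directly into a training configuration. Below is a minimal PyTorch sketch, assuming the paper's (unreleased) implementation uses PyTorch; `IOPREConfig`, `build_optimizer`, and the placeholder policy network are hypothetical names introduced here for illustration, and only the numeric values come from the paper.

```python
# Minimal sketch of the reported IOPRE training configuration.
# Assumes a PyTorch implementation; the paper releases no code, so the
# class names and the placeholder policy below are hypothetical.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class IOPREConfig:
    # Numeric values as reported in the paper's experiment setup.
    learning_rate: float = 2e-4
    batch_size: int = 64
    epochs: int = 15
    lambda_s: float = 0.01    # assessment-reward weight λ_s
    gamma: float = 0.99       # discount factor γ
    beta: float = 0.08        # entropy-loss weight β
    t_max_train: int = 20     # maximum episode steps during training
    t_max_test: int = 100     # maximum episode steps at test time
    image_size: int = 256     # inputs resized to 256 × 256


def build_optimizer(policy: nn.Module, cfg: IOPREConfig) -> torch.optim.AdamW:
    """AdamW (Loshchilov & Hutter, 2017) at the reported learning rate."""
    return torch.optim.AdamW(policy.parameters(), lr=cfg.learning_rate)


if __name__ == "__main__":
    cfg = IOPREConfig()
    # Placeholder network standing in for the actual IOPRE policy.
    policy = nn.Sequential(nn.Flatten(), nn.Linear(3 * cfg.image_size ** 2, 8))
    optimizer = build_optimizer(policy, cfg)
    print(cfg, optimizer, sep="\n")
```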