Human Synthesis and Scene Compositing
Authors: Mihai Zanfir, Elisabeta Oneata, Alin-Ionut Popa, Andrei Zanfir, Cristian Sminchisescu
AAAI 2020, pp. 12749-12756 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance of our framework is supported by both qualitative and quantitative results, in particular state-of-the-art synthesis scores for the Deep Fashion dataset. Evaluation metrics. We use several metrics to test the quality of our generated images: the Learned Perceptual Image Patch Similarity metric (LPIPS) (Zhang et al. 2018), the Inception Score (IS) (Salimans et al. 2016) and the Structural Similarity Index (SSIM) (Wang et al. 2004). (See the metric-computation sketch after the table.) |
| Researcher Affiliation | Collaboration | ¹Google Research; ²Department of Mathematics, Faculty of Engineering, Lund University; ³Institute of Mathematics of the Romanian Academy |
| Pseudocode | No | The paper does not include any explicitly labeled 'Pseudocode' or 'Algorithm' sections, nor does it present structured steps formatted like code blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Human Synthesis Datasets. We train our model on the Deep Fashion (Inshop Clothes Retrieval Benchmark) (Liu et al. 2016) dataset, which contains 52,712 in-shop clothes images and around 200,000 cross-pose/scale pairs. Datasets. For training the Appearance Compositing network, we use the COCO (Lin et al. 2014) dataset due to its large variety of natural images, containing both humans and associated ground truth segmentation masks. For the Geometric Compositing pipeline, we sample various backgrounds from the NYU Depth Dataset V2 (Nathan Silberman and Fergus 2012). |
| Dataset Splits | Yes | We use the train/test split provided by (Siarohin et al. 2018), containing 101,268 train and 8,616 test pairs of the same person in two different poses. For training the Appearance Compositing network, we use the COCO (Lin et al. 2014) dataset due to its large variety of natural images, containing both humans and associated ground truth segmentation masks. Table 2 shows the performance of our Appearance Compositing network on images from the COCO validation dataset under the LPIPS metric. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It does not contain phrases like 'We trained our models using NVIDIA A100' or similar. |
| Software Dependencies | No | The paper mentions software components and models such as 'conditional GAN', 'SMPL model', and 'VGG network' for perceptual loss, but it does not provide specific version numbers for any of these software dependencies, programming languages, or libraries. |
| Experiment Setup | No | The paper describes the loss function components and their combination (L = L_GAN + λ·L_L1 + γ·L_VGG) and mentions using a 'multiscale discriminator as in (Wang et al. 2018)'. However, it does not explicitly provide concrete hyperparameter values such as specific learning rates, batch sizes, number of epochs, or optimizer configurations, which are essential for reproducing the experimental setup. (See the loss sketch after the table.) |
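
As a companion to the Research Type row, the following is a minimal sketch of how the per-image metrics reported in the paper (SSIM and LPIPS) could be computed for a generated/ground-truth pair. It assumes the scikit-image `structural_similarity` function and the reference `lpips` package with an AlexNet backbone; the paper does not state which implementations were used, and the image size below is a placeholder.

```python
import numpy as np
import torch
import lpips                                    # pip install lpips
from skimage.metrics import structural_similarity


def to_lpips_tensor(img_uint8):
    """HxWx3 uint8 array -> 1x3xHxW float tensor scaled to [-1, 1]."""
    t = torch.from_numpy(img_uint8).permute(2, 0, 1).float() / 127.5 - 1.0
    return t.unsqueeze(0)


def evaluate_pair(generated, target, lpips_model):
    """Both inputs are HxWx3 uint8 arrays of the same size."""
    ssim_score = structural_similarity(
        generated, target, channel_axis=-1, data_range=255)
    with torch.no_grad():
        lpips_dist = lpips_model(
            to_lpips_tensor(generated), to_lpips_tensor(target)).item()
    return ssim_score, lpips_dist


if __name__ == "__main__":
    model = lpips.LPIPS(net="alex")             # AlexNet backbone, as in Zhang et al. 2018
    # Random placeholders standing in for a real generated / ground-truth pair.
    gen = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
    gt = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
    print(evaluate_pair(gen, gt, model))
```

The Inception Score, by contrast, is computed over the full set of generated images with a pretrained classifier rather than per pair, so it is omitted from this per-image sketch.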
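The Experiment Setup row quotes the combined objective L = L_GAN + λ·L_L1 + γ·L_VGG without concrete hyperparameters. The sketch below shows one plausible PyTorch formulation of that generator objective; the VGG-19 feature cut, the binary cross-entropy form of the adversarial term, and the λ = γ = 10 weights are assumptions rather than values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class VGGPerceptualLoss(nn.Module):
    """L1 distance between VGG-19 features of the prediction and the target."""

    def __init__(self, cut_index=16):             # feature-cut depth is an assumption
        super().__init__()
        weights = models.VGG19_Weights.DEFAULT    # torchvision >= 0.13 API
        self.slice = models.vgg19(weights=weights).features[:cut_index].eval()
        for p in self.slice.parameters():
            p.requires_grad = False

    def forward(self, pred, target):
        return F.l1_loss(self.slice(pred), self.slice(target))


def generator_loss(disc_fake_logits, pred, target, vgg_loss,
                   lam=10.0, gamma=10.0):         # lambda, gamma are placeholders
    """L = L_GAN + lambda * L_L1 + gamma * L_VGG for the generator update."""
    l_gan = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    l_l1 = F.l1_loss(pred, target)
    l_vgg = vgg_loss(pred, target)
    return l_gan + lam * l_l1 + gamma * l_vgg


# Usage (shapes illustrative): vgg = VGGPerceptualLoss()
# loss = generator_loss(d_logits_on_fake, fake_img, real_img, vgg)
```

The multiscale discriminator referenced in the paper (Wang et al. 2018) would feed this loss with logits from several image resolutions; only the single-scale adversarial term is shown here for brevity.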