Learning from the Tangram to Solve Mini Visual Tasks
Authors: Yizhou Zhao, Liang Qiu, Pan Lu, Feng Shi, Tian Han, Song-Chun Zhu (pp. 3490-3498)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our proposed method generates intelligent solutions for aesthetic tasks such as folding clothes and evaluating room layouts. |
| Researcher Affiliation | Academia | Yizhou Zhao¹, Liang Qiu¹, Pan Lu¹, Feng Shi¹, Tian Han², Song-Chun Zhu¹; ¹UCLA Center for Vision, Cognition, Learning, and Autonomy; ²Stevens Institute of Technology; contact: yizhouzhao@g.ucla.edu |
| Pseudocode | No | The paper describes methods and formulas but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The Tangram dataset is available at https://github.com/yizhouzhao/Tangram, but the paper states that this link hosts the dataset, not code implementing the method. |
| Open Datasets | Yes | We introduce the Tangram, a new dataset consisting of more than 10,000 snapshots... The Tangram dataset is available at https://github.com/yizhouzhao/Tangram. We also use Omniglot (Lake, Salakhutdinov, and Tenenbaum 2019), Multi-digit MNIST (Chen et al. 2018), Icons-50 (Hendrycks and Dietterich 2018), Flowers-17 and Flowers-102 (Nilsback and Zisserman 2008). |
| Dataset Splits | Yes | For each dataset, 80% of the samples are used for training and the remaining 20% for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as the Unity game engine and GloVe embeddings but does not give version numbers for these or for any other key software dependency. |
| Experiment Setup | Yes | To train the functions fθ and gφ, we use a simple convolutional neural network with only four 3×3 convolutional layers. Each image is resized to 28×28. We apply the 50-dimension GloVe embedding... and we assign 80% of the weight to CCL and 20% to PML. For folding clothes, the size of the image I representing the state s is 28×28, and there are ten vertical and ten horizontal folding axes evenly distributed in the image. For icon classification, the inputs of the network are binary images of size 224×224. |
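
The Dataset Splits row reports an 80/20 train/test split for every dataset but not how samples were assigned to each side. The Python sketch below shows one reproducible way to make such a split; the helper name `shuffle_split`, the uniform shuffle, and the fixed seed are assumptions, not the authors' procedure.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(
    samples: Sequence[T], train_frac: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `samples` and cut it into (train, test) lists.

    Hypothetical helper: the paper reports an 80/20 split but not the
    shuffling, seeding, or any stratification it used.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must lie strictly between 0 and 1")
    items = list(samples)                 # never mutate the caller's data
    random.Random(seed).shuffle(items)    # local RNG, so the split is reproducible
    cut = round(train_frac * len(items))  # size of the training part
    return items[:cut], items[cut:]


if __name__ == "__main__":
    train, test = shuffle_split(range(10_000))
    print(len(train), len(test))  # 8000 2000
```

A per-class (stratified) split would be a reasonable alternative for the classification datasets; the summary does not say which was used.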
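
The Experiment Setup row says fθ and gφ are trained with "a simple convolutional neural network with only four 3×3 convolutional layers" on 28×28 inputs, but gives no channel widths, padding, pooling, or normalization. The NumPy forward pass below fills those gaps with assumptions borrowed from the common Conv-4 few-shot backbone (64 channels, zero padding of 1, ReLU, 2×2 max pooling; batch normalization omitted for brevity). It illustrates the shapes involved and is not the authors' network.

```python
import numpy as np


def conv3x3(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """3x3 convolution (cross-correlation), stride 1, zero padding 1.

    x: (N, C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,) -> (N, C_out, H, W)
    """
    n, _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))
    out = np.zeros((n, w.shape[0], h, wd))
    for i in range(3):          # accumulate one kernel tap at a time
        for j in range(3):
            patch = xp[:, :, i:i + h, j:j + wd]
            out += np.einsum("nchw,oc->nohw", patch, w[:, :, i, j])
    return out + b[None, :, None, None]


def maxpool2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped."""
    n, c, h, w = x.shape
    h2, w2 = h // 2, w // 2
    x = x[:, :, : 2 * h2, : 2 * w2]
    return x.reshape(n, c, h2, 2, w2, 2).max(axis=(3, 5))


def init_params(in_ch: int = 1, width: int = 64, depth: int = 4, seed: int = 0):
    """He-initialised weights for `depth` conv layers (width 64 is an assumption)."""
    rng = np.random.default_rng(seed)
    params, c = [], in_ch
    for _ in range(depth):
        w = rng.normal(0.0, np.sqrt(2.0 / (9 * c)), size=(width, c, 3, 3))
        params.append((w, np.zeros(width)))
        c = width
    return params


def conv4_embed(x: np.ndarray, params) -> np.ndarray:
    """conv3x3 -> ReLU -> maxpool2 per layer; 28x28 input -> 14 -> 7 -> 3 -> 1."""
    for w, b in params:
        x = maxpool2(np.maximum(conv3x3(x, w, b), 0.0))
    return x.reshape(x.shape[0], -1)   # (N, width) embedding


if __name__ == "__main__":
    images = np.random.default_rng(1).random((2, 1, 28, 28))
    print(conv4_embed(images, init_params()).shape)  # (2, 64)
```

Under these assumptions, four pooling stages shrink a 28×28 input to 1×1, so each image maps to a 64-dimensional vector. For actual training a deep-learning framework would be the practical choice; plain NumPy keeps the sketch dependency-light.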
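
The Experiment Setup row states that images are resized to 28×28 and that icon-classification inputs are 224×224 binary images, without naming the resampling method or the binarization rule. A minimal NumPy sketch, assuming nearest-neighbour resampling of a grayscale array with values in [0, 1] and a hypothetical threshold of 0.5:

```python
import numpy as np


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D array to (size, size)."""
    h, w = img.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows[:, None], cols[None, :]]


def binarize(img: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Map intensities in [0, 1] to {0, 1}; the 0.5 threshold is hypothetical."""
    return (img >= threshold).astype(np.uint8)


if __name__ == "__main__":
    gray = np.random.default_rng(0).random((100, 150))  # stand-in grayscale image
    small = resize_nearest(gray, 28)                 # 28x28 network input
    icon = binarize(resize_nearest(gray, 224))       # 224x224 binary icon input
    print(small.shape, icon.shape)  # (28, 28) (224, 224)
```

Nearest-neighbour resampling keeps binary images binary, which is one reason to prefer it over bilinear interpolation here; the paper may well have used a different method.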
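
The Experiment Setup row assigns 80% of the weight to CCL and 20% to PML. Assuming this means a linear weighted sum of the two training losses (the summary neither defines them nor expands the abbreviations), the combined objective would read:

$$
\mathcal{L}_{\text{total}} = 0.8\,\mathcal{L}_{\text{CCL}} + 0.2\,\mathcal{L}_{\text{PML}}
$$

Because the two weights sum to 1, this is a convex combination of the two losses.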
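
For folding clothes, the Experiment Setup row describes a 28×28 state image with ten vertical and ten horizontal folding axes evenly distributed across it. The Python sketch below turns that description into a 20-action set and a simple fold operator. The spacing rule `k * size // (n_axes + 1)`, the names `folding_axes`, `fold`, and `ACTIONS`, the choice to always mirror the left (or top) part, and the clipping of pixels that leave the frame are all assumptions for illustration.

```python
import numpy as np


def folding_axes(size: int = 28, n_axes: int = 10) -> list:
    """Evenly spaced interior fold positions (an assumed spacing rule)."""
    return [k * size // (n_axes + 1) for k in range(1, n_axes + 1)]


def fold(img: np.ndarray, pos: int, vertical: bool = True) -> np.ndarray:
    """Fold a binary (integer or bool) image along one axis; return a new image.

    vertical=True: the fold line lies between columns pos-1 and pos and the
    left part is mirrored onto the right. vertical=False does the same for a
    horizontal line, mirroring the top part downwards. Mirrored pixels that
    would leave the frame are dropped (a simplifying assumption).
    """
    x = img if vertical else img.T
    out = x.copy()
    span = min(pos, x.shape[1] - pos)         # mirrored columns that stay in frame
    out[:, :pos] = 0                          # the folded-over part is lifted away
    out[:, pos:pos + span] |= x[:, pos - span:pos][:, ::-1]  # column pos-1-t -> pos+t
    return out if vertical else out.T


# Hypothetical action set: 10 vertical + 10 horizontal folds on a 28x28 state.
ACTIONS = [(p, True) for p in folding_axes()] + [(p, False) for p in folding_axes()]

if __name__ == "__main__":
    shirt = np.zeros((28, 28), dtype=np.uint8)
    shirt[4:24, 6:22] = 1                     # a 20x16 rectangle of cloth
    folded = fold(shirt, 15, vertical=True)
    print(len(ACTIONS), int(shirt.sum()), int(folded.sum()))  # 20 320 180
```

Folding the rectangle at column 15 stacks its left part onto its right part, so the covered area drops from 320 to 180 pixels; a planner could score candidate actions from `ACTIONS` by how the resulting image changes.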