CAISE: Conversational Agent for Image Search and Editing
Authors: Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal
AAAI 2022, pp. 10903–10911 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a novel generator-extractor model as a strong starting point baseline for this task and dataset. We employ a copying mechanism... Our experiments show our baseline model performs effectively as a starting point, and we demonstrate a large human-machine performance gap for useful future work. ... We split the total 1,611 dialogues into 1,052, 262, and 297 for train, validation, and test set, respectively. ... We use accuracy as the evaluation metric. |
| Researcher Affiliation | Collaboration | Hyounghun Kim,1 Doo Soon Kim,2 Seunghyun Yoon,3 Franck Dernoncourt,3 Trung Bui,3 Mohit Bansal1 1UNC Chapel Hill 2Roku Inc. 3Adobe Research {hyounghk, mbansal}@cs.unc.edu {syoon, dernonco, bui}@adobe.com |
| Pseudocode | No | The paper includes diagrams of the model architecture (Figure 3) and descriptions of commands, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Data and code are available: https://github.com/hyounghk/CAISE. |
| Open Datasets | Yes | Thus, we propose a dataset of an automated Conversational Agent for Image Search and Editing (CAISE). To our knowledge, this is the first dataset that provides conversational image search and editing annotations... Data and code are available: https://github.com/hyounghk/CAISE. |
| Dataset Splits | Yes | We split the total 1,611 dialogues into 1,052, 262, and 297 for train, validation, and test set, respectively. From the dialogue splits, we obtain 4,059/1,002/1,112 (train/valid/test) instance splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions some tools used during data collection (Adobe Stock, Adobe Photoshop, OpenCV) and model components (Faster RCNN, Adam) but does not provide specific version numbers for any software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | We use 512 as the hidden size and 256 as the word embedding dimension. We use Adam (Kingma and Ba 2015) as the optimizer with the learning rate 1 × 10⁻⁴. |
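The splits and hyperparameters reported above can be collected into a small configuration sketch for quick sanity checks. This is pure Python; the variable names are our own illustration, not taken from the authors' released code (see https://github.com/hyounghk/CAISE for their actual configuration):

```python
# Dataset splits and baseline hyperparameters as reported in the paper,
# gathered in one place. Names are illustrative, not from the CAISE repo.

DIALOGUE_SPLITS = {"train": 1052, "valid": 262, "test": 297}
INSTANCE_SPLITS = {"train": 4059, "valid": 1002, "test": 1112}

HYPERPARAMS = {
    "hidden_size": 512,
    "word_embedding_dim": 256,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
}

def total(splits):
    """Sum the split sizes to check them against the paper's reported total."""
    return sum(splits.values())

if __name__ == "__main__":
    # The paper reports 1,611 dialogues in total.
    assert total(DIALOGUE_SPLITS) == 1611
    print(total(DIALOGUE_SPLITS), total(INSTANCE_SPLITS))
```

A check like this is useful when re-deriving instance splits from the released dialogue files, since a mismatch in totals is the quickest signal that a preprocessing step diverged from the paper's setup.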