Expressing an Image Stream with a Sequence of Natural Sentences
Authors: Cesc C. Park, Gunhee Kim
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on blog datasets for NYC and Disneyland, consisting of more than 20K blog posts with 140K associated images. Although we focus on tourism topics in our experiments, our approach is completely unsupervised and thus applicable to any domain that has a large set of blog posts with images. We demonstrate the superior performance of our approach by comparing against other state-of-the-art alternatives, including [9, 12, 21]. We evaluate with quantitative measures (e.g. BLEU and Top-K recall; a hedged sketch of both metrics follows the table) and user studies via Amazon Mechanical Turk (AMT). |
| Researcher Affiliation | Academia | Cesc Chunseong Park Gunhee Kim Seoul National University, Seoul, Korea {park.chunseong,gunhee}@snu.ac.kr |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations, but it does not include any distinct pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/cesc-park/CRCN |
| Open Datasets | No | We collect blog datasets on two topics: NYC and Disneyland. We reuse the Disneyland blog data from the dataset of [11] and newly collect the NYC data using the same crawling method as [11], in which we first crawl blog posts and their associated pictures from two popular blog publishing sites, BLOGSPOT and WORDPRESS, by varying the query terms submitted to Google search. Then we manually select the travelogue posts that describe stories and events with multiple images. The final dataset includes 11,863 unique blog posts and 78,467 images for NYC, and 7,717 blog posts and 60,545 images for Disneyland. |
| Dataset Splits | Yes | For quantitative evaluation, we randomly split our dataset into 80% as a training set, 10% as a validation set, and the rest as a test set. (A hedged sketch of such a split follows the table.) |
| Hardware Specification | No | The paper does not specify any hardware details like GPU models, CPU types, or memory used for experiments. |
| Software Dependencies | No | The paper mentions software such as NLTK, the gensim doc2vec code, and the Stanford CoreNLP library, but does not provide specific version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | We choose κ = 3 after thorough empirical tests. We assign 0.5 and 0.7 dropout rates to the two layers. In our experiments, K = 5 is successful. We set M = 50. We apply stochastic gradient descent (SGD) with mini-batches of 100 data streams. Among many SGD techniques, we select the RMSprop optimizer [28]. We initialize the weights of our CRCN model using the method of He et al. [7]. (A hedged training-setup sketch follows the table.) |
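
The paper names BLEU and Top-K recall as its quantitative measures. The following is a minimal sketch of both, not the authors' evaluation code: the BLEU computation uses NLTK (which the paper mentions), while the `top_k_recall` helper, its ranking input, and the toy sentences are illustrative assumptions.

```python
# Hedged sketch of the paper's two quantitative measures: BLEU (via NLTK,
# which the paper cites as a dependency) and Top-K recall. The data below
# is a toy placeholder, not the NYC/Disneyland blog dataset.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu(candidate, reference):
    """BLEU of a candidate sentence against one reference (token lists)."""
    return sentence_bleu([reference], candidate,
                         smoothing_function=SmoothingFunction().method1)

def top_k_recall(ranked_candidates, ground_truth, k=5):
    """1.0 if the ground truth appears among the top-k ranked outputs;
    averaging this over the test set gives the Top-K recall."""
    return float(ground_truth in ranked_candidates[:k])

# Toy usage
cand = "the castle at disneyland was crowded".split()
ref = "the disneyland castle was very crowded".split()
print(bleu(cand, ref))
print(top_k_recall(["a", "b", "c"], "b", k=5))  # -> 1.0
```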
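The reported 80%/10%/10% random split could be reproduced along the lines below. The paper does not publish a split script, so the random seed, the per-post granularity, and the function name are assumptions.

```python
# Hedged sketch of the 80/10/10 random split the paper reports; seeding
# and data layout are assumptions, since no split script is released.
import random

def split_dataset(posts, seed=0):
    rng = random.Random(seed)
    posts = posts[:]                      # copy before shuffling
    rng.shuffle(posts)
    n_train = int(0.8 * len(posts))
    n_val = int(0.1 * len(posts))
    train = posts[:n_train]
    val = posts[n_train:n_train + n_val]
    test = posts[n_train + n_val:]        # "the rest" as the test set
    return train, val, test

train, val, test = split_dataset(list(range(11863)))  # NYC post count
print(len(train), len(val), len(test))
```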
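Finally, the quoted optimization settings (0.5 and 0.7 dropout on the two layers, RMSprop, mini-batches of 100 data streams, He et al. initialization) can be collected into a training sketch. This is written in PyTorch purely for illustration; the released CRCN code predates it, the two-layer stand-in model and all dimensions are invented, and the learning rate is an assumption since the paper does not state one.

```python
# Hedged PyTorch sketch of the optimization settings quoted above.
# The model is a stand-in, NOT the CRCN architecture; dimensions and
# learning rate are assumptions not given in the paper.
import torch
import torch.nn as nn

class TwoLayerWithDropout(nn.Module):
    """Stand-in two-layer net with the paper's 0.5 and 0.7 dropout rates."""
    def __init__(self, d_in=300, d_hidden=512, d_out=300):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.drop1 = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(d_hidden, d_out)
        self.drop2 = nn.Dropout(p=0.7)
        # He et al. [7] initialization, as the paper reports
        for layer in (self.fc1, self.fc2):
            nn.init.kaiming_normal_(layer.weight)
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        x = self.drop1(torch.relu(self.fc1(x)))
        return self.drop2(self.fc2(x))

model = TwoLayerWithDropout()
# RMSprop optimizer [28]; lr=1e-3 is an assumed value
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)

batch = torch.randn(100, 300)     # mini-batch of 100 data streams (toy features)
optimizer.zero_grad()
loss = model(batch).pow(2).mean() # placeholder loss for illustration
loss.backward()
optimizer.step()
```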