Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Game of Sketches: Deep Recurrent Models of Pictionary-Style Word Guessing
Authors: Ravi Kiran Sarvadevabhatla, Shiv Surya, Trisha Mittal, R. Venkatesh Babu
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on the large-scale guess-word dataset generated via Sketch-QA task and compare with various baselines. We also conduct a Visual Turing Test to obtain human impressions of the guess-words generated by humans and our model. Experimental results demonstrate the promise of our approach for Pictionary and similarly themed games. |
| Researcher Affiliation | Academia | Ravi Kiran Sarvadevabhatla, Shiv Surya, Trisha Mittal, R. Venkatesh Babu Video Analytics Lab, Indian Institute of Science, Bangalore 560012, INDIA |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Please visit our project page http://val.cds.iisc.ac.in/sketchguess for supplementary material, code and dataset related to our work. |
| Open Datasets | Yes | Via Sketch-QA, we create a new crowdsourced dataset of paired guess-word and sketch-strokes, dubbed WORDGUESS-160, collected from 16,624 guess sequences of 1,108 subjects across 160 sketch object categories. Please visit our project page http://val.cds.iisc.ac.in/sketchguess for supplementary material, code and dataset related to our work. |
| Dataset Splits | Yes | For model evaluation, we split the 16,624 sequences in GUESSWORD-160 randomly into disjoint sets containing 60%, 25% and 15% of the data which are used during training, validation and testing phases respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions several software components, such as the 'HunPos tagger', 'Enchant spell check library', 'word2vec', 'VGG-16', 'LSTM', and 'Adagrad optimizer', but it does not specify version numbers for any of them. |
| Experiment Setup | Yes | For all the experiments, we use Adagrad optimizer (Duchi, Hazan, and Singer 2011) with a starting learning rate of 0.01 and early-stopping as the criterion for terminating optimization. The value for margin is set to 0.1. Overall, we found the convex combination loss with λ = 1 (determined via grid search) to provide the best performance. LSTM with 512 hidden units as the RNN component. |
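
The Dataset Splits row above describes a random, disjoint 60% / 25% / 15% partition of the 16,624 guess sequences. The following is a minimal sketch of one way such a split could be produced. The function name `split_sequences`, the fixed seed, and the rounding rule (train and validation sizes rounded down, remainder assigned to test) are assumptions made for illustration; the paper does not report a seed or a rounding rule, so this will not reproduce the authors' exact partition.

```python
import random


def split_sequences(n_sequences=16624, fractions=(0.60, 0.25, 0.15), seed=0):
    """Randomly partition sequence indices into disjoint train/val/test lists.

    Hypothetical helper: the seed and rounding behaviour are illustrative
    assumptions, not details taken from the paper.
    """
    indices = list(range(n_sequences))
    # Shuffle with a local RNG so the split is repeatable for a given seed.
    random.Random(seed).shuffle(indices)

    # Round train and validation sizes down; the remainder goes to test.
    n_train = int(fractions[0] * n_sequences)
    n_val = int(fractions[1] * n_sequences)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_sequences()
print(len(train), len(val), len(test))
```

Fixing the seed makes the partition repeatable across runs, which is the property a reader would need in order to compare results against a published split.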