Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Hands-Free Visual Dialog Interactive Recommendation
Authors: Tong Yu, Yilin Shen, Hongxia Jin (pp. 1137-1144)
AAAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical results show that the probability of finding the desired items by our system is about 3 times as high as that by the traditional interactive recommenders, after a few user interactions. (Section "Experiments: Dataset and Online Evaluation") We evaluate different approaches on the footwear dataset (Berg, Berg, and Shih 2010; Guo et al. 2018). |
| Researcher Affiliation | Industry | Tong Yu, Yilin Shen, Hongxia Jin. Samsung Research America, Mountain View, CA, USA. EMAIL |
| Pseudocode | Yes | Algorithm 1 presents our algorithm in a more general case. Algorithm 1 (SPR bandit). Input: λ, L, K, K′, d. Initialize τ = 1, θ̂₀ = 0 ∈ R^{d×1}, S₀ = λ⁻¹ I_d ∈ R^{d×d}, x_center = 0 ∈ R^{d×1}, B = [L]. For t = 1, …, n: sample the model parameters θ_t ~ N(θ̂_{t−1}, S_{t−1}); for k = 1, …, K: a_t^k ← argmax over e ∈ B \ {a_t^1, …, a_t^{k−1}} of x_e^⊤ θ_t; recommend items A_t = (a_t^1, …, a_t^K). |
| Open Source Code | No | The authors of (Guo et al. 2018) release the captioner code on GitHub: https://github.com/XiaoxiaoGuo/fashion-retrieval. This link is for a third-party tool (captioner) used in their evaluation, not for the core methodology developed in this paper. |
| Open Datasets | Yes | We evaluate different approaches on the footwear dataset (Berg, Berg, and Shih 2010; Guo et al. 2018). |
| Dataset Splits | No | Similar to (Guo et al. 2018), we train the item identifier and visual dialog encoder on 10,000 images, and evaluate our recommender in the online setting on another dataset with 4,658 images. While it mentions training on one dataset and evaluating on another, it does not provide explicit train/validation/test splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud computing specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions neural network architectures like ResNet101, CNN, and GRU, but does not provide specific version numbers for any software libraries or frameworks used (e.g., TensorFlow, PyTorch, Python version). |
| Experiment Setup | Yes | The inputs are the hyper-parameter λ of the Gaussian distribution, the total number of items L, the size of the list K, a hyperparameter K′, and the dimensionality of the image feature vector d. The size of the list is K = 10. We show the results up to n = 100 steps. Similar to (Guo et al. 2018), we train the item identifier and visual dialog encoder on 10,000 images and evaluate our recommender in the online setting on another dataset with 4,658 images. |
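The Algorithm 1 excerpt quoted above describes a Thompson-sampling-style selection step: sample model parameters from a Gaussian posterior, then greedily pick the K items whose feature vectors score highest under the sample. A minimal NumPy sketch of that selection step is below; the function name `spr_bandit_step` is hypothetical, and the posterior update between rounds (which the excerpt does not spell out) is omitted.

```python
import numpy as np

def spr_bandit_step(theta_mean, theta_cov, features, K, rng):
    """One round of the sample-then-top-K selection from the
    Algorithm 1 (SPR bandit) excerpt.

    theta_mean : (d,)   posterior mean of the model parameters
    theta_cov  : (d, d) posterior covariance S
    features   : (L, d) feature vectors x_e for the L items
    K          : size of the recommended list
    """
    # Sample theta_t ~ N(theta_mean, theta_cov)
    theta = rng.multivariate_normal(theta_mean, theta_cov)
    # Score every item: x_e^T theta_t
    scores = features @ theta
    # Greedy top-K without replacement (argsort descending)
    return np.argsort(scores)[::-1][:K]

rng = np.random.default_rng(0)
L, d, K = 50, 8, 10  # illustrative sizes; the paper uses K = 10
X = rng.normal(size=(L, d))
picks = spr_bandit_step(np.zeros(d), np.eye(d), X, K, rng)
assert len(picks) == K and len(set(picks.tolist())) == K
```

Because the K item scores share one sampled θ_t, the greedy loop in the pseudocode reduces to a single descending sort of the scores, which is what the sketch does.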