CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval
Authors: Hai X. Pham, Ricardo Guerrero, Vladimir Pavlovic, Jiatong Li
AAAI 2021, pp. 2423-2430
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments show that by making use of efficient tree-structured Long Short-Term Memory as the text encoder in our computational cross-modal retrieval framework, we are not only able to identify the main ingredients and cooking actions in the recipe descriptions without explicit supervision, but we can also learn more meaningful feature representations of food recipes, appropriate for challenging cross-modal retrieval and recipe adaption tasks." and "Experiments: In this section we will use L, T, G, and S as shorthand for LSTM, Tree-LSTM, GRU and Set (Zaheer et al. 2017), respectively." (a minimal Tree-LSTM cell sketch follows the table) |
| Researcher Affiliation | Collaboration | 1 Samsung AI Center, Cambridge; 2 Department of Computer Science, Rutgers University. {hai.xuanpham, r.guerrero, v.pavlovic}@samsung.com, jl2312@rutgers.edu |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code of our proposed method is available at https://github.com/haixpham/CHEF. |
| Open Datasets | Yes | During the preparation of the work presented here, all experiments were conducted using data from Recipe1M (R1M) (Salvador et al. 2017; Marín et al. 2019). |
| Dataset Splits | Yes | Data is split into 70% train, 15% validation and 15% test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions various models and architectures like 'word2vec model' and 'ResNet50', but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | "We empirically set ϵ to 0.3 by cross-validation." and "Finally, h is projected to the shared space by three fully connected (FC) layers, each of dimensionality 1024, to yield the latent text features p ∈ R^1024." and "ResNet50 (He et al. 2016) pre-trained on ImageNet is used as the backbone for feature extraction, where the last FC layer is replaced with three consecutive FC layers (similar to the recipe encoder) to project the extracted features into the shared latent space to get q ∈ R^1024." (a sketch of this image-encoder head follows the table) |
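
The text encoder quoted under Research Type is a tree-structured LSTM. As a reference point, below is a minimal child-sum Tree-LSTM cell in the style of Tai et al. (2015), written in PyTorch. The class name, dimensions, and the choice of the child-sum variant are assumptions for illustration; the paper's exact encoder configuration is not reproduced in the table (full details are in the released code at https://github.com/haixpham/CHEF).

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Minimal child-sum Tree-LSTM cell (Tai et al. 2015).
    Sketch only; CHEF's actual encoder may differ in detail."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        # Input, output, and cell-update gates share one projection each.
        self.W_iou = nn.Linear(in_dim, 3 * hid_dim)
        self.U_iou = nn.Linear(hid_dim, 3 * hid_dim, bias=False)
        # A separate forget gate is computed per child.
        self.W_f = nn.Linear(in_dim, hid_dim)
        self.U_f = nn.Linear(hid_dim, hid_dim, bias=False)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) node input; child_h, child_c: (num_children, hid_dim).
        h_sum = child_h.sum(dim=0)  # child-sum aggregation over children
        i, o, u = torch.chunk(self.W_iou(x) + self.U_iou(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.W_f(x) + self.U_f(child_h))  # per-child gates
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c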
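For a leaf node, `child_h` and `child_c` can be empty `(0, hid_dim)` tensors, in which case the sums reduce to zero and the cell behaves like a standard LSTM step with no prior state.

The Experiment Setup row describes the image encoder: an ImageNet-pretrained ResNet50 whose final FC layer is replaced by three consecutive FC layers projecting into the 1024-d shared space. Below is a sketch of that head under stated assumptions: the ReLU activations between the FC layers and the 2048-to-1024 width of the first layer are inferred, not quoted from the paper, and the `weights="IMAGENET1K_V1"` argument assumes torchvision >= 0.13.

```python
import torch.nn as nn
from torchvision.models import resnet50

class ImageEncoder(nn.Module):
    """ResNet50 backbone with the classifier head replaced by three FC
    layers that project into the 1024-d shared latent space (sketch)."""

    def __init__(self, embed_dim: int = 1024):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")  # ImageNet-pretrained
        backbone.fc = nn.Identity()   # drop the original 1000-way classifier
        self.backbone = backbone      # now outputs 2048-d pooled features
        self.proj = nn.Sequential(    # three consecutive FC layers
            nn.Linear(2048, embed_dim), nn.ReLU(inplace=True),
            nn.Linear(embed_dim, embed_dim), nn.ReLU(inplace=True),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, images):        # images: (B, 3, H, W)
        return self.proj(self.backbone(images))  # q in R^1024
```

Per the quoted setup, the recipe encoder ends in an analogous three-FC-layer projection, so both modalities land in the same 1024-d space where retrieval is performed.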