Hi-Fi Ark: Deep User Representation via High-Fidelity Archive Network

Authors: Zheng Liu, Yu Xing, Fangzhao Wu, Mingxiao An, Xing Xie

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical studies are performed with three real-world datasets on news recommendation, online advertising, and e-commerce, respectively. It is demonstrated that Hi-Fi Ark outperforms the well-recognized baseline methods remarkably and consistently, thereby verifying the effectiveness of the proposed methods.
Researcher Affiliation | Collaboration | Zheng Liu (1), Yu Xing (2), Fangzhao Wu (1), Mingxiao An (2), Xing Xie (1). (1) Microsoft Research Asia, Beijing, China; (2) University of Science and Technology of China, Hefei, China.
Pseudocode | Yes | Algorithm 1: End-to-End CTR Prediction (a hedged training-step sketch in the same spirit appears after the table).
Open Source Code | Yes | Hi-Fi Ark's implementation is available at https://github.com/xyyimian/Hifi-Ark/
Open Datasets | Yes | News. A total of five weeks of news clicks are provided by MSN News from the EN-US market. The dataset includes: 1) click relationships between users and news, 2) news titles, 3) news categories (e.g., sports, finance). Samples within the first four weeks are used for training, while those in the last week are used for testing. Titles are used as raw features, with a 1D CNN employed as the text encoder (see the encoder sketch after the table); the encoder is pretrained on news-category classification. Ads. A total of one week of ad clicks are offered by Bing Ads [Parsana et al., 2018], which includes: 1) click relationships between users and URLs, 2) titles of URLs. The titles have already been mapped into 300-dimensional vectors with a well-pretrained DSSM model [Huang et al., 2013] over a massive corpus. E-Commerce. This dataset contains users' shopping behaviors on Amazon (the ratings-only dataset). All purchased items are treated as positive cases, while negative cases are randomly sampled from the non-purchased ones (see the sampling sketch after the table). In contrast to the other datasets, items here are represented purely by a unique ID, which is vectorized via a cold-started embedding matrix. As a result, the boundary case where recommendations must be made with highly limited features can be evaluated.
Dataset Splits | No | Samples within the first four weeks are used for training, while those in the last week are used for testing (see the temporal-split sketch after the table). The paper does not provide an explicit validation split.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper mentions model components such as a 1D CNN and a DSSM model, but it does not specify any software names with version numbers needed for reproducibility (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | No | The paper describes general aspects of the experimental setup, such as the model components used for each dataset and the CTR calculation method. However, it lacks the specific numerical hyperparameters (learning rate, batch size, number of epochs, optimizer) required for full reproducibility.
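
The paper's pseudocode (Algorithm 1) is not reproduced in this summary. Below is a minimal PyTorch sketch of one end-to-end CTR training step in the same spirit: a user's clicked-item vectors are compressed into a small set of archive vectors by attention pooling, and a candidate item is scored against them. All names (ArchiveNetwork, CTRModel, num_archives), the pooling and scoring details, and the optimizer settings are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArchiveNetwork(nn.Module):
    """Compresses a user's clicked-item vectors into K archive vectors via
    learned attention pooling (an assumption about Hi-Fi Ark's design)."""
    def __init__(self, dim: int, num_archives: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_archives, dim))

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq_len, dim) -> archives: (batch, K, dim)
        attn = torch.softmax(self.queries @ history.transpose(1, 2), dim=-1)
        return attn @ history

class CTRModel(nn.Module):
    def __init__(self, dim: int, num_archives: int = 4):
        super().__init__()
        self.archive = ArchiveNetwork(dim, num_archives)

    def forward(self, history: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        archives = self.archive(history)                                # (batch, K, dim)
        # Attend the candidate over the archives to form the user vector.
        weights = torch.softmax(
            (archives @ candidate.unsqueeze(-1)).squeeze(-1), dim=-1)   # (batch, K)
        user_vec = (weights.unsqueeze(-1) * archives).sum(dim=1)        # (batch, dim)
        return torch.sigmoid((user_vec * candidate).sum(dim=-1))        # CTR score

model = CTRModel(dim=64)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer and lr are guesses
history = torch.randn(32, 50, 64)     # 32 users, 50 clicked items each
candidate = torch.randn(32, 64)       # one candidate item per user
labels = torch.randint(0, 2, (32,)).float()
loss = F.binary_cross_entropy(model(history, candidate), labels)
opt.zero_grad()
loss.backward()
opt.step()
```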
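For the News dataset, titles are encoded with a 1D CNN pretrained on news-category classification. The sketch below shows one plausible shape for such an encoder; the vocabulary size, embedding width, kernel size, and number of categories are all assumptions, since the paper summary states none of them.

```python
import torch
import torch.nn as nn

class TitleEncoder(nn.Module):
    def __init__(self, vocab_size=50000, emb_dim=128, out_dim=64, num_classes=15):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, out_dim, kernel_size=3, padding=1)
        self.classifier = nn.Linear(out_dim, num_classes)  # pretraining head only

    def encode(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) word ids -> (batch, out_dim) title vector
        x = self.embed(tokens).transpose(1, 2)   # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))             # (batch, out_dim, seq_len)
        return x.max(dim=2).values               # max-pool over positions

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Pretraining objective: predict the news category from the title.
        return self.classifier(self.encode(tokens))
```

After pretraining, the classification head would be discarded and encode() would supply the item vectors consumed by the CTR model.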
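For the E-Commerce dataset, positives are the purchased items and negatives are randomly sampled from non-purchased ones. A minimal sketch, assuming uniform sampling and a negatives-per-positive ratio k that the paper summary does not state:

```python
import random

def sample_negatives(purchased: set, all_items: list, k: int = 4) -> list:
    """Draw k item ids the user never purchased (k is an assumed ratio)."""
    negatives = []
    while len(negatives) < k:
        item = random.choice(all_items)
        if item not in purchased:
            negatives.append(item)
    return negatives
```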
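The only split the paper states is temporal: the first four weeks of samples train the model and the last week tests it, with no validation set described. A minimal sketch, assuming each record is a (timestamp, sample) pair; the record format is an assumption.

```python
from datetime import datetime, timedelta

def temporal_split(samples: list, start: datetime):
    """Split (timestamp, sample) pairs into train (weeks 1-4) and test (week 5)."""
    cutoff = start + timedelta(weeks=4)
    train = [s for ts, s in samples if ts < cutoff]
    test = [s for ts, s in samples if ts >= cutoff]
    return train, test
```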