Learning Optimal Tree Models under Beam Search

Authors: Jingwei Zhuo, Ziru Xu, Wei Dai, Han Zhu, Han Li, Jian Xu, Kun Gai

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on both synthetic and real data verify the rationality of our theoretical analysis and demonstrate the superiority of our algorithm compared to state-of-the-art methods."
Researcher Affiliation | Industry | Jingwei Zhuo, Ziru Xu, Wei Dai, Han Zhu, Han Li, Jian Xu, Kun Gai (Alibaba Group). Correspondence to: Jingwei Zhuo <zjw169463@alibaba-inc.com>.
Pseudocode | Yes | "Algorithm 1: Learning Optimal Tree Models under Beam Search" (a hedged sketch of the training loop follows the table)
Open Source Code | No | The paper does not include an unambiguous statement or link indicating the release of open-source code for the described methodology.
Open Datasets | Yes | "Our experiments are conducted on two large-scale real datasets for recommendation tasks: Amazon Books (McAuley et al., 2015; He & McAuley, 2016) and UserBehavior (Zhu et al., 2018)."
Dataset Splits | Yes | "We discard the user-based data which has less than 10 items and split the rest into training set D_tr, validation set D_val and testing set D_te in the same way as Zhu et al. (2018; 2019)."
Hardware Specification | Yes | "This number is averaged over 5000 training iterations on a single Tesla P100-PCIE-16GB GPU."
Software Dependencies | No | The paper does not provide ancillary software details with version numbers (e.g., library or solver names such as Python 3.8 or CPLEX 12.4).
Experiment Setup | Yes | "More specifically, T is set to be a random binary tree over I and g(x, n) = θ_n^⊤ x + b_n is parameterized as a linear scorer, where θ_n ∈ R^d and b_n ∈ R are trainable parameters. The neural network consists of three fully connected layers with hidden sizes 128, 64 and 24, and parametric ReLU is used as the activation function. In experiments, we set b to be a negative value such that the number of non-zero entries is less than 0.1M, to simulate the practical case where the number of relevant targets is much smaller than the target set size. Throughout the experiments, we use OTM to denote the tree model trained according to Algorithm 1, since its goal is to learn optimal tree models under beam search. To perform an ablation study, we consider two variants of OTM: OTM (-BS) differs from OTM by replacing B_h(x; θ_t) with S_h(y) = S_h^+(y) ∪ S_h^-(y), and OTM (-OptEst) differs from OTM by replacing ẑ_n(x; θ_t) in Eq. (13) with z_n in Eq. (1)." (a scorer sketch follows the table)
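
The Experiment Setup row quotes two scorer parameterizations: a per-node linear scorer g(x, n) = θ_n^⊤ x + b_n and a three-layer fully connected network (hidden sizes 128, 64, 24) with parametric ReLU. Below is a minimal PyTorch sketch of both. The quoted text fixes only the layer sizes and activation; how the node enters the network (here, a learned node embedding concatenated with x) and every dimension choice beyond that are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class LinearNodeScorer(nn.Module):
    """Per-node linear scorer g(x, n) = theta_n^T x + b_n, as quoted."""
    def __init__(self, num_nodes: int, d: int):
        super().__init__()
        self.theta = nn.Embedding(num_nodes, d)  # theta_n in R^d
        self.bias = nn.Embedding(num_nodes, 1)   # b_n in R

    def forward(self, x: torch.Tensor, nodes: torch.Tensor) -> torch.Tensor:
        # x: (batch, d); nodes: (batch,) integer node ids -> scores: (batch,)
        return (self.theta(nodes) * x).sum(dim=-1) + self.bias(nodes).squeeze(-1)

class MLPNodeScorer(nn.Module):
    """Three fully connected layers (128, 64, 24) with PReLU, as quoted,
    followed by a scalar output head (an assumption)."""
    def __init__(self, num_nodes: int, d: int, node_dim: int = 24):
        super().__init__()
        # Assumption: the node is represented by a learned embedding
        # concatenated with the input features x.
        self.node_emb = nn.Embedding(num_nodes, node_dim)
        self.net = nn.Sequential(
            nn.Linear(d + node_dim, 128), nn.PReLU(),
            nn.Linear(128, 64), nn.PReLU(),
            nn.Linear(64, 24), nn.PReLU(),
            nn.Linear(24, 1),
        )

    def forward(self, x: torch.Tensor, nodes: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, self.node_emb(nodes)], dim=-1)).squeeze(-1)
```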
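
Algorithm 1 is only named in the table, but the ablation quote identifies its moving parts: beam search B_h(x; θ_t) selects which nodes are trained at each level, and pseudo-targets ẑ_n(x; θ_t) replace the raw labels z_n. The sketch below shows that level-wise loop in generic form; it is not the paper's exact algorithm. `tree.root`, `tree.children`, `tree.height`, and the `pseudo_target` callback are hypothetical stand-ins, and the paper's estimator from Eq. (13) is not reproduced here.

```python
import torch
import torch.nn.functional as F

def beam_search_levels(scorer, x, tree, beam_size):
    """Level-wise beam search: expand the current beam's children and keep
    the top-`beam_size` nodes by score at every level of the tree."""
    beam, per_level = [tree.root], []
    with torch.no_grad():  # beam selection uses a parameter snapshot (theta_t)
        for _ in range(tree.height):  # tree.* are hypothetical helpers
            candidates = [c for n in beam for c in tree.children(n)]
            nodes = torch.tensor(candidates)
            scores = scorer(x.expand(len(candidates), -1), nodes)
            top = scores.topk(min(beam_size, len(candidates))).indices.tolist()
            beam = [candidates[i] for i in top]
            per_level.append(beam)
    return per_level

def otm_style_loss(scorer, x, per_level, pseudo_target):
    """Binary cross-entropy on beam nodes against pseudo-targets.
    `pseudo_target(n)` stands in for the paper's estimator z_hat_n(x; theta_t)."""
    loss = torch.zeros(())
    for beam in per_level:
        nodes = torch.tensor(beam)
        logits = scorer(x.expand(len(beam), -1), nodes)
        targets = torch.tensor([pseudo_target(n) for n in beam], dtype=torch.float)
        loss = loss + F.binary_cross_entropy_with_logits(logits, targets, reduction="sum")
    return loss
```

Under this reading, a training step runs `beam_search_levels`, computes `otm_style_loss`, and backpropagates. The two quoted ablations then map onto the sketch directly: OTM (-BS) would replace the beam sets with S_h(y) = S_h^+(y) ∪ S_h^-(y), and OTM (-OptEst) would pass the raw labels z_n as `pseudo_target`.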