Learning Optimal Tree Models under Beam Search
Authors: Jingwei Zhuo, Ziru Xu, Wei Dai, Han Zhu, Han Li, Jian Xu, Kun Gai
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real data verify the rationality of our theoretical analysis and demonstrate the superiority of our algorithm compared to state-of-the-art methods. |
| Researcher Affiliation | Industry | Jingwei Zhuo, Ziru Xu, Wei Dai, Han Zhu, Han Li, Jian Xu, Kun Gai (Alibaba Group). Correspondence to: Jingwei Zhuo <zjw169463@alibaba-inc.com>. |
| Pseudocode | Yes | Algorithm 1 Learning Optimal Tree Models under Beam Search |
| Open Source Code | No | The paper does not include an unambiguous statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | Our experiments are conducted on two large-scale real datasets for recommendation tasks: Amazon Books (McAuley et al., 2015; He & McAuley, 2016) and User Behavior (Zhu et al., 2018). |
| Dataset Splits | Yes | We discard the user based data which has less than 10 items and split the rest into training set Dtr, validation set Dval and testing set Dte in the same way as Zhu et al. (2018; 2019). |
| Hardware Specification | Yes | This number is averaged over 5000 training iterations on a single Tesla P100-PCIE-16GB GPU. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4). |
| Experiment Setup | Yes | More specifically, T is set to be a random binary tree over I and g(x, n) = θ_n^T x + b_n is parameterized as a linear scorer, where θ_n ∈ R^d and b_n ∈ R are trainable parameters. The neural network consists of three fully connected layers with hidden sizes 128, 64 and 24, and parametric ReLU is used as the activation function. In experiments, we set b to be a negative value such that the number of non-zero entries is less than 0.1M, to simulate the practical case where the number of relevant targets is much smaller than the target set size. Throughout experiments, we use OTM to denote the tree model trained according to Algorithm 1, since its goal is to learn optimal tree models under beam search. To perform an ablation study, we consider two variants of OTM: OTM (-BS) differs from OTM by replacing B_h(x; θ_t) with S_h(y) = S_h^+(y) ∪ S_h^-(y), and OTM (-OptEst) differs from OTM by replacing ẑ_n(x; θ_t) in Eq. (13) with z_n in Eq. (1). |
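The paper's central object is the beam set B_h(x; θ): at each tree level, only the top-scoring nodes are expanded. A minimal sketch of that layer-wise top-k expansion over a complete binary tree is below; the `children` and `score` callables are illustrative interfaces, not the paper's API, and the scorer here is a stand-in for the learned g(x, n).

```python
import heapq

def beam_search(children, score, root, beam_size):
    """Layer-wise beam search over a tree, sketching the beam set
    B_h(x) from the paper. At each level, expand every node in the
    current beam one level down and keep the beam_size nodes with
    the highest scores; stop when the beam reaches the leaves."""
    beam = [root]
    while any(children(n) for n in beam):
        # Expand each beam node; leaf nodes carry over unchanged.
        expanded = [c for n in beam for c in children(n)] or beam
        beam = heapq.nlargest(beam_size, expanded, key=score)
    return beam

# Toy example: a complete binary tree with heap indexing 1..7,
# where a node's score is simply its index (hypothetical data).
children = lambda n: [2 * n, 2 * n + 1] if 2 * n <= 7 else []
result = beam_search(children, score=lambda n: n, root=1, beam_size=2)
print(result)  # [7, 6]
```

With beam_size equal to the number of leaves this degenerates to exhaustive retrieval; the paper's analysis concerns the gap that appears when the beam is smaller than the target set.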
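The setup cell describes the node scorer used on real data: three fully connected layers of sizes 128, 64 and 24 with parametric ReLU activations. A hedged NumPy sketch of such a network follows; the layer sizes come from the paper, while the weight initialization, the fixed PReLU slope, and the `NodeScorer` name are assumptions for illustration only.

```python
import numpy as np

def prelu(x, alpha=0.25):
    # Parametric ReLU: identity for positive inputs, alpha * x otherwise.
    # In the paper alpha is trainable; here it is fixed for simplicity.
    return np.where(x > 0, x, alpha * x)

class NodeScorer:
    """Sketch of a per-node scorer g(x, n): an MLP with hidden layers
    (128, 64, 24) and PReLU activations, ending in a scalar score."""
    def __init__(self, input_dim, hidden=(128, 64, 24), seed=0):
        rng = np.random.default_rng(seed)
        dims = [input_dim, *hidden, 1]
        # Small random weights and zero biases (illustrative init).
        self.weights = [rng.normal(0.0, 0.1, (i, o)) for i, o in zip(dims, dims[1:])]
        self.biases = [np.zeros(o) for o in dims[1:]]

    def __call__(self, x):
        h = x
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            h = prelu(h @ W + b)
        # Final linear layer produces one score per input row.
        return (h @ self.weights[-1] + self.biases[-1]).squeeze(-1)

scorer = NodeScorer(input_dim=24)
scores = scorer(np.ones((5, 24)))
print(scores.shape)  # (5,)
```

The synthetic-data variant in the paper is even simpler: a linear scorer θ_n^T x + b_n per node, i.e. the network above with no hidden layers.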