PCMC-Net: Feature-based Pairwise Choice Markov Chains

Authors: Alix Lhéritier

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show our network significantly outperforming, in terms of prediction accuracy and logarithmic loss, feature engineered standard and latent class Multinomial Logit models as well as recent machine learning approaches."
Researcher Affiliation | Industry | Alix Lhéritier, Amadeus SAS, F-06902 Sophia-Antipolis, France, alix.lheritier@amadeus.com
Pseudocode | No | The paper includes an architecture diagram in Figure 1, but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/alherit/PCMC-Net.
Open Datasets | Yes | "We used the dataset from Mottini & Acuna-Agost (2017) consisting of flight booking sessions on a set of European origins and destinations."
Dataset Splits | Yes | "Early stopping is performed during training if no significant improvement (greater than 0.01 with respect to the best log loss obtained so far) is made on a validation set (a random sample consisting of 10% of the choice sessions from the training set) during 5 epochs."
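The early-stopping rule quoted above (hold out 10% of the training sessions, stop after 5 epochs without a log-loss improvement greater than 0.01) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `evaluate_log_loss` is a hypothetical callback that runs one training epoch and returns the validation log loss.

```python
import random

def train_with_early_stopping(train_sessions, evaluate_log_loss,
                              min_improvement=0.01, patience=5, max_epochs=100):
    """Stop training once `patience` consecutive epochs pass without the
    validation log loss improving on the best value by > `min_improvement`."""
    sessions = list(train_sessions)
    random.shuffle(sessions)
    n_val = max(1, len(sessions) // 10)  # 10% of choice sessions held out
    val_sessions, fit_sessions = sessions[:n_val], sessions[n_val:]

    best_loss = float("inf")
    epochs_without_improvement = 0
    for _epoch in range(max_epochs):
        loss = evaluate_log_loss(fit_sessions, val_sessions)
        if best_loss - loss > min_improvement:  # significant improvement
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_loss
```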
Hardware Specification | No | The paper does not describe the specific hardware (CPU or GPU models, or cloud instances and their specifications) used to run its experiments.
Software Dependencies | No | "PCMC-Net was implemented in PyTorch (Paszke et al., 2017). Stochastic gradient optimization is performed with Adam (Kingma & Ba, 2015)."
Experiment Setup | Yes | "We instantiate PCMC-Net with an identity representation layer with d_a = 4 and d_0 = 0 and a transition rate layer with h ∈ {1, 2, 3} hidden layers of ν = 16 nodes with Leaky ReLU activation (slope = 0.01) and ϵ = 0.5. We trained it with the Adam optimizer with one choice set per iteration, a learning rate of 0.001 and no dropout, for 100 epochs."

Table 2: Hyperparameters optimized with Bayesian optimization.

parameter                | range                              | best value
learning rate            | {10^-i}, i = 1..6                  | 0.001
batch size (in sessions) | {2^i}, i = 0..4                    | 16
hidden layers in f_wq    | {1, 2, 3}                          | 2
nodes per layer in f_wq  | {2^i}, i = 5..9                    | 512
activation               | {ReLU, Sigmoid, Tanh, Leaky ReLU}  | Leaky ReLU
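The transition rate layer described above can be sketched in PyTorch (the paper's framework) as a small MLP over a concatenated pair of alternative representations, with Leaky ReLU (slope 0.01) hidden activations. This is a hypothetical sketch, not the authors' implementation: in particular, adding ϵ = 0.5 to a non-negative output is one plausible way to keep transition rates strictly positive, and the class and argument names are invented for illustration.

```python
import torch
import torch.nn as nn

class TransitionRateNet(nn.Module):
    """Illustrative sketch of a PCMC-Net-style transition rate layer:
    an MLP with `hidden_layers` layers of `nu` nodes and LeakyReLU(0.01)
    activations, mapping the pair (x_i, x_j) to a strictly positive rate."""

    def __init__(self, d_alt=4, hidden_layers=2, nu=16, eps=0.5):
        super().__init__()
        layers, d_in = [], 2 * d_alt  # representations of i and j, concatenated
        for _ in range(hidden_layers):
            layers += [nn.Linear(d_in, nu), nn.LeakyReLU(0.01)]
            d_in = nu
        layers += [nn.Linear(d_in, 1), nn.ReLU()]  # non-negative raw output
        self.net = nn.Sequential(*layers)
        self.eps = eps  # offset keeping rates strictly positive (assumption)

    def forward(self, x_i, x_j):
        return self.eps + self.net(torch.cat([x_i, x_j], dim=-1))
```

Per the setup quoted above, such a module would be trained with `torch.optim.Adam` at a learning rate of 0.001, one choice set per iteration, and no dropout.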