Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

DeepHalo: A Neural Choice Model with Controllable Context Effects

Authors: Shuhan Zhang, Zhi Wang, Rui Gao, Shuang Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on synthetic and real-world datasets demonstrate strong predictive performance while providing greater transparency into the drivers of choice.
Researcher Affiliation	Academia	Shuhan Zhang The Chinese University of Hong Kong, Shenzhen EMAIL Zhi Wang University of Toronto EMAIL Rui Gao The University of Texas at Austin EMAIL Shuang Li The Chinese University of Hong Kong, Shenzhen EMAIL
Pseudocode	No	The paper describes mathematical formulations and neural network architectures (e.g., equations 4, 5, 9, 10), but does not include a distinct block explicitly labeled as 'Pseudocode' or 'Algorithm' with structured steps.
Open Source Code	Yes	Code available at: https://github.com/Asimov-Chuang/Deep Halo.
Open Datasets	Yes	We experiment on two real-world datasets of different scales: the smaller LPMC transportation dataset [Hillel et al., 2018] with 81,086 observations and the larger Expedia Hotel Choice dataset [Adam et al., 2013] with 275,609 transactions. The LPMC dataset, based on the London Travel Demand Survey, captures travel mode choices... The Expedia dataset contains hotel search and booking records. ... To investigate the necessity of modeling high-order context effects, we evaluate the empirical performance of our featureless model on three real-world datasets: Hotel [Bodea et al., 2009], SFOwork and SFOshop [Seshadri et al., 2019].
Dataset Splits	Yes	The Hotel dataset records bookings from five continental hotels. ... We use 1,845 observations for training and 465 for testing; due to the limited sample size, no separate validation split is held out. The SFOwork and SFOshop datasets contain travel mode choices in the San Francisco Bay Area... Both datasets are partitioned into training, validation, and test sets in an 8:1:1 ratio.
Hardware Specification	Yes	All experiments, including hypothetical data experiments, are conducted on a single Google Colab T4 GPU with Adam optimizer.
Software Dependencies	No	The paper mentions using the 'Adam optimizer' but does not specify the versions of any programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other relevant libraries with their version numbers.
Experiment Setup	Yes	The batch size and learning rate are fixed to be 1024 and 1 x 10^-4. All models are trained for 500 epochs. ... For the HOTEL dataset, ... we train the model using a full-batch setting for 300 epochs without early stopping. ... For experiments on the SFO datasets (SFOSHOP and SFOWORK), we tune the number of layers L ∈ {4, 5} and the intermediate width J ∈ {10, 20} based on validation performance. Early stopping is applied with a patience of 10 epochs to prevent overfitting. ... We tune the hyperparameters of Deep Halo based on validation performance. Specifically, we search the number of layers L ∈ {4, 5}, embedding dimensions d ∈ {32, 64}, and hidden dimensions H ∈ {8, 16}. For all configurations, we adopt early stopping with a patience of 10 epochs. ... training is first conducted with a learning rate of 10^-3, and then fine-tuned using a smaller rate of 10^-4.
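
The split and tuning protocol quoted in the table (an 8:1:1 train/validation/test partition, a small grid over L, d, and H, and early stopping with a patience of 10 epochs) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function and class names, the dictionary keys, the random seed, and the reading of "patience" as the number of consecutive epochs without improvement in validation loss are all assumptions.

```python
import random
from itertools import product


def split_indices(n, ratios=(8, 1, 1), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    The 8:1:1 default mirrors the reported SFO split; the seed is an
    assumption, since the quoted text does not state one.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    # Any remainder from integer division falls into the test part.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def search_grid():
    """Enumerate the quoted search space: L in {4, 5}, d in {32, 64}, H in {8, 16}.

    The dictionary keys are hypothetical names chosen for this sketch.
    """
    return [
        {"layers": L, "embed_dim": d, "hidden_dim": H}
        for L, d, H in product((4, 5), (32, 64), (8, 16))
    ]


class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, each configuration from `search_grid()` would be trained on the training indices, `EarlyStopper.step` would be called once per epoch with the validation loss, and the configuration with the best validation performance would be kept. The Hotel experiments quoted above use no validation split and no early stopping, so this sketch does not apply to them.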