Leaf-Smoothed Hierarchical Softmax for Ordinal Prediction

Authors: Wesley Tansey, Karl Pichotta, James Scott

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our approach empirically on a suite of real-world datasets, in some cases reducing the error by nearly half in comparison to other popular methods in the literature."
Researcher Affiliation | Academia | Wesley Tansey, Columbia University, New York, NY 10027, wt2274@cumc.columbia.edu; Karl Pichotta and James G. Scott, University of Texas at Austin, Austin, TX 78712, pichotta@cs.utexas.edu, james.scott@mccombs.utexas.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper mentions using a third-party implementation: "We use the WaveNet implementation at https://github.com/ibab/tensorflow-wavenet.", but it does not state that the authors release open-source code for their own method.
Open Datasets | Yes | "We evaluate LSHS on a series of benchmarks against real and synthetic data. ... We compile a benchmark of real-world datasets with discrete conditional distributions as a target output (or where the target variable is discrete). We use seven datasets from the UCI database; three are one-dimensional targets, three are two-dimensional, and one is three-dimensional. ... We also evaluate on a pixel prediction task for both MNIST and CIFAR-10... All trials were run on the VCTK corpus (Yamagishi 2012)..."
Dataset Splits | Yes | "Models are trained for 100K steps using Adam with learning rate 10⁻⁴, ϵ = 1, and batch size 50, reserving 20% of the train set for validation, with validation every 100 steps to save the best model and prevent overfitting. ... All results are averages using 10-fold cross-validation and we use 20% of the training data in each trial as a validation set." (An illustrative split sketch follows the table.)
Hardware Specification | No | The paper uses general hardware terms such as "vectorization on a GPU" and "GPU cache" but does not specify a particular GPU model, CPU type, or any other concrete hardware configuration used for the experiments.
Software Dependencies | No | The paper mentions using a "WaveNet implementation at https://github.com/ibab/tensorflow-wavenet", which implies TensorFlow, but it does not give version numbers for TensorFlow or for any other software library used in the experiments.
Experiment Setup | Yes | "Models are trained for 100K steps using Adam with learning rate 10⁻⁴, ϵ = 1, and batch size 50, reserving 20% of the train set for validation, with validation every 100 steps to save the best model and prevent overfitting. For GMM and LMM, we evaluated over m ∈ {1, 3, 5, 10, 20}. For smoothed models, we fixed the neighborhood radius to 5 and evaluated at k ∈ {1, 2} and λ ∈ {1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 0.1, 0.5, 1.0}. Hyperparameters were set by validation performance. ... Models were trained with Adam with decaying learning rate with initial rate 10⁻¹, minimum rate 10⁻⁴, and decay rate of 0.25, decaying the rate after the current model has failed to improve for 10 epochs. Training stops after 1000 epochs or if the current learning rate is below the minimum learning rate. ... for LSHS we used a radius of 5 and λ = 0.01 for all experiments; all other parameters were set to defaults." (A hedged sketch of this schedule follows the table.)
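
The split protocol quoted in the Dataset Splits row (10-fold cross-validation, reserving 20% of each fold's training data as a validation set) can be illustrated with a short sketch. The paper releases no code, so the use of scikit-learn and all function and variable names below are assumptions for illustration only, not the authors' implementation.

# Illustrative sketch of the evaluation protocol quoted in the Dataset Splits row:
# 10-fold cross-validation, with 20% of each fold's training data held out for
# validation. Library choice (scikit-learn) and names are assumptions, not the
# authors' code.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def cross_validation_splits(X, n_folds=10, val_fraction=0.2, seed=0):
    """Yield (train, validation, test) index arrays for each fold."""
    kfold = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in kfold.split(X):
        # Hold out 20% of the training portion for model selection / early stopping.
        train_idx, val_idx = train_test_split(
            train_idx, test_size=val_fraction, random_state=seed)
        yield train_idx, val_idx, test_idx

# Example usage on synthetic data (a stand-in for the UCI benchmarks).
X = np.random.randn(1000, 8)
for fold, (tr, va, te) in enumerate(cross_validation_splits(X)):
    print(f"fold {fold}: train={len(tr)} val={len(va)} test={len(te)}")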
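
Similarly, the decaying learning-rate schedule quoted in the Experiment Setup row (initial rate 10⁻¹, minimum 10⁻⁴, decay factor 0.25 after 10 epochs without validation improvement, stopping after 1000 epochs or once the rate drops below the minimum) can be sketched as a small scheduler. This is a hedged reconstruction from the quoted description; the class name, method names, and training-loop wiring are hypothetical.

# Hedged reconstruction of the plateau-based learning-rate schedule quoted in the
# Experiment Setup row; names and structure are hypothetical, since the authors'
# code is not available.
class PlateauDecaySchedule:
    """Decay the learning rate by a fixed factor after `patience` epochs without
    validation improvement; stop once the rate falls below `min_lr` or after
    `max_epochs` epochs."""

    def __init__(self, initial_lr=1e-1, min_lr=1e-4, decay=0.25,
                 patience=10, max_epochs=1000):
        self.lr = initial_lr
        self.min_lr = min_lr
        self.decay = decay
        self.patience = patience
        self.max_epochs = max_epochs
        self.best_loss = float("inf")
        self.epochs_since_improvement = 0

    def step(self, epoch, val_loss):
        """Record this epoch's validation loss and update the learning rate.
        Returns True if training should continue, False if it should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_since_improvement = 0
        else:
            self.epochs_since_improvement += 1
            if self.epochs_since_improvement >= self.patience:
                # Decay by 0.25 after 10 epochs without improvement.
                self.lr *= self.decay
                self.epochs_since_improvement = 0
        # Stop after 1000 epochs or once the rate is below the minimum.
        return epoch < self.max_epochs and self.lr >= self.min_lr

# Typical wiring inside a training loop (model and optimizer code omitted):
# schedule = PlateauDecaySchedule()
# for epoch in range(1, 1001):
#     val_loss = train_one_epoch(learning_rate=schedule.lr)  # hypothetical helper
#     if not schedule.step(epoch, val_loss):
#         break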