Winner-takes-all learners are geometry-aware conditional density estimators

Authors: Victor Letzelter, David Perera, Cédric Rommel, Mathieu Fontaine, Slim Essid, Gaël Richard, Patrick Pérez

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically substantiate our estimator through experiments on both synthetic and real-world data, including audio data. |
| Researcher Affiliation | Collaboration | ¹Valeo.ai, Paris, France; ²LTCI, Télécom Paris, Institut Polytechnique de Paris, France; ³Meta AI, Paris, France; ⁴Kyutai, Paris, France. |
| Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code at https://github.com/Victorletzelter/VoronoiWTA. |
| Open Datasets | Yes | UCI Regression datasets (Dua & Graff, 2017) are a standard benchmark (Hernández-Lobato & Adams, 2015) to evaluate conditional density estimators. |
| Dataset Splits | Yes | The models were trained until convergence of the training loss, using early stopping on the validation loss. Each of the synthetic datasets consists of 100,000 training points and 25,000 validation points. Post-training, the scaling factor h was tuned based on the average NLL over the validation set (20% of the training data) using a golden-section search (Kiefer, 1953). (A sketch of this tuning step is given below the table.) |
| Hardware Specification | Yes | The training of our neural networks was conducted on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software such as the Python programming language, the PyTorch deep learning framework (Paszke et al., 2019), the AdamW optimizer (Loshchilov & Hutter, 2018), and the Hydra and MLFlow libraries. However, it does not provide specific version numbers for these components (e.g., PyTorch 1.9, Python 3.8), which are required for reproducibility. |
| Experiment Setup | Yes | In each training setup with synthetic data, we used a three-layer MLP with 256 hidden units. The Adam optimizer (Kingma & Ba, 2014) was used with a constant learning rate of 0.001 in each setup. The models were trained until convergence of the training loss, using early stopping to select the checkpoint for which the validation loss was the lowest. Each of the models was trained for 100 epochs, with a batch size of 1024. We utilized SELDnet (Adavanne et al., 2018a) as backbone (1.6M parameters). The AdamW optimizer (Loshchilov & Hutter, 2018) was used, with a batch size of 32, an initial learning rate of 0.05, and following the scheduling scheme from Vaswani et al. (2017). The WTA model was trained using the multi-target version of the Winner-takes-all loss (Equations 2 and 5 of Letzelter et al. (2023)), using confidence weight β = 1. The underlying loss ℓ used was the spherical distance ℓ(ŷ, y) = arccos(ŷᵀy). (A sketch of this loss is given below the table.) |
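
The bandwidth tuning mentioned in the Dataset Splits row is a one-dimensional search: after training, the scaling factor h is chosen to minimize the average negative log-likelihood (NLL) on the validation split via golden-section search. The snippet below is a minimal, self-contained sketch of that search, not the authors' code; `mean_validation_nll` is a hypothetical stand-in for the model-specific NLL computation, and the search interval [1e-3, 10.0] is an illustrative assumption rather than a value from the paper.

```python
import math

def golden_section_search(f, a, b, tol=1e-4):
    """Golden-section search for the minimizer of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi ~= 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:                # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                      # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Hypothetical usage: mean_validation_nll(h) would return the average NLL of
# the trained estimator on the 20% validation split for scaling factor h.
# h_star = golden_section_search(mean_validation_nll, 1e-3, 10.0)
```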
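
For the audio experiments in the Experiment Setup row, the WTA model is trained with the multi-target Winner-takes-all loss of Letzelter et al. (2023), confidence weight β = 1, and the spherical distance ℓ(ŷ, y) = arccos(ŷᵀy). The sketch below shows how such an objective can be written in PyTorch; it is not the authors' implementation, and the tensor shapes (K hypotheses, T unit-norm targets per example, one confidence logit per hypothesis) as well as the binary cross-entropy on the confidence head are assumptions based on the cited formulation.

```python
import torch
import torch.nn.functional as F

def spherical_distance(y_hat, y):
    """arccos of the dot product between unit-norm direction vectors."""
    cos = (y_hat * y).sum(dim=-1).clamp(-1.0 + 1e-7, 1.0 - 1e-7)
    return torch.acos(cos)

def multi_target_wta_loss(hypotheses, scores, targets, beta=1.0):
    """
    hypotheses: (B, K, 3) unit-norm direction predictions (K hypotheses).
    scores:     (B, K)    confidence logits, one per hypothesis.
    targets:    (B, T, 3) unit-norm ground-truth directions (all T assumed active).
    """
    # Pairwise spherical distances between every hypothesis and every target.
    dists = spherical_distance(hypotheses.unsqueeze(2), targets.unsqueeze(1))  # (B, K, T)

    # Winner-takes-all assignment: each target is matched to its closest
    # hypothesis, and only that "winner" receives the regression gradient.
    winners = dists.argmin(dim=1)                                   # (B, T)
    wta_term = dists.gather(1, winners.unsqueeze(1)).mean()

    # Confidence targets: 1 for hypotheses that won at least one target, else 0.
    conf_targets = torch.zeros_like(scores)
    conf_targets.scatter_(1, winners, 1.0)
    conf_term = F.binary_cross_entropy_with_logits(scores, conf_targets)

    return wta_term + beta * conf_term
```

With β = 1, as reported above, the confidence term and the winner regression term contribute with equal weight.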