Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing

Authors: David Perera, Victor Letzelter, Theo Mariotte, Adrien Cortes, Mickael Chen, Slim Essid, Gaël Richard

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our algorithm by extensive experiments on synthetic datasets, on the standard UCI benchmark, and on speech separation.
Researcher Affiliation | Collaboration | 1) LTCI, Télécom Paris, Institut Polytechnique de Paris; 2) Valeo.ai; 3) Sorbonne Université
Pseudocode | No | The paper describes the training procedure iteratively in text (Section 3.2) but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | The accompanying code is made available: https://github.com/Victorletzelter/annealed_mcl
Open Datasets | Yes | We validate our algorithm by extensive experiments on synthetic datasets, on the standard UCI benchmark, and on speech separation. Specifically, we used the official train-test splits, with 20 folds except for the Protein dataset, which is split into 5 folds, and the Year dataset, which uses a single fold. Source separation experiments are conducted on the Wall Street Journal dataset [30] (WSJ0-mix), a standard benchmark for speech separation.
Dataset Splits | Yes | We used the official train-test splits, with 20 folds except for the Protein dataset, which is split into 5 folds, and the Year dataset, which uses a single fold. Each version features 20,000, 5,000, and 3,000 mixtures for training, validation, and testing respectively.
Hardware Specification | Yes | Separation models are trained on Nvidia A40 GPU cards.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify its version or the versions of any other key software libraries or frameworks required for reproducibility.
Experiment Setup | Yes | Our method (aMCL) was trained with an exponential scheduler of the form T(t) = T0 ρ^t, with ρ = 0.95 and T0 = 0.5. Both aMCL and Relaxed-MCL were trained for 1,000 epochs. Each MCL system was trained with n = 5 hypotheses. The batch size is set to 22. Each model is trained for t_epoch = 200 epochs, without early stopping. Unless otherwise stated, the temperature scheduler is chunk linear: ... with initial temperature T0 = 0.1 and t_max = 100. The neural network weights are updated using the Adam optimizer, with a learning rate set to 10^-3. The learning rate is halved after every 5 epochs with no improvement in the validation metric.
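For illustration, the quoted hyperparameters can be combined into a short training sketch. This is a minimal, assumption-laden reconstruction in PyTorch, not the authors' released code: the linear model, the training pass, and the validation metric are placeholders, and only the exponential scheduler T(t) = T0 ρ^t (T0 = 0.5, ρ = 0.95), the Adam learning rate of 10^-3, the 200-epoch budget, and the halving of the learning rate after 5 epochs without validation improvement are taken from the quotes above.

```python
# Minimal sketch, assuming PyTorch; model, training pass, and metric are placeholders.
import torch

def exponential_temperature(epoch: int, T0: float = 0.5, rho: float = 0.95) -> float:
    """Annealing temperature at epoch t: T(t) = T0 * rho**t (values quoted above)."""
    return T0 * rho ** epoch

model = torch.nn.Linear(16, 5)  # placeholder standing in for an MCL model with n = 5 hypotheses
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate 10^-3
# Halve the learning rate after 5 epochs with no improvement in the validation metric.
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)

for epoch in range(200):  # t_epoch = 200 epochs, no early stopping
    temperature = exponential_temperature(epoch)
    # ... forward/backward pass of the annealed assignment loss at this temperature ...
    val_metric = 0.0  # placeholder: compute the validation metric here
    lr_scheduler.step(val_metric)
```

ReduceLROnPlateau stands in here for the quoted "halved after every 5 epochs with no improvement" rule; the chunk-linear scheduler mentioned for the separation setup is elided in the quote and therefore not reconstructed.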