Low-Rank Time-Frequency Synthesis

Authors: Cédric Févotte, Matthieu Kowalski

NeurIPS 2014

Reproducibility assessment: each item below gives the variable, the result, and the supporting LLM response.
Research Type: Experimental. We describe two expectation-maximization algorithms for estimation in the new model and report audio signal processing results with music decomposition and speech enhancement.
Researcher Affiliation: Academia. Cédric Févotte, Laboratoire Lagrange (CNRS, OCA & Université de Nice), Nice, France, cfevotte@unice.fr; Matthieu Kowalski, Laboratoire des Signaux et Systèmes (CNRS, Supélec & Université Paris-Sud), Gif-sur-Yvette, France, kowalski@lss.supelec.fr
Pseudocode: Yes. E-step: z^(i) = E{z | x; θ^(i)} = α^(i) + (β/λ^(i)) Φ*(x − Φα^(i)) (16). M-step: for all (f, n), α^(i+1)_fn = v^(i)_fn / (v^(i)_fn + β) · z^(i)_fn, where v^(i)_fn = [W^(i)H^(i)]_fn (17); (W^(i+1), H^(i+1)) = arg min_{W,H ≥ 0} Σ_fn D_IS(|α^(i+1)_fn|² | [WH]_fn) (18); λ^(i+1) = ‖x − Φα^(i+1)‖²_2 / T (19). (A NumPy sketch of one such EM iteration is given after this list.)
Open Source Code: No. The paper mentions "Sound examples are provided in the supplementary material." but does not state that the source code for the method is openly available or provide a link to it.
Open Datasets: Yes. The training data, with sampling rate 16 kHz, is extracted from the TIMIT database [12].
Dataset Splits: No. The paper mentions training and test data but does not explicitly describe a separate validation split.
Hardware Specification: No. The paper does not specify the hardware used to run the experiments.
Software Dependencies: No. The paper mentions the Large Time-Frequency Analysis Toolbox (LTFAT) [7] but does not provide version numbers for its software dependencies.
Experiment Setup: Yes. We use a 2048-sample (≈ 46 ms) Hann window for the tonal layer and a 128-sample (≈ 3 ms) Hann window for the transient layer, both with a 50% time overlap. The number of latent components in the two layers is set to K = 3. The two t-f bases are Gabor frames with Hann windows of length 512 samples (≈ 32 ms) for the tonal layer and 32 samples (≈ 2 ms) for the transient layer, both with 50% overlap. The hyperparameter λ is gradually decreased to a negligible value during the iterations (resulting in a negligible residual e), a form of warm-restart strategy [13]. W^train_tonal and W^train_transient are fixed pre-trained dictionaries of dimension K = 500, obtained from 30 min of training speech containing male and female speakers. The noise dictionaries W^noise_tonal and W^noise_transient are learnt from the noisy data, using K = 2. (A SciPy sketch of the two-resolution time-frequency analysis is given below.)
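
As a rough illustration of the two-resolution analysis described in the experiment setup, the following SciPy sketch computes a tonal and a transient layer with the speech-enhancement window lengths (512 and 32 samples at 16 kHz). It is not the authors' implementation: the paper builds Gabor tight frames with LTFAT, whereas plain STFTs are used here only as a stand-in, and the input signal is a placeholder.

    import numpy as np
    from scipy.signal import stft

    fs = 16000                           # TIMIT sampling rate (16 kHz)
    x = np.random.randn(5 * fs)          # placeholder signal (5 s of noise)

    # Tonal layer: 512-sample Hann window (~32 ms), 50% overlap
    _, _, alpha_tonal = stft(x, fs=fs, window='hann', nperseg=512, noverlap=256)

    # Transient layer: 32-sample Hann window (~2 ms), 50% overlap
    _, _, alpha_transient = stft(x, fs=fs, window='hann', nperseg=32, noverlap=16)

    print(alpha_tonal.shape, alpha_transient.shape)   # (F_tonal, N_tonal), (F_trans, N_trans)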
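
For the algorithmic side, here is a minimal NumPy sketch of one EM iteration of the kind summarized in Eqs. (16)-(19). It is likewise only a sketch under stated assumptions, not the authors' code: the synthesis operator Φ and its adjoint are assumed to be supplied as black-box functions phi and phi_adj (hypothetical names, e.g. wrapping a Gabor/STFT pair), and the NMF subproblem of Eq. (18) is approximated with a few standard multiplicative IS-NMF updates.

    import numpy as np

    def lrtfs_em_iteration(x, alpha, W, H, lam, beta, phi, phi_adj,
                           n_nmf_updates=10, eps=1e-12):
        """One EM iteration for the LRTFS model (illustrative sketch).

        x        : observed signal, length T
        alpha    : current synthesis coefficients, complex array of shape (F, N)
        W, H     : nonnegative NMF factors, shapes (F, K) and (K, N)
        lam      : current residual variance lambda
        beta     : auxiliary variance, beta <= lam / (largest eigenvalue of Phi Phi*)
        phi      : synthesis operator, (F, N) coefficients -> length-T signal
        phi_adj  : adjoint (analysis) operator, length-T signal -> (F, N)
        """
        v = W @ H                                            # prior variances [WH]_fn
        # E-step, Eq. (16): posterior mean of the auxiliary variable z
        z = alpha + (beta / lam) * phi_adj(x - phi(alpha))
        # M-step, Eq. (17): Wiener-like shrinkage of z
        alpha = (v / (v + beta)) * z
        # M-step, Eq. (18): IS-NMF of |alpha|^2 via multiplicative updates
        P = np.abs(alpha) ** 2 + eps
        for _ in range(n_nmf_updates):
            V = W @ H + eps
            W = W * (((P / V ** 2) @ H.T) / ((1.0 / V) @ H.T))
            V = W @ H + eps
            H = H * ((W.T @ (P / V ** 2)) / (W.T @ (1.0 / V)))
        # M-step, Eq. (19): update the residual variance
        lam = np.sum(np.abs(x - phi(alpha)) ** 2) / x.size
        return alpha, W, H, lam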