Low-Rank Time-Frequency Synthesis
Authors: Cédric Févotte, Matthieu Kowalski
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We describe two expectation-maximization algorithms for estimation in the new model and report audio signal processing results with music decomposition and speech enhancement. |
| Researcher Affiliation | Academia | Cédric Févotte, Laboratoire Lagrange (CNRS, OCA & Université de Nice), Nice, France, cfevotte@unice.fr; Matthieu Kowalski, Laboratoire des Signaux et Systèmes (CNRS, Supélec & Université Paris-Sud), Gif-sur-Yvette, France, kowalski@lss.supelec.fr |
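| Pseudocode | Yes | E-step: $z^{(i)} = \mathrm{E}\{z \mid x, \lambda^{(i)}\} = \alpha^{(i)} + \frac{\beta}{\lambda^{(i)}} \Phi^{*} (x - \Phi \alpha^{(i)})$ (16). M-step: $\forall (f, n),\ \alpha^{(i+1)}_{fn} = \frac{v^{(i)}_{fn}}{v^{(i)}_{fn} + \beta} z^{(i)}_{fn}$ (17); $(W^{(i+1)}, H^{(i+1)}) = \arg\min_{W, H \ge 0} \sum_{fn} D_{\mathrm{IS}}\big(\lvert \alpha^{(i+1)}_{fn} \rvert^{2} \mid [WH]_{fn}\big)$ (18); $\lambda^{(i+1)} = \frac{1}{T} \lVert x - \Phi \alpha^{(i+1)} \rVert_F^2$ (19) |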
| Open Source Code | No | The paper mentions "Sound examples are provided in the supplementary material." but does not state that the source code for the methodology is openly available or provide a link to it. |
| Open Datasets | Yes | The training data, with sampling rate 16 kHz, is extracted from the TIMIT database [12]. |
| Dataset Splits | No | The paper mentions training and test data but does not explicitly describe a separate validation dataset split. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions "Large Time-Frequency Analysis Toolbox (LTFAT) [7]" but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | We use a 2048 samples-long (≈ 46 ms) Hann window for the tonal layer, and a 128 samples-long (≈ 3 ms) Hann window for the transient layer, both with a 50% time overlap. The number of latent components in the two layers is set to K = 3. The two t-f bases are Gabor frames with Hann window of length 512 samples (≈ 32 ms) for the tonal layer and 32 samples (≈ 2 ms) for the transient layer, both with 50% overlap. The hyperparameter λ is gradually decreased to a negligible value during iterations (resulting in a negligible residual e), a form of warm-restart strategy [13]. $W^{\text{train}}_{\text{tonal}}$ and $W^{\text{train}}_{\text{transient}}$ are fixed pre-trained dictionaries of dimension K = 500, obtained from 30 min of training speech containing male and female speakers. The noise dictionaries $W^{\text{noise}}_{\text{tonal}}$ and $W^{\text{noise}}_{\text{transient}}$ are learnt from the noisy data, using K = 2. |
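
The EM updates quoted in the Pseudocode row (Eqs. 16 to 19) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `lrtfs_em` and `is_nmf_updates` are hypothetical. The dictionary Φ is assumed to be an explicit complex matrix of size T × (F·N) rather than a fast Gabor transform. The coefficient variance is taken as v = [WH] flattened row by row. The scalar β is set to λ/δ, where δ is the squared spectral norm of Φ. The arg min of Eq. (18) is only approximated by a few standard multiplicative updates for Itakura-Saito NMF.

```python
import numpy as np


def is_nmf_updates(P, W, H, n_inner=10, eps=1e-12):
    """A few multiplicative updates that decrease sum_fn D_IS(P_fn | [WH]_fn).

    Stands in for the exact arg min of Eq. (18); this is an approximation.
    """
    for _ in range(n_inner):
        V = W @ H + eps
        H = H * (W.T @ (P / V**2)) / (W.T @ (1.0 / V) + eps)
        V = W @ H + eps
        W = W * ((P / V**2) @ H.T) / ((1.0 / V) @ H.T + eps)
    return W, H


def lrtfs_em(x, Phi, F, N, K=3, n_iter=20, seed=0, eps=1e-12):
    """Hypothetical sketch of the EM iteration of Eqs. (16)-(19).

    x   : signal of length T
    Phi : explicit T x (F*N) synthesis matrix (assumption: small, dense)
    """
    rng = np.random.default_rng(seed)
    T = x.shape[0]
    delta = np.linalg.norm(Phi, 2) ** 2       # squared largest singular value
    alpha = np.zeros(F * N, dtype=complex)    # synthesis coefficients
    W = rng.random((F, K)) + 0.1              # nonnegative initialisation
    H = rng.random((K, N)) + 0.1
    lam = np.linalg.norm(x) ** 2 / T          # residual variance for alpha = 0

    for _ in range(n_iter):
        beta = lam / delta                    # assumption: beta = lambda / delta
        v = (W @ H).reshape(-1)               # v_fn = [WH]_fn

        # E-step, Eq. (16): beta / lambda equals 1 / delta under the assumption
        z = alpha + (Phi.conj().T @ (x - Phi @ alpha)) / delta

        # M-step, Eq. (17): coefficient-wise shrinkage of z
        alpha = v / (v + beta + np.finfo(float).tiny) * z

        # M-step, Eq. (18): fit WH to the power |alpha|^2 (approximate)
        P = (np.abs(alpha) ** 2).reshape(F, N) + eps
        W, H = is_nmf_updates(P, W, H)

        # M-step, Eq. (19): update the residual variance
        lam = np.linalg.norm(x - Phi @ alpha) ** 2 / T

    return alpha, W, H, lam
```

With a real Gabor frame, the products `Phi @ alpha` and `Phi.conj().T @ r` would be replaced by fast synthesis and analysis operators, for instance those of LTFAT, which the paper mentions; the dense matrix here only keeps the sketch self-contained.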