Latent Autoregressive Source Separation
Authors: Emilian Postolache, Giorgio Mariani, Michele Mancusi, Andrea Santilli, Luca Cosmo, Emanuele Rodolà
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method on images and audio with several sampling strategies (e.g., ancestral, beam search) showing competitive results with existing approaches in terms of separation quality while offering at the same time significant speedups in terms of inference time and scalability to higher dimensional data. We perform quantitative and qualitative experiments on various datasets to demonstrate the efficacy and scalability of LASS. In the image domain, we evaluate on MNIST (LeCun et al. 1998) and CelebA (32 × 32) (Liu et al. 2015) and present qualitative results on the higher resolution datasets CelebA-HQ (256 × 256) (Karras et al. 2018) and ImageNet (256 × 256) (Deng et al. 2009). In the audio domain, we test on Slakh2100 (Manilow et al. 2019). |
| Researcher Affiliation | Academia | 1 Sapienza University of Rome, Italy 2 Ca' Foscari University of Venice, Italy 3 University of Lugano, Switzerland postolache@di.uniroma1.it, mariani@di.uniroma1.it, mancusi@di.uniroma1.it |
| Pseudocode | Yes | Algorithm 1: LASS inference. Input: y; Output: x1, x2 |
| Open Source Code | Yes | Implementation details for all the models are listed on the companion website: github.com/gladia-research-group/latent-autoregressive-source-separation |
| Open Datasets | Yes | In the image domain, we evaluate on MNIST (LeCun et al. 1998) and CelebA (32 × 32) (Liu et al. 2015) and present qualitative results on the higher resolution datasets CelebA-HQ (256 × 256) (Karras et al. 2018) and ImageNet (256 × 256) (Deng et al. 2009). In the audio domain, we test on Slakh2100 (Manilow et al. 2019), a large dataset for music source separation suitable for generative modeling. |
| Dataset Splits | Yes | In order to choose the best sampler for this dataset, we validate the set of samplers in Table 3 on 1,000 mixtures constructed from the test split. The validation dataset is constructed similarly (with different music chunks). |
| Hardware Specification | Yes | We conducted all our experiments on a single Nvidia RTX 3090 GPU with 24 GB of VRAM. |
| Software Dependencies | No | The paper mentions the Transformer architecture and implicitly relies on deep learning libraries, but it does not provide version numbers for any software dependency (e.g., Python, PyTorch, TensorFlow). It notes that "Implementation details for all the models are listed on the companion website", but those details are not present in the paper itself. |
| Experiment Setup | Yes | We use K = 256 codes on MNIST and K = 512 on CelebA... On CelebA-HQ the VQ-GAN has K = 1024 codes, while on ImageNet it has K = 16384. We scale the likelihood term by multiplying it by λ = 3. For each mixture in the test set we sample a candidate batch of 512 separations, select the separation whose sum better matches the mixture (w.r.t. the L2 distance), and finally perform the refinement procedure in Eqs. (5), (6) with T = 500 and α = 0.1. As a sampling strategy, we use beam search since it shows the best results on a validation of 50 mixtures (Table 3), using B = 100 beams. |
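The candidate-selection step quoted in the experiment-setup cell (sample a batch of separations, keep the pair whose sum best reconstructs the mixture under the L2 distance) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_best_separation` and the NumPy array shapes are assumptions, and the actual sampling of candidates from the autoregressive prior is left out.

```python
import numpy as np

def select_best_separation(mixture, candidates_x1, candidates_x2):
    """From a batch of B candidate separations, return the pair (x1, x2)
    whose sum is closest to the mixture in L2 distance.

    Hypothetical shapes: mixture (D,), candidates_x1/x2 (B, D).
    """
    # Reconstruction error of each candidate pair against the mixture.
    residuals = candidates_x1 + candidates_x2 - mixture          # (B, D)
    dists = np.linalg.norm(residuals.reshape(len(residuals), -1), axis=1)
    best = int(np.argmin(dists))                                 # index of best pair
    return candidates_x1[best], candidates_x2[best]
```

In the paper's setting the batch size is 512 and the selected pair is then further improved by the refinement procedure of Eqs. (5), (6); that refinement is not reproduced here.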