Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Authors: Yingying Fan, Yu Wu, Bo Du, Yutian Lin
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our simple yet effective approach outperforms state-of-the-art methods by a large margin. Code and data are available at https://github.com/fyyCS/LSLD. ... Section 4 Experiments |
| Researcher Affiliation | Academia | Yingying Fan, Yu Wu, Bo Du, Yutian Lin School of Computer Science, Hubei Luojia Laboratory, Wuhan University {fanyingying_cs, wuyucs, dubo, yutian.lin}@whu.edu.cn |
| Pseudocode | No | The paper includes 'Figure 1: Algorithm Overview', a block diagram, but it does not present structured pseudocode or a formal algorithm block with detailed steps. |
| Open Source Code | Yes | Code and data are available at https://github.com/fyyCS/LSLD. |
| Open Datasets | Yes | In the AVVP task, we only evaluate our method on the Look, Listen and Parse (LLP) Dataset [4] following previous AVVP work. ... For the training process, we use 10,000 video clips with only video-level event labels. |
| Dataset Splits | Yes | The remaining 1,849 validation and test videos carry modality- and segment-level labels (i.e., the start and end time of each event on the audio and visual tracks). We conduct experiments following the official data splits from the LLP dataset. (See the annotation sketch below the table.) |
| Hardware Specification | Yes | We conduct the training and evaluation processes on a single NVIDIA RTX 2080 Ti GPU with 11 GB memory. |
| Software Dependencies | No | The paper names the models and tools it uses (e.g., CLIP, CLAP, ResNet, VGGish, the Adam optimizer) but does not give version numbers for any software component or library, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | Following HAN [4], we adopt the Adam optimizer with a learning rate of 2e-4, decayed by a factor of 0.25 every 6 epochs. We train the model with a batch size of 32 for 20 epochs. ... α is set to 4 and β is 0.4. (See the training-loop sketch below the table.) |
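
The two supervision regimes described in the dataset rows (weak video-level tags for the 10,000 training clips, per-modality temporal labels for the 1,849 validation/test clips) can be made concrete with a small sketch. This is illustrative only: the class and field names are hypothetical stand-ins, not the LLP annotation schema, and the example events are made up.

```python
from dataclasses import dataclass, field

@dataclass
class TrainVideo:
    """Training supervision: weak, video-level event tags only."""
    video_id: str
    event_labels: list[str]  # e.g. ["Speech", "Dog"]; no timing, no modality

@dataclass
class EvalSegmentLabel:
    """Evaluation supervision: one event localized on one track."""
    event: str
    modality: str     # "audio" or "visual"
    start_sec: float  # start time of the event on that track
    end_sec: float    # end time of the event on that track

@dataclass
class EvalVideo:
    """Validation/test videos carry the dense, segment-level labels."""
    video_id: str
    segment_labels: list[EvalSegmentLabel] = field(default_factory=list)
```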
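
The reported optimizer settings map onto a standard PyTorch schedule: `StepLR` with `step_size=6` and `gamma=0.25` reproduces "drops by a factor of 0.25 for every 6 epochs". The following is a minimal sketch under stated assumptions, not the authors' implementation (that lives at https://github.com/fyyCS/LSLD): the model, feature dimension, and data are dummy placeholders, the loss is a generic multi-label BCE, and the paper's α and β loss weights are not modeled.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Dummy placeholders: the real model is the authors' HAN-based network,
# and the real inputs are pre-extracted audio/visual features.
model = nn.Linear(512, 25)          # 25 = number of LLP event categories
criterion = nn.BCEWithLogitsLoss()  # generic multi-label loss stand-in

# Reported setup: Adam, lr 2e-4, decayed by a factor of 0.25 every 6 epochs.
optimizer = optim.Adam(model.parameters(), lr=2e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=6, gamma=0.25)

# Random tensors standing in for features and video-level labels.
features = torch.randn(320, 512)
labels = torch.randint(0, 2, (320, 25)).float()
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

for epoch in range(20):  # reported: batch size 32, 20 epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # multiplies lr by 0.25 at epochs 6, 12, 18
```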