Language Semantic Graph Guided Data-Efficient Learning

Authors: Wenxuan Ma, Shuang Li, Lincan Cai, Jingxuan Kang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Across image, video, and audio modalities, we utilize the LSG method in both TL and SSL scenarios and illustrate its versatility in significantly enhancing performance compared to other data-efficient learning approaches. Additionally, our in-depth analysis shows that the LSG method also expedites the training process.
Researcher Affiliation | Academia | Wenxuan Ma (Beijing Institute of Technology, wenxuanma@bit.edu.cn); Shuang Li (Beijing Institute of Technology, shuangli@bit.edu.cn); Lincan Cai (Beijing Institute of Technology, lincancai@bit.edu.cn); Jingxuan Kang (University of Liverpool, sgjkang3@liverpool.ac.uk)
Pseudocode | No | The paper describes the steps of the method in text and provides a framework illustration in Figure 1, but it does not include any formally labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | Code available at: https://github.com/BIT-DA/LSG
Open Datasets | Yes | We conduct experiments on 7 standard datasets that are intensively studied in Transfer Learning [77, 41, 27] and Semi-supervised Learning [67, 24], covering images, videos, and audio. For image datasets, we adopt FGVC Aircraft [44] (10,000 images of 100 aircraft variants), Stanford Cars [30] (16,185 images of 196 car categories), and CUB-200-2011 [66] (11,788 images of 200 bird species) for fine-grained classification analysis, and Office-Home [65] (four domains, each containing roughly 4,000 images across 65 categories) to evaluate out-of-distribution performance. For video datasets, we use UCF-101 [59] (13,320 video clips in 101 categories) and HMDB51 [31] (6,766 clips from 51 actions). For the audio dataset, we report performance on AudioSet-20K [16].
Dataset Splits | Yes | We analyze the performance of LSG under labeled data partition ratios of 15%, 30%, and 50%, as well as the full training set. (...) Specifically, samples in the source domain are randomly partitioned into 80% training data and 20% test data, and the results refer to accuracy on the test data.
Hardware Specification | No | The paper does not explicitly mention any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only refers to model architectures such as "ResNet-50" and "ViT-L".
Software Dependencies | No | The paper mentions using a "pretrained BERT-L [25] as the language model" and "SGD with a momentum of 0.9 as the optimizer," but it does not provide specific version numbers for these or any other software components.
Experiment Setup | Yes | We utilize a pretrained BERT-L [25] as the language model to transform labels into text embeddings. For each concept, we contextualize it into complete sentences using 20 handcrafted prompts. When constructing the Language Semantic Graph, the similarity threshold τ is determined adaptively to include the top ρ = 0.3% of edges connecting nodes with different labels in the fully connected graph. The GCN is trained on the full graph for 5,000 iterations. (...) The projector H is implemented as a fully-connected layer with an output dimension of 1024 and randomly initialized weights. We find that setting λ and µ to 1.0 and 8.0 generally achieves satisfying results across all experiments. In image classification tasks, we adopt SGD with a momentum of 0.9 as the optimizer. The learning rate is set to 1e-3 for the visual backbone in most experiments, and a 10× larger value is applied to the classifier and projector in SSL and SDG. Hedged sketches of the prompt-embedding and graph-construction steps described in this row are given below.
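
To make the label-to-embedding step concrete, here is a minimal sketch of contextualizing a class label with handcrafted prompts and pooling BERT-L token embeddings, assuming the Hugging Face transformers API. The prompt templates, the bert-large-uncased checkpoint name, and mean pooling are illustrative assumptions; the paper only states that each concept is contextualized with 20 handcrafted prompts.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModel.from_pretrained("bert-large-uncased").eval()

# Hypothetical prompt templates; the paper's 20 handcrafted prompts are not listed here.
PROMPTS = [
    "a photo of a {}.",
    "a blurry photo of a {}.",
    "a close-up photo of a {}.",
]

@torch.no_grad()
def label_to_embeddings(label: str) -> torch.Tensor:
    """Embed each prompt-contextualized sentence for one class label."""
    sentences = [p.format(label) for p in PROMPTS]
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state             # (P, T, 1024)
    mask = batch["attention_mask"].unsqueeze(-1)          # (P, T, 1)
    # Mean-pool over non-padding tokens (pooling strategy is an assumption).
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (P, 1024)

emb = label_to_embeddings("Boeing 737")  # one node embedding per prompt sentence
```

Each pooled sentence embedding can then serve as a node of the Language Semantic Graph, one node per prompt-contextualized label.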
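The adaptive threshold τ can be derived directly from the stated rule: choose τ so that the top ρ = 0.3% of edges between nodes with different labels survive. The sketch below assumes cosine similarity between node embeddings and assumes same-label nodes stay connected; only the top-ρ rule itself comes from the paper.

```python
import torch
import torch.nn.functional as F

def build_lsg_adjacency(node_emb: torch.Tensor, labels: torch.Tensor,
                        rho: float = 0.003) -> torch.Tensor:
    """Boolean adjacency matrix via the adaptive similarity threshold tau."""
    z = F.normalize(node_emb, dim=1)              # cosine similarity is an assumption
    sim = z @ z.t()                               # (N, N) pairwise similarities
    same = labels[:, None] == labels[None, :]     # same-label mask (incl. diagonal)
    cross = sim[~same]                            # cross-label similarities only
    k = max(1, int(rho * cross.numel()))          # top rho fraction of cross-label edges
    tau = torch.topk(cross, k).values.min()       # adaptive threshold from the paper's rule
    adj = (sim >= tau) | same                     # keeping same-label edges is an assumption
    adj.fill_diagonal_(False)                     # drop self-loops
    return adj
```

The resulting adjacency matrix would define the graph on which the GCN is then trained for 5,000 iterations, per the setup quoted above.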