TaCo: Textual Attribute Recognition via Contrastive Learning
Authors: Chang Nie, Yiqing Hu, Yanqiu Qu, Hao Liu, Deqiang Jiang, Bo Ren
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that TaCo surpasses the supervised counterparts and advances the state-of-the-art remarkably on multiple attribute recognition tasks. |
| Researcher Affiliation | Industry | Tencent YouTu Lab {changnie, hooverhu, yanqiuqu, ivanhliu, dqiangjiang, timren}@tencent.com |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | Online services of TaCo will be publicly released soon to assist relevant researchers and designers. |
| Open Datasets | No | Now there exist no publicly available datasets for textual attributes. We constructed a large-scale synthetic dataset (SynAttr) comprising one million images of text segments for system pre-training and fine-tuning. |
| Dataset Splits | No | The paper mentions 'For validation, we manually annotated a dataset Attr-5k comprising 5k individual sentence images' but does not specify the train/validation/test splits (e.g., percentages or counts) for its main synthetic dataset (SynAttr) or Attr-5k for reproducibility. |
| Hardware Specification | Yes | All experiments are implemented on a platform with 8 Nvidia V100 GPUs. |
| Software Dependencies | No | The paper mentions frameworks like SimSiam and model architectures like ResNet-50 and Deformable DETR, but does not provide specific version numbers for software dependencies (e.g., programming languages, deep learning libraries). |
| Experiment Setup | Yes | The standard SGD optimizer with a learning rate of 0.1 is used for optimization. We train for 100 epochs (taking 26 hours) and adjust the learning rate using a Cosine Annealing strategy. The patch size P and the number of attention heads of MAEM are set to 4. For data augmentation, our pretext tasks include: 1) randomly reordering the words with a probability of 0.5, 2) randomly cropping views from the original image by ratio ranges (0.8~1, 0.6~1), then rescaling and padding them to a fixed size of (32, 256) without changing the aspect ratio, and 3) color jittering that alters the brightness, contrast, saturation and hue of an image with an offset degree of (0.4, 0.4, 0.4, 0.1) with a probability of 0.8. (See the sketch below the table.) |
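
Below is a minimal, illustrative sketch of the augmentation and optimization settings quoted in the Experiment Setup row, assuming a PyTorch/torchvision stack (the paper does not name its framework or versions). The helpers `reorder_words` and `make_view`, the placeholder `encoder`, and the use of `RandomResizedCrop` as a stand-in for the paper's crop-then-pad procedure are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: assumes PyTorch/torchvision; helper names are hypothetical.
import random

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision import transforms

# Color jitter with offset degrees (brightness, contrast, saturation, hue) =
# (0.4, 0.4, 0.4, 0.1), applied with probability 0.8.
color_jitter = transforms.RandomApply(
    [transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8
)


def reorder_words(words, p=0.5):
    """Randomly reorder the words of a text segment with probability p."""
    if random.random() < p:
        words = random.sample(words, len(words))
    return words


def make_view(image, min_ratio):
    """Crop a view with area ratio in (min_ratio, 1.0) and resize to (32, 256).

    Simplification: the paper rescales and pads without changing the aspect
    ratio; RandomResizedCrop is used here only as a stand-in.
    """
    crop = transforms.RandomResizedCrop(size=(32, 256), scale=(min_ratio, 1.0))
    return color_jitter(crop(image))


# Two augmented views per image, with ratio ranges 0.8~1 and 0.6~1:
#   view_a = make_view(img, 0.8)
#   view_b = make_view(img, 0.6)

# Standard SGD with lr = 0.1 and Cosine Annealing over 100 training epochs.
encoder = torch.nn.Linear(512, 128)  # placeholder for the actual TaCo encoder
optimizer = SGD(encoder.parameters(), lr=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=100)
```

The scheduler's `T_max` mirrors the 100-epoch schedule reported above; the remaining details in the paper (MAEM patch size and attention heads set to 4, training on 8 Nvidia V100 GPUs) are architecture and hardware settings not reflected in this sketch.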