Contrastive Learning with the Feature Reconstruction Amplifier
Authors: Wentao Cui, Liang Bai
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first introduce the implementation details of our experiments. Then we conduct detailed ablation experiments on the FRA module, including three different losses and the network structure. Lastly, we compare the SimFRA framework with several recent contrastive learning methods (our reproduced versions), including linear evaluation and transfer learning. |
| Researcher Affiliation | Academia | Wentao Cui1, Liang Bai1,2* 1 Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi, China 2 Institute of Intelligent Information Processing, Shanxi University, Taiyuan, 030006, Shanxi, China cuiwentao.sxu@qq.com, bailiang@sxu.edu.cn |
| Pseudocode | Yes | Algorithm 1: The SimFRA algorithm. Input: instances X; augmentation methods A^v; the encoder network f(·); the projection head g_p(·); the amplifier head g_a(·). Parameters: temperature hyperparameter τ; number of training epochs n. Output: the encoder network f(·). 1: for i = 1 to n do 2: X^v = A^v(X) 3: H^v = f(X^v) 4: Z^v = g_p(H^v) 5: generate random Gaussian noise E and obtain Z̃^v by reconstructing Z^v with E 6: R^v = g_a(Z̃^v) 7: calculate the L_InfoNCE loss by Eq. (1) 8: calculate the L_A loss by Eqs. (2)-(4) 9: optimize the SimFRA network by Eq. (5) 10: end for 11: return the encoder network f(·). (A training-loop sketch of this algorithm follows the table.) |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | Yes | Datasets. We investigate contrastive learning using some common image datasets, such as CIFAR-10, CIFAR-100, STL-10, ImageNet-100, and VOC2007. Among them, CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009)... STL-10 (Coates, Ng, and Lee 2011) and ImageNet-100, i.e., IN-100, are derived from the ImageNet-1k dataset (Deng et al. 2009). |
| Dataset Splits | Yes | CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009) each contain 50,000 training images and 10,000 test images. ... For testing the representation quality, we train a supervised linear classifier for 500 epochs on these fixed feature embeddings. At last, we test the classification accuracy on the test set. (A linear-evaluation sketch based on this protocol follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU specifications, or cloud computing resources. |
| Software Dependencies | No | The paper mentions optimizers (Adam, SGD) and backbone networks (ResNet-18, ResNet-50) and states that methods were reproduced from previous work, but it does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Experimental setup. We reproduce several contrastive learning methods based on the code provided in previous work. All the data in the experiments are the test results of our reproduced methods. We generally set two batch sizes: on the IN-100 dataset, the batch size is 64; on the other datasets, it is 32. As for the backbone network, we mainly use the standard ResNet-18 and ResNet-50 (He et al. 2016). ... As for the optimizer, most methods use the Adam optimizer (Kingma and Ba 2014), but MoCo and MoCo v2 (Chen et al. 2020b) use the SGD optimizer. In MoCo and MoCo v2, the initial learning rate is set to 0.03, the SGD weight decay is 10^-4, and the SGD momentum is 0.9. In DCL and HCL, the learning rate is 0.001 and the weight decay is 10^-6. In SimCLR and our SimFRA, the learning rate is 3×10^-4. In BYOL, the learning rate is 2×10^-4. As for the specific hyperparameters of each method, we set the temperature τ = 0.07, the memory bank size k = 65536, and the momentum m = 0.999 in MoCo and MoCo v2. In SimCLR and our SimFRA, the temperature τ is set to 0.5. In DCL, the temperature τ = 0.5 and the positive class prior τ+ = 0.1. In HCL, the temperature τ is set to 0.5; the positive class prior τ+ and the concentration parameter β are set following the original paper. In BYOL, the exponential moving average parameter τ is set to 0.99. Finally, we train models on the VOC2007 dataset for 500 epochs; on the other datasets, we train for 400 epochs. (These settings are gathered in the configuration sketch after this table.) |
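
The following is a minimal PyTorch sketch of the training loop in Algorithm 1 quoted above. The excerpt does not specify how Z^v is "reconstructed" with the Gaussian noise E, nor the form of the amplifier loss L_A (Eqs. (2)-(4)), so additive noise and an MSE term are used here purely as placeholders; the names `info_nce`, `train_epoch`, and the weighting `lam` are illustrative, not from the paper.

```python
# Sketch of Algorithm 1 (SimFRA training loop), assuming additive Gaussian
# noise and a placeholder MSE amplifier loss where the excerpt omits details.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """Standard NT-Xent / InfoNCE loss over two batches of projections (Eq. (1))."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                      # (2N, d)
    sim = z @ z.t() / tau                               # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))          # exclude self-similarities
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def train_epoch(f, g_p, g_a, loader, optimizer, tau=0.5, noise_std=1.0, lam=1.0):
    for x1, x2 in loader:                               # two augmented views A^v(X)
        h1, h2 = f(x1), f(x2)                           # H^v = f(X^v)
        z1, z2 = g_p(h1), g_p(h2)                       # Z^v = g_p(H^v)
        # Gaussian noise E; "reconstructing Z^v with E" is assumed additive here.
        z1_t = z1 + noise_std * torch.randn_like(z1)
        z2_t = z2 + noise_std * torch.randn_like(z2)
        r1, r2 = g_a(z1_t), g_a(z2_t)                   # R^v = g_a(Z~^v)
        loss_nce = info_nce(z1, z2, tau)                # Eq. (1)
        # Placeholder for L_A (Eqs. (2)-(4) are not given in the excerpt).
        loss_a = F.mse_loss(r1, z1.detach()) + F.mse_loss(r2, z2.detach())
        loss = loss_nce + lam * loss_a                  # Eq. (5); weighting assumed
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```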
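The "Dataset Splits" row describes the standard linear evaluation protocol: freeze the pretrained encoder, train a linear classifier on its fixed embeddings for 500 epochs, and report accuracy on the test set. A minimal sketch is below; the classifier's optimizer and learning rate are assumptions, since the excerpt does not state them.

```python
# Sketch of the linear evaluation protocol: frozen encoder f, linear classifier
# trained for 500 epochs, accuracy measured on the test set.
import torch
import torch.nn as nn

def linear_evaluation(f, train_loader, test_loader, feat_dim, num_classes,
                      epochs=500, lr=1e-3, device="cuda"):
    f.eval()                                            # freeze the encoder
    for p in f.parameters():
        p.requires_grad_(False)
    clf = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.Adam(clf.parameters(), lr=lr)     # optimizer/lr assumed
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                h = f(x)                                # fixed feature embeddings
            loss = ce(clf(h), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            pred = clf(f(x)).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total                              # test classification accuracy
```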
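The per-method optimizer and hyperparameter settings quoted in the "Experiment Setup" row can be gathered into a single configuration sketch. The dictionary layout and the `make_optimizer` helper are conveniences of this sketch, not structures from the paper; defaults are assumed wherever the excerpt is silent.

```python
# Per-method settings transcribed from the "Experiment Setup" row.
import torch

BATCH_SIZE = {"IN-100": 64, "default": 32}       # batch sizes stated in the excerpt
EPOCHS = {"VOC2007": 500, "default": 400}        # training epochs stated in the excerpt

CONFIGS = {
    "MoCo / MoCo v2":  dict(optimizer="SGD", lr=0.03, weight_decay=1e-4,
                            sgd_momentum=0.9, temperature=0.07,
                            memory_bank_size=65536, key_encoder_momentum=0.999),
    "DCL":             dict(optimizer="Adam", lr=1e-3, weight_decay=1e-6,
                            temperature=0.5, positive_class_prior=0.1),
    "HCL":             dict(optimizer="Adam", lr=1e-3, weight_decay=1e-6,
                            temperature=0.5),    # τ+ and β follow the original paper
    "SimCLR / SimFRA": dict(optimizer="Adam", lr=3e-4, temperature=0.5),
    "BYOL":            dict(optimizer="Adam", lr=2e-4, ema_tau=0.99),
}

def make_optimizer(params, cfg):
    """Build the optimizer named in a config entry (unstated defaults assumed)."""
    if cfg["optimizer"] == "SGD":
        return torch.optim.SGD(params, lr=cfg["lr"],
                               momentum=cfg.get("sgd_momentum", 0.0),
                               weight_decay=cfg.get("weight_decay", 0.0))
    return torch.optim.Adam(params, lr=cfg["lr"],
                            weight_decay=cfg.get("weight_decay", 0.0))
```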