Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity
Authors: Jiachen Jiang, Jinxin Zhou, Zhihui Zhu
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results on common transformers reveal that representations across layers are positively correlated, with similarity increasing when layers get closer. We conduct experiments on both vision and NLP tasks to demonstrate the performance of the proposed aligned training. |
| Researcher Affiliation | Academia | Jiachen Jiang, Jinxin Zhou & Zhihui Zhu Department of Computer Science and Engineering, The Ohio State University, EMAIL |
| Pseudocode | No | The paper describes methods and mathematical formulations (e.g., equations for COS, CKA, and aligned loss) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to code repositories for the described methodology. |
| Open Datasets | Yes | We conduct experiments on both vision and NLP tasks to demonstrate the performance of the proposed aligned training. We conduct experiments on both the CIFAR10 and ImageNet-1K datasets. For text classification tasks, we get Aligned BERT by finetuning a pretrained 12-layer BERT-Base model (Devlin, 2018) using aligned training method on GLUE benchmark (Wang et al., 2018) tasks. For text generation task, we get Aligned GPT by finetuning a pretrained 12-layer GPT2 model (Radford et al., 2019) using aligned training method on WikiText-103 dataset (Merity et al., 2016). |
| Dataset Splits | Yes | We conduct experiments on both the CIFAR10 and ImageNet-1K datasets. The CIFAR10 dataset includes 60,000 color images in 10 classes, each measuring 32×32 pixels. ImageNet-1K contains 1.2 million color images distributed in 1000 classes. For text classification tasks... on GLUE benchmark... The WikiText-103 language modeling dataset consists of over 100 million tokens extracted from Wikipedia's verified good and featured articles. |
| Hardware Specification | Yes | For both vision and NLP tasks, we used 4 RTX A5000 GPUs with 24GB of memory each. The model is finetuned using a single 24GB RTX A5000 GPU for 70 hours. |
| Software Dependencies | No | The paper mentions several models (e.g., DeiT-S, BERT-Base, GPT2) and optimizers (AdamW) but does not specify version numbers for any software libraries (e.g., PyTorch, TensorFlow) or programming languages (e.g., Python). |
| Experiment Setup | Yes | For optimization, we employ AdamW with an initial learning rate of 0.1. This rate decays according to the MultiStepLR at the 100th and 150th epochs, over a total of 200 epochs. We set the weight decay at 1e-4. The global batch size for both datasets is set at 256. In our Aligned BERT experiments on the GLUE dataset, we used a sequence length of 256. We employed AdamW for optimization with an initial learning rate of 2e-5, and a batch size of 32. Each task underwent fine-tuning for three epochs. For Aligned GPT experiments on the WikiText-103 dataset, we maintained the sequence length at 256 and used AdamW with an initial learning rate of 2e-5. In this case, we set the batch size to 8. |
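The Pseudocode row notes that the paper defines its COS and CKA layer-similarity measures by equations rather than algorithm blocks. For orientation, here is a minimal sketch of the standard linear CKA (Kornblith et al., 2019) and an average per-sample cosine similarity between two layers' representations; the paper's exact formulations may differ, and the function names and matrix shapes below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representation matrices of shape (n_samples, dim).
    Standard formulation (Kornblith et al., 2019); the paper's variant may differ."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def avg_cos(X, Y):
    """Mean cosine similarity between corresponding row vectors of X and Y."""
    num = np.sum(X * Y, axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1)
    return float(np.mean(num / den))

# Toy demo with random "layer representations" (shapes are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))
Y = X + 0.1 * rng.normal(size=(100, 32))  # a nearby layer's representation
cka_xy = linear_cka(X, Y)
cos_xy = avg_cos(X, Y)
```

Both measures equal 1 when the two representations coincide, which matches the paper's finding that similarity increases as layers get closer.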
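The Experiment Setup row specifies AdamW with an initial learning rate of 0.1, decayed by a MultiStepLR schedule at epochs 100 and 150 over 200 epochs. A minimal pure-Python sketch of that schedule's arithmetic follows; the decay factor `gamma=0.1` is an assumption (it mirrors PyTorch's default) and is not stated in the excerpt.

```python
def multistep_lr(initial_lr, epoch, milestones=(100, 150), gamma=0.1):
    """Learning rate at `epoch` under a MultiStepLR-style schedule:
    the rate is multiplied by `gamma` once per milestone already passed.
    gamma=0.1 mirrors PyTorch's default; the paper does not state it."""
    factor = gamma ** sum(epoch >= m for m in milestones)
    return initial_lr * factor

# Vision setup from the table: initial lr 0.1, decay at epochs 100 and 150.
lr_early = multistep_lr(0.1, 50)    # before any milestone
lr_mid = multistep_lr(0.1, 120)     # after the first milestone
lr_late = multistep_lr(0.1, 180)    # after both milestones
```

In a real training loop this would correspond to `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150])` stepped once per epoch.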