Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Task Descriptors Help Transformers Learn Linear Models In-Context
Authors: Ruomin Huang, Rong Ge
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we verify our results by showing that the weights converge to the predicted global minimum and Transformers indeed perform better with task descriptors. Finally, we empirically verify our findings in Section 5. |
| Researcher Affiliation | Academia | Ruomin Huang Duke University EMAIL Rong Ge Duke University EMAIL |
| Pseudocode | No | The paper describes the methods narratively and mathematically but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing code or links to a code repository. |
| Open Datasets | No | We generate 4096 i.i.d. input sequences for each episode of training. For all experiments in this section, the data dimension d = 5 and the covariance matrix is the identity I_d. |
| Dataset Splits | No | We generate 4096 i.i.d. input sequences for each episode of training. In practice, we can generate m sequences S̄_1, S̄_2, ..., S̄_m, and the empirical loss is just the mean-squared error for all the sequences. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | We use Adam optimizer (Kingma & Ba, 2015) to train Transformers. We also use ℓ2 gradient clipping to stabilize training. No specific version numbers for software libraries or environments are provided. |
| Experiment Setup | Yes | We generate 4096 i.i.d. input sequences for each episode of training. For all experiments in this section, the data dimension d = 5 and the covariance matrix is the identity I_d. For all experiments, we use Adam optimizer (Kingma & Ba, 2015) to train Transformers. We also use ℓ2 gradient clipping to stabilize training. |
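The data-generation step quoted in the Experiment Setup row can be sketched as follows. This is a hedged illustration, not the authors' code: the sequence length, the task prior w ~ N(0, I_d), and the noiseless labels are assumptions; only the 4096 sequences per episode, d = 5, and identity input covariance come from the quoted text.

```python
import numpy as np

def generate_episode(num_sequences=4096, seq_len=10, d=5, rng=None):
    """Draw i.i.d. in-context linear-regression sequences for one training episode.

    Matches the quoted setup: num_sequences = 4096 per episode, data dimension
    d = 5, inputs with identity covariance. seq_len and the task prior are
    hypothetical choices for illustration.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    x = rng.standard_normal((num_sequences, seq_len, d))   # x ~ N(0, I_d)
    w = rng.standard_normal((num_sequences, d))            # per-sequence task vector (assumed prior)
    y = np.einsum("nld,nd->nl", x, w)                      # noiseless labels y = <x, w>
    return x, w, y

x, w, y = generate_episode()
# The empirical loss described in the Dataset Splits row would then be the
# mean-squared error of the model's predictions averaged over all sequences.
```

The einsum contracts the feature dimension per sequence, so each of the m sequences carries its own linear task, which is the structure the in-context learner must infer.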