Universal Approximation Under Constraints is Possible with Transformers
Authors: Anastasis Kratsios, Behnoosh Zamanlooy, Tianlin Liu, Ivan Dokmanić
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Appendix 5, we show that the answer is indeed yes, by proposing a training algorithm in that direction and showing that we outperform an MLP model and a classical transformer network in terms of a joint MSE and distance to the constraint set. The evaluation is performed on a large number of randomly generated experiments, whose objective is to reduce the MSE to a randomly generated function mapping a high-dimensional Euclidean space to the sphere in R3, with outputs constrained to the sphere. (A minimal sketch of this evaluation protocol appears after the table.) |
| Researcher Affiliation | Academia | Anastasis Kratsios, Tianlin Liu & Ivan Dokmanić, Universität Basel, Departement Mathematik und Informatik, {firstname.lastname}@unibas.ch; Behnoosh Zamanlooy, Universität Zürich, Department of Informatics, bzamanlooy@ifi.uzh.ch |
| Pseudocode | No | The paper provides detailed mathematical formulations of its components and theorems but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Anonymized. Pytorch implementation of attend-to-constraints, 2021. URL https://drive.google.com/file/d/1vryYsUmHt0fok3Mrje6oN9Tjs2UmpgkA/view. |
| Open Datasets | No | The evaluation is performed on a large number of randomly generated experiments, whose objective is to reduce the MSE to a randomly generated function mapping a high-dimensional Euclidean space to the sphere in R3, with outputs constrained to the sphere. The paper uses randomly generated data for which no public access information (link, citation, etc.) is provided. |
| Dataset Splits | No | The paper mentions 'training data' and 'training algorithm' but does not provide specific details on how the dataset was split into training, validation, and test sets, or specific percentages/counts. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Pytorch implementation' in its bibliography, but it does not specify version numbers for PyTorch or any other software dependencies needed for reproduction. |
| Experiment Setup | No | The paper refers to a 'training algorithm' and optimizing 'MSE' in Appendix 5, but it does not provide concrete experimental setup details such as hyperparameter values (e.g., learning rate, batch size) or specific optimizer settings. |
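Because the hardware, software versions, splits, and hyperparameters are unreported, the evaluation protocol quoted in the Research Type row is the most concrete reproducibility anchor in the paper. Below is a minimal PyTorch sketch of that protocol under stated assumptions: a randomly generated target mapping a high-dimensional Euclidean space onto the unit sphere in R3, an unconstrained baseline trained on MSE, and the joint MSE / distance-to-constraint-set metric. All names, dimensions, and optimizer settings here are illustrative assumptions; this is not the authors' released implementation (linked in the Open Source Code row).

```python
import torch

# Hedged sketch of the paper's evaluation setting, NOT the authors' code.
# Target f: R^d -> S^2, i.e. outputs constrained to the unit sphere in R^3.
torch.manual_seed(0)
d_in, n_train, n_test = 16, 1024, 256  # assumed dimensions (not reported)

# Stand-in for the paper's "randomly generated function": a fixed random
# linear map followed by projection onto the sphere (assumption).
W = torch.randn(3, d_in)

def target(x):
    y = x @ W.T
    return y / y.norm(dim=-1, keepdim=True).clamp_min(1e-8)

X_train, X_test = torch.randn(n_train, d_in), torch.randn(n_test, d_in)
Y_train, Y_test = target(X_train), target(X_test)

# Unconstrained baseline (the paper compares an MLP and a classical
# transformer against its constrained transformer); sizes are assumptions.
model = torch.nn.Sequential(
    torch.nn.Linear(d_in, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed settings

for _ in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X_train), Y_train)
    loss.backward()
    opt.step()

with torch.no_grad():
    pred = model(X_test)
    mse = torch.nn.functional.mse_loss(pred, Y_test)
    # Distance to the constraint set S^2: | ||y|| - 1 | per prediction.
    dist = (pred.norm(dim=-1) - 1.0).abs().mean()
    print(f"test MSE: {mse.item():.4f}  mean distance to sphere: {dist.item():.4f}")
```

The second metric is what makes the paper's comparison meaningful: an unconstrained MLP or transformer can reach low MSE while its outputs drift off the sphere, and the joint MSE plus distance-to-constraint-set score is the quantity on which the paper reports outperforming both baselines.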