Modeling Structure with Undirected Neural Networks
Authors: Tsvetomila Mihaylova, Vlad Niculae, André Martins
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of undirected neural architectures, both unstructured and structured, on a range of tasks: tree-constrained dependency parsing, convolutional image classification, and sequence completion with attention. |
| Researcher Affiliation | Collaboration | ¹Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal; ²Language Technology Lab, University of Amsterdam, The Netherlands; ³LUMLIS, Lisbon ELLIS Unit, Portugal; ⁴Unbabel, Lisbon, Portugal. |
| Pseudocode | No | The paper describes algorithms using mathematical equations and textual explanations but does not contain a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | The source code is available at https://github.com/deep-spin/unn. |
| Open Datasets | Yes | "We demonstrate this on the MNIST dataset of handwritten digits (Deng, 2012)" and "We test the architecture on several datasets from Universal Dependencies 2.7 (Zeman et al., 2020)". |
| Dataset Splits | Yes | "The learning rate for each language is chosen via grid search for highest UAS on the validation set for the baseline model." and "splitting them into training and test sets with around 706K and 78K instances". |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and PyTorch functions (e.g., 'torch.conv2d'), but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "We use γ = 0.1 and an Adam learning rate of 0.0005"; "Adam with learning rate 10⁻⁴. The hidden dimension is d = 256, and gradients with magnitude beyond 10 are clipped"; and "The learning rate for each language is chosen via grid search for highest UAS on the validation set for the baseline model. We searched over the values {0.1, 0.5, 1, 5, 10} × 10⁻⁵. In the experiments, we use 10⁻⁵ for Italian and 5 × 10⁻⁵ for the other languages. We employ dropout regularization, using the same dropout mask for each variable throughout the inner coordinate descent iterations, so that dropped values do not leak." |
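The fixed-dropout-mask detail in the Experiment Setup row is the easiest part to get wrong when reproducing this paper. Below is a minimal PyTorch sketch of that pattern, assuming a placeholder update network, loss, dropout rate, and inner-iteration count (none of which are specified above): one dropout mask is sampled per training step and reused across all inner coordinate-descent iterations, and gradients are clipped at magnitude 10 as the paper describes.

```python
import torch

# Minimal sketch, NOT the authors' code: the network, loss, dropout rate,
# and iteration count below are placeholders for illustration only.
d = 256              # hidden dimension reported in the paper
num_inner_iters = 5  # number of coordinate-descent iterations (assumed)
p_drop = 0.1         # dropout probability (assumed; not reported above)

model = torch.nn.Linear(d, d)  # stand-in for the actual UNN update network
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr from the paper

def train_step(h: torch.Tensor) -> float:
    # Sample ONE dropout mask per step and reuse it for every inner
    # coordinate-descent iteration, so that values dropped at the first
    # iteration stay dropped and cannot leak back in at later iterations.
    mask = (torch.rand(d) > p_drop).float() / (1.0 - p_drop)

    for _ in range(num_inner_iters):
        h = torch.tanh(model(h * mask))  # placeholder coordinate update

    loss = h.pow(2).mean()  # placeholder loss
    opt.zero_grad()
    loss.backward()
    # "Gradients with magnitude beyond 10 are clipped"; whether this means
    # norm or value clipping is not stated, so norm clipping is assumed here
    # (torch.nn.utils.clip_grad_value_ would be the per-element alternative).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    opt.step()
    return loss.item()

loss = train_step(torch.randn(d))
```

Sampling the mask outside the inner loop is the whole point of the quoted sentence: resampling it at each iteration would let information from dropped coordinates re-enter the computation, which the authors explicitly avoid.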