Pointer Networks
Authors: Oriol Vinyals, Meire Fortunato, Navdeep Jaitly
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems (finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem) using training examples alone. Ptr-Nets not only improve over sequence-to-sequence with input attention, but also allow us to generalize to variable size output dictionaries. We show that the learnt models generalize beyond the maximum lengths they were trained on. We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems. |
| Researcher Affiliation | Collaboration | Oriol Vinyals (Google Brain); Meire Fortunato (Department of Mathematics, UC Berkeley); Navdeep Jaitly (Google Brain) |
| Pseudocode | No | The paper describes its models and algorithms using prose and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, "We will release all the datasets at hidden for reference." This refers to datasets, not the source code for the methodology. No other statements or links related to source code availability are provided. |
| Open Datasets | No | The paper mentions generating its own datasets: "In the training data, the inputs are planar point sets P = {P1, . . . , Pn} with n elements each, where Pj = (xj, yj) are the cartesian coordinates of the points..." It also states, "We will release all the datasets at hidden for reference." However, "at hidden" does not provide concrete access information (e.g., a specific URL, DOI, or repository name) for immediate public availability. |
| Dataset Splits | No | The paper mentions generating "1M training example pairs" and discusses overfitting on the simpler tasks, but it does not specify explicit train/validation/test splits or their sizes. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using LSTMs and various algorithms (e.g., Held-Karp, Christofides) but does not specify any software libraries, frameworks, or programming languages with their version numbers that would be necessary to replicate the experiments. |
| Experiment Setup | Yes | As a result, all our models used a single layer LSTM with either 256 or 512 hidden units, trained with stochastic gradient descent with a learning rate of 1.0, batch size of 128, random uniform weight initialization from -0.08 to 0.08, and L2 gradient clipping of 2.0. We generated 1M training example pairs, and we did observe overfitting in some cases where the task was simpler (i.e., for small n). Training generally converged after 10 to 20 epochs. |
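
For orientation, the quoted training configuration can be expressed as a minimal sketch, shown below. This is not the authors' implementation and the paper names no framework: PyTorch, the encoder stub, and identifiers such as `training_step` and `compute_loss` are assumptions here, and the pointer-attention decoder is omitted entirely.

```python
# Hedged sketch of the quoted hyperparameters (assumption: PyTorch).
# Only a single-layer LSTM encoder stub is shown; the Ptr-Net's
# pointer-attention decoder and loss are left as a placeholder.
import torch
import torch.nn as nn

HIDDEN_UNITS = 256      # the paper uses either 256 or 512
BATCH_SIZE = 128
LEARNING_RATE = 1.0     # plain SGD, as quoted
INIT_RANGE = 0.08       # random uniform weight initialization in [-0.08, 0.08]
GRAD_CLIP_NORM = 2.0    # L2 gradient clipping of 2.0

# Inputs are planar points (x, y), hence input_size=2.
encoder = nn.LSTM(input_size=2, hidden_size=HIDDEN_UNITS,
                  num_layers=1, batch_first=True)

# Uniform initialization of all weights, per the quoted setup.
for p in encoder.parameters():
    nn.init.uniform_(p, -INIT_RANGE, INIT_RANGE)

optimizer = torch.optim.SGD(encoder.parameters(), lr=LEARNING_RATE)

def training_step(batch_inputs, compute_loss):
    """One SGD update with L2 gradient-norm clipping.

    `compute_loss` is a hypothetical stand-in for the omitted
    pointer-decoder forward pass and its cross-entropy loss.
    batch_inputs has shape (BATCH_SIZE, n, 2).
    """
    optimizer.zero_grad()
    outputs, _ = encoder(batch_inputs)
    loss = compute_loss(outputs)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(encoder.parameters(), GRAD_CLIP_NORM)
    optimizer.step()
    return loss.item()
```

At a batch size of 128, one pass over the 1M generated training pairs is roughly 7,800 such steps, so the quoted 10 to 20 epochs correspond to roughly 78,000 to 156,000 updates.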