Learning to Discover Efficient Mathematical Identities
Authors: Wojciech Zaremba, Karol Kurach, Rob Fergus
NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show how these approaches enable us to derive complex identities, beyond reach of brute-force search, or human derivation. All code and evaluation data can be found at https://github.com/kkurach/math_learning. |
| Researcher Affiliation | Collaboration | Wojciech Zaremba, Dept. of Computer Science, Courant Institute, New York University; Karol Kurach, Google Zurich & Dept. of Computer Science, University of Warsaw; Rob Fergus, Dept. of Computer Science, Courant Institute, New York University |
| Pseudocode | No | The paper describes procedures and models but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code. |
| Open Source Code | Yes | All code and evaluation data can be found at https://github.com/kkurach/math_learning. |
| Open Datasets | Yes | We first create a dataset of symbolic expressions, spanning the space of all valid expressions up to degree k. We then group them into clusters of equivalent expressions (using the numerical representation to check for equality), and give each cluster a discrete label 1…C. (A minimal sketch of this numerical-equivalence clustering appears after the table.) |
| Dataset Splits | No | The paper states 'Each class is split 80/20 into train/test sets.' in Section 4.2 but does not explicitly mention a validation split. |
| Hardware Specification | Yes | Running on a 3 GHz 16-core Intel Xeon. |
| Software Dependencies | No | The paper does not provide specific version numbers for any ancillary software, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | A vector a ∈ ℝ^l, where l = 30, is used to represent each input variable. The weight matrix in the softmax classifier has a much larger (×100) learning rate than the rest of the layers. We use dropout [13] as the network has a tendency to overfit and repeat exactly the same expressions for the next value of k. Thus, instead of training on exactly φ(b1) and φ(b2), we drop activations as we propagate toward the top of the tree (the same fraction for each depth), which encourages the RNN to capture more local structures. (A hedged sketch of this training setup appears after the table.) |
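
The clustering step quoted in the Open Datasets row (grouping symbolic expressions into equivalence classes by checking numerical equality, then assigning labels 1…C) can be illustrated with a short sketch. This is a minimal illustration, not the authors' released code: the toy scalar expressions, the five random evaluation points, and the rounding tolerance are all assumptions made for the example; the paper itself works with matrix/vector formulas built from A and Aᵀ and evaluates them on random matrix inputs.

```python
import numpy as np
import sympy as sp

# Hypothetical toy expression set; the paper enumerates all valid
# expressions up to degree k over matrix inputs such as A and A^T.
a, b = sp.symbols("a b")
expressions = [
    (a + b) ** 2,
    a**2 + 2 * a * b + b**2,   # equivalent to the first expression
    a**2 - b**2,
    (a - b) * (a + b),         # equivalent to the previous expression
    a * b,
]

# Numerical fingerprint: evaluate each expression on a fixed set of
# random points and round, so equivalent expressions hash identically.
rng = np.random.default_rng(0)
points = rng.uniform(-1, 1, size=(5, 2))  # 5 random (a, b) pairs (assumed)

def fingerprint(expr):
    f = sp.lambdify((a, b), expr, "numpy")
    values = np.array([f(x, y) for x, y in points])
    return tuple(np.round(values, 6))      # assumed tolerance

# Group equivalent expressions into clusters and assign labels 1..C.
clusters = {}
for expr in expressions:
    clusters.setdefault(fingerprint(expr), []).append(expr)

for label, members in enumerate(clusters.values(), start=1):
    print(f"class {label}: {members}")
```

Running this prints three classes: the two expansions of (a + b)² fall into one class, the two factorings of a² − b² into another, and a·b into a third, mirroring the label-per-cluster scheme described in the quote.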
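
Similarly, the details quoted in the Experiment Setup row (an l = 30 embedding per input variable, a softmax classifier with a roughly 100× larger learning rate, and dropout applied to activations as they propagate toward the top of the expression tree) can be sketched as follows. This is a hedged PyTorch illustration under assumed choices: the module structure, dropout rate, number of classes, base learning rate, and optimizer are not from the paper, whose actual implementation is in the linked repository.

```python
import torch
import torch.nn as nn

L = 30             # embedding size per input variable, as quoted
NUM_CLASSES = 50   # number of equivalence classes C (assumed value)

class TreeRNN(nn.Module):
    """Minimal recursive network over binary expression trees.

    A leaf is an integer variable id; an internal node is a pair
    (left_subtree, right_subtree). Dropout is applied to subtree
    activations as they propagate toward the root, with the same
    drop fraction at every depth, as in the quoted setup.
    """

    def __init__(self, num_vars, dropout=0.3):  # dropout rate assumed
        super().__init__()
        self.embed = nn.Embedding(num_vars, L)
        self.combine = nn.Sequential(nn.Linear(2 * L, L), nn.Tanh())
        self.drop = nn.Dropout(dropout)
        self.classifier = nn.Linear(L, NUM_CLASSES)

    def encode(self, tree):
        if isinstance(tree, int):                  # leaf: variable id
            return self.embed(torch.tensor(tree))
        left, right = tree
        h = self.combine(torch.cat([self.encode(left), self.encode(right)]))
        return self.drop(h)                        # drop at each depth

    def forward(self, tree):
        return self.classifier(self.encode(tree))

model = TreeRNN(num_vars=10)

# The softmax (classifier) weights get a much larger learning rate than
# the rest of the layers; the 100x factor mirrors the quoted setup.
base_lr = 1e-3  # assumed
optimizer = torch.optim.SGD([
    {"params": model.embed.parameters()},
    {"params": model.combine.parameters()},
    {"params": model.classifier.parameters(), "lr": 100 * base_lr},
], lr=base_lr)

# One toy training step on the tree ((x0, x1), x2) with class label 3.
logits = model(((0, 1), 2))
loss = nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([3]))
loss.backward()
optimizer.step()
```

The per-parameter-group learning rates show one plain way to give the classifier layer a larger step size than the embedding and combination layers; the recursive `encode` call applies the same dropout fraction at every internal node, which is how the quoted per-depth dropping is interpreted here.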