Schema-learning and rebinding as mechanisms of in-context learning and emergence
Authors: Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju, Murray Shanahan, Miguel Lázaro-Gredilla, Dileep George
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We substantiate the above argument using empirical results on three datasets: (a) the GINC benchmark introduced in [3], (b) a suite of algorithm learning tasks that we introduce in our LIALT datasets, and (c) a zero-shot word usage induction task on a CSCG language model. |
| Researcher Affiliation | Industry | Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju, Murray Shanahan, Miguel Lázaro-Gredilla, Dileep George, Google DeepMind, {sivark,adedieu,rajvraju,mshanahan,lazarogredilla,dileepgeorge}@google.com |
| Pseudocode | Yes | Algorithm 1 Fast rebinding algorithm; Algorithm 2 Prompt completion |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code. |
| Open Datasets | Yes | Dataset: The GINC dataset [3] introduced for studying ICL... We train a single CSCG with 50 clones on the GINC dataset... To test for this capability, we train a CSCG on the PreCo dataset [26], which is a large-scale English dataset for coreference resolution. |
| Dataset Splits | No | The paper describes its training and test sets but does not provide explicit details on a validation split (e.g., percentages or counts) from the overall dataset. |
| Hardware Specification | No | The paper does not specify the hardware used for its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We train a single CSCG with 50 clones on the GINC dataset for 100 full-batch EM iterations using a pseudocount [6] of ϵ = 10⁻²... We parameterize CSCG capacity via this proportionality factor, the overallocation ratio... We train CSCGs for an increasing sequence of overallocation ratios on the training data with 500 EM iterations and a pseudocount of ϵ = 10⁻⁶. After running EM, we run 10 iterations of Viterbi training [23]. We use Algorithm 1 with ϵ = 10⁻⁶ and p_surprise = 1/16 to rebind the emission matrix on each of these prompts |
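
The Experiment Setup row turns on two recurring hyperparameters: the pseudocount ϵ that smooths the EM updates (ϵ = 10⁻² on GINC, ϵ = 10⁻⁶ on LIALT) and the overallocation ratio that sets CSCG capacity. As a minimal sketch of what a pseudocount-smoothed M-step looks like, assuming a plain discrete HMM as a stand-in for the paper's CSCG (the function and variable names below are illustrative, not taken from the authors' code):

```python
import numpy as np

# Minimal sketch of a pseudocount-smoothed M-step, assuming a discrete HMM
# as a stand-in for the paper's CSCG. `exp_trans[i, j]` holds expected
# transition counts from the E-step; `exp_emit[i, k]` expected emission counts.
def m_step(exp_trans: np.ndarray, exp_emit: np.ndarray, eps: float = 1e-2):
    """Re-estimate transition and emission probabilities with pseudocount eps."""
    trans = exp_trans + eps                    # smooth the soft counts
    trans /= trans.sum(axis=1, keepdims=True)  # row-normalize to probabilities
    emit = exp_emit + eps
    emit /= emit.sum(axis=1, keepdims=True)
    return trans, emit
```

The pseudocount keeps transitions and emissions that receive zero expected count from collapsing to exactly zero probability; ϵ = 10⁻² smooths relatively aggressively, while ϵ = 10⁻⁶ barely perturbs the counts.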
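
The Pseudocode row refers to Algorithm 1 (fast rebinding), which the setup runs with ϵ = 10⁻⁶ and p_surprise = 1/16. Below is a hedged sketch of the general idea only: the learned transition structure (the schema) stays frozen while emission columns for surprising prompt tokens are re-estimated. It uses filtering posteriors rather than full forward-backward smoothing, and every detail beyond the frozen-transitions/re-learned-emissions outline is an assumption for illustration, not the authors' algorithm.

```python
import numpy as np

def forward(trans, emit, seq):
    """Filtering pass for a discrete HMM: returns the predictive probability
    assigned to each observed token and the per-step state posteriors."""
    n_states = trans.shape[0]
    alpha = np.full(n_states, 1.0 / n_states)  # uniform initial state prior
    pred_probs, alphas = [], []
    for tok in seq:
        pred = alpha @ emit            # predictive distribution over tokens
        pred_probs.append(pred[tok])
        alpha = alpha * emit[:, tok]   # condition on the observed token
        alpha /= alpha.sum()
        alphas.append(alpha)
        alpha = alpha @ trans          # propagate through frozen transitions
    return np.array(pred_probs), np.stack(alphas)

def fast_rebind(trans, emit, prompt, eps=1e-6, p_surprise=1 / 16, n_iters=10):
    """Sketch of surprise-triggered rebinding: transitions stay fixed and only
    the emission columns of surprising tokens are re-estimated on the prompt,
    with pseudocount eps smoothing the soft counts."""
    emit = emit.copy()
    for _ in range(n_iters):
        pred, alphas = forward(trans, emit, prompt)
        surprising = {t for t, p in zip(prompt, pred) if p < p_surprise}
        if not surprising:
            break                          # nothing in the prompt is surprising
        counts = np.full_like(emit, eps)   # pseudocount-smoothed soft counts
        for a, tok in zip(alphas, prompt):
            counts[:, tok] += a
        new_emit = counts / counts.sum(axis=1, keepdims=True)
        for tok in surprising:             # rebind only the surprising tokens
            emit[:, tok] = new_emit[:, tok]
        emit /= emit.sum(axis=1, keepdims=True)  # restore row normalization
    return emit
```

On a prompt that pairs familiar structural slots with novel tokens, the predictive probability of the novel tokens falls below p_surprise on the first pass, so the loop reassigns emission mass for those tokens to whichever latent states are active at their positions, which is the intended rebinding effect.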