Schema-learning and rebinding as mechanisms of in-context learning and emergence

Authors: Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju, Murray Shanahan, Miguel Lázaro-Gredilla, Dileep George

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We substantiate the above argument using empirical results on three datasets: (a) the GINC benchmark introduced in [3], (b) a suite of algorithm learning tasks that we introduce in our LIALT datasets, and (c) a zero-shot word usage induction task on a CSCG language model." |
| Researcher Affiliation | Industry | Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju, Murray Shanahan, Miguel Lázaro-Gredilla, Dileep George — Google DeepMind, {sivark,adedieu,rajvraju,mshanahan,lazarogredilla,dileepgeorge}@google.com |
| Pseudocode | Yes | Algorithm 1: Fast rebinding algorithm; Algorithm 2: Prompt completion |
| Open Source Code | No | The paper does not provide an explicit statement of, or link to, its open-source code. |
| Open Datasets | Yes | "The GINC dataset [3] introduced for studying ICL... We train a single CSCG with 50 clones on the GINC dataset... To test for this capability, we train a CSCG on the PreCo dataset [26], which is a large-scale English dataset for coreference resolution." |
| Dataset Splits | No | The paper describes its training and test sets but does not provide explicit details on a validation split (e.g., percentages or counts) from the overall dataset. |
| Hardware Specification | No | The paper does not specify the hardware used for its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | "We train a single CSCG with 50 clones on the GINC dataset for 100 full-batch EM iterations using a pseudocount [6] of ϵ = 10⁻²... We parameterize CSCG capacity via this proportionality factor, the overallocation ratio... We train CSCGs for an increasing sequence of overallocation ratios on the training data with 500 EM iterations and a pseudocount of ϵ = 10⁻⁶. After running EM, we run 10 iterations of Viterbi training [23]. We use Algorithm 1 with ϵ = 10⁻⁶ and p_surprise = 1/16 to rebind the emission matrix on each of these prompts." |
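For anyone attempting to reproduce these experiments, the reported hyperparameters can be collected into a small configuration sketch. This is an illustrative assumption, not code from the paper: the class and field names (`GINCTrainConfig`, `LIALTTrainConfig`, `RebindConfig`, `n_clones`, etc.) are hypothetical; only the numeric values are taken from the quoted setup.

```python
from dataclasses import dataclass

@dataclass
class GINCTrainConfig:
    """Reported settings for the single CSCG trained on GINC."""
    n_clones: int = 50          # 50 clones per token
    em_iterations: int = 100    # full-batch EM iterations
    pseudocount: float = 1e-2   # ϵ = 10⁻² [6]

@dataclass
class LIALTTrainConfig:
    """Reported settings for the overallocation-ratio sweep on LIALT."""
    em_iterations: int = 500       # EM iterations per overallocation ratio
    viterbi_iterations: int = 10   # Viterbi training [23] after EM
    pseudocount: float = 1e-6      # ϵ = 10⁻⁶

@dataclass
class RebindConfig:
    """Reported settings for Algorithm 1 (fast rebinding) on prompts."""
    pseudocount: float = 1e-6      # ϵ = 10⁻⁶
    p_surprise: float = 1 / 16     # surprise threshold for rebinding
```

Since the paper does not release code, hardware details, or dependency versions, a configuration record like this is the closest a reproducer can get from the text alone.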