Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Human-like Few-Shot Learning via Bayesian Reasoning over Natural Language
Authors: Kevin Ellis
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We contribute (1) a model of symbolic concept learning that supports efficient inference over a flexible hypothesis class; (2) an evaluation on human data from two different concept learning experiments; and (3) a simple recipe for extracting a humanlike prior over concepts, given raw behavioral data. |
| Researcher Affiliation | Academia | Kevin Ellis Cornell University EMAIL |
| Pseudocode | No | The paper includes Python code snippets for translating natural language concepts but does not present a formal pseudocode block for the overall algorithm or any of its main components. |
| Open Source Code | Yes | Code and data available at: https://github.com/ellisk42/humanlike_fewshot_learning |
| Open Datasets | Yes | We take human data from [43]. ... We obtain this human data from [55], which covers 112 concepts, collecting judgements from 1,596 human participants as they attempt to learn each concept over 25 batches of examples. |
| Dataset Splits | Yes | For the number game we do 10-fold cross validation to calculate holdout predictions. |
| Hardware Specification | Yes | All models were trained on a laptop using no GPUs. |
| Software Dependencies | No | The paper mentions software such as “Codex code-davinci-002”, “GPT-4”, “CodeGen 350M”, “all-MiniLM-L6”, and “Adam”, but does not specify version numbers that would allow a reproducible setup. |
| Experiment Setup | Yes | We use Adam [46] to perform maximum likelihood estimation of the parameters, following Eq. 5. ... We perform 1000 epochs of training for the Number Game, and 100 epochs for logical concepts. ... We also place a learnable temperature parameter on the posterior. ... We fit these parameters using Adam with a learning rate of 0.001. |
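The setup quoted above (Adam at learning rate 0.001, maximum likelihood estimation, a learnable temperature on the posterior) can be sketched in miniature. This is an illustrative reconstruction, not the authors' code: the function name `fit_temperature`, the sigmoid observation model, the pure-Python Adam loop, and the synthetic data are all assumptions; the linked repository contains the actual implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_temperature(logits, labels, lr=0.001, epochs=1000):
    """Maximum-likelihood fit of a scalar posterior temperature T with Adam.

    Assumed model (for illustration only): P(y=1 | z) = sigmoid(z / T),
    where z is a posterior log-odds score and y a human judgement in [0, 1].
    T is parameterised as exp(log_T) so it stays positive.
    """
    log_T = 0.0
    m, v = 0.0, 0.0
    b1, b2, eps = 0.9, 0.999, 1e-8  # standard Adam hyperparameters
    for t in range(1, epochs + 1):
        T = math.exp(log_T)
        # Analytic gradient of the negative log-likelihood w.r.t. log T:
        # dL/dT = sum_i (p_i - y_i) * (-z_i / T^2), and dT/dlogT = T.
        g = sum((sigmoid(z / T) - y) * (-z / T**2) * T
                for z, y in zip(logits, labels))
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        mhat = m / (1 - b1**t)
        vhat = v / (1 - b2**t)
        log_T -= lr * mhat / (math.sqrt(vhat) + eps)
    return math.exp(log_T)

# Synthetic check: soft labels generated at temperature T = 2
# should be approximately recovered by the fit.
scores = [4.0, -4.0, 2.0, -2.0, 3.0, -3.0]
judgements = [sigmoid(z / 2.0) for z in scores]
fitted_T = fit_temperature(scores, judgements)
```

Optimizing `log_T` rather than `T` directly is one common way to keep the temperature positive without constrained optimization; the paper does not specify its parameterisation.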