Machine Teaching of Active Sequential Learners
Authors: Tomi Peltola, Mustafa Mert Çelikok, Pedram Daee, Samuel Kaski
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test the formulation with multi-armed bandit learners in simulated experiments and a user study. The results show that learning is improved by (i) planning teaching and (ii) the learner having a model of the teacher. |
| Researcher Affiliation | Academia | Tomi Peltola tomi.peltola@aalto.fi Mustafa Mert Çelikok mustafa.celikok@aalto.fi Pedram Daee pedram.daee@aalto.fi Samuel Kaski samuel.kaski@aalto.fi Helsinki Institute for Information Technology HIIT Department of Computer Science, Aalto University, Helsinki, Finland |
| Pseudocode | No | The paper describes algorithms and models (e.g., Thompson sampling, teacher model) in prose and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is available at https://github.com/AaltoPML/machine-teaching-of-active-sequential-learners. |
| Open Datasets | Yes | We use a word relevance dataset for simulating an information retrieval task... The Word dataset is a random selection of 10,000 words from Google's Word2Vec vectors, pre-trained on the Google News dataset [42]. |
| Dataset Splits | No | The paper describes the setup of simulation experiments and a user study, but it does not specify explicit training, validation, or test dataset splits in the traditional machine learning sense. |
| Hardware Specification | No | The paper states: "We acknowledge the computational resources provided by the Aalto Science-IT Project." However, this does not provide specific hardware details such as CPU/GPU models, memory, or specific cloud instance types. |
| Software Dependencies | Yes | We implemented the models in the probabilistic programming language Pyro (version 0.3, under PyTorch v1.0) [40] and approximate the posterior distributions with Laplace approximations [41, Section 4.1]. |
| Experiment Setup | Yes | The ground-truth relevance profile is generated by first setting θ̂ = [c, d·x̂] ∈ ℝ^(M+1), where c = 4 is a weight for an intercept term (a constant element of 1 is added to the x's) and d = 8 is a scaling factor. [...] We use β̂ = 20 as the planning teacher's optimality parameter and also set β of the learner's teacher model to the same value. For multi-step models, we set γt = 1/T, so that they plan to maximise the average return up to horizon T. |
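The evidence above mentions multi-armed bandit learners driven by Thompson sampling. As a point of reference for the kind of learner being taught, here is a minimal Beta-Bernoulli Thompson sampling loop in plain Python. This is a generic sketch, not the paper's implementation: the paper's learner is a logistic-regression bandit with a teacher model implemented in Pyro, and the arm probabilities, horizon, and seed below are illustrative assumptions.

```python
import random

def thompson_sampling(true_probs, horizon, rng):
    """Beta-Bernoulli Thompson sampling: each round, draw one sample per
    arm from its Beta posterior, pull the arm with the highest sample,
    and update that arm's posterior counts with the binary reward."""
    n_arms = len(true_probs)
    alpha = [1.0] * n_arms  # Beta(1, 1) uniform priors
    beta = [1.0] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Posterior sampling step: one draw per arm.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

rng = random.Random(0)
reward = thompson_sampling([0.2, 0.5, 0.8], horizon=200, rng=rng)
print(reward)
```

In the paper's framing, the teacher would bias the binary feedback (here, the raw Bernoulli draw) toward steering such a learner, and the learner would invert a model of that teacher rather than treating feedback as unbiased rewards.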