Selective Sampling and Imitation Learning via Online Regression
Authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to verify our theory. |
| Researcher Affiliation | Academia | MIT, Cornell University |
| Pseudocode | Yes | Algorithm 1 Selective SAmplinG with Expert Feedback (SAGE); Algorithm 2 InteRActiVe ImitatiOn Learning VIa Active Expert Querying (RAVIOLI); Algorithm 3 Selective Sampling with Expert Feedback for Stochastic Contexts; Algorithm 4 InteRActiVe ImitatiOn Learning VIa Active Queries to M Experts (RAVIOLI-M) |
| Open Source Code | No | The paper does not contain an explicit statement about the release of open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We first introduce the simulator, CartPole [Barto et al., 1983, Brockman et al., 2016] |
| Dataset Splits | No | The paper mentions using the CartPole environment but does not specify training, validation, or test dataset splits or percentages. It describes generating expert policies and using a neural network for the function class. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. It only mentions using a neural network. |
| Software Dependencies | No | The paper mentions using a 'neural network (single hidden layer neural network, with 4 neurons in the hidden layer)' but does not name any software or version numbers (e.g., a specific deep learning framework such as PyTorch or TensorFlow and its version). |
| Experiment Setup | Yes | First, we use a neural network (single hidden layer neural network, with 4 neurons in the hidden layer) as our function class $\{\mathcal{F}^m_h\}_{h \le H, m \le M}$. Second, we specify SelectAction to pick the action of the most confident expert, i.e., $\mathrm{SelectAction}(f^1_{t,h}(x), \dots, f^M_{t,h}(x)) = \mathrm{sign}(f^{\hat{i}}_{t,h}(x))$ where $\hat{i} = \arg\max_{i \in [M]} \lvert f^i_{t,h}(x) \rvert$. In our key experiments, we choose α = 50 when the number of experts is 1, 2 or 3, and choose 200 for 5-expert experiments. |
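
The experiment setup quoted above can be illustrated with a minimal sketch, which is not the authors' code (the paper releases none). It assumes that "most confident" means the expert score with the largest absolute value, that a score of exactly zero maps to +1, and that the hidden layer uses a tanh activation; the paper excerpt does not state the activation or initialisation. The names `make_network`, `forward`, and `select_action` are hypothetical.

```python
import math
import random


def make_network(input_dim, hidden_dim=4, seed=0):
    """Randomly initialise a single-hidden-layer network (hypothetical helper).

    hidden_dim=4 follows the quoted setup; the uniform initialisation
    is an assumption made only for this sketch.
    """
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.uniform(-1, 1) for _ in range(hidden_dim)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(network, x):
    """Return the scalar score f(x); tanh is an assumed activation."""
    w1, b1, w2, b2 = network
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def select_action(scores):
    """Pick the sign of the score with the largest magnitude.

    scores holds one value f^i(x) per expert model. Ties in magnitude
    go to the first such score, and a zero score maps to +1; both are
    assumptions of this sketch.
    """
    most_confident = max(scores, key=abs)
    return 1 if most_confident >= 0 else -1


# Usage: three expert models scoring one 4-dimensional CartPole-style state.
state = [0.01, -0.02, 0.03, 0.04]
networks = [make_network(input_dim=4, seed=s) for s in range(3)]
action = select_action([forward(net, state) for net in networks])
```

The returned action is in {-1, +1}; mapping it onto a simulator's own action encoding (for example 0 and 1) is left to the caller.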