Explanations that reveal all through the definition of encoding
Authors: Aahlad Manas Puli, Nhi Nguyen, Rajesh Ranganath
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section consists of two parts. The first part demonstrates the weak and strong detection capabilities of the evaluations ROAR, EVAL-X, and STRIPE-X in a simulated setting and on an image recognition task. To demonstrate these capabilities, we run these evaluations on instantiations of POSI, PRED, and MARG. |
| Researcher Affiliation | Academia | Aahlad Puli, Nhi Nguyen, Rajesh Ranganath (New York University) |
| Pseudocode | Yes | Algorithm 1: ENCODE-METER, generative version. Algorithm 2: STRIPE-X, predictive version. |
| Open Source Code | No | Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We only use public data. We use standard existing training techniques, describe the hyperparameters in detail, and provided the Llama 3 prompts we used in our experiments. |
| Open Datasets | Yes | We consider an image recognition task like the one in Figure 3 with labels and images from the cats_vs_dogs dataset from the Tensorflow package [32]. ... The base cat and dog images were obtained from the cats_vs_dogs dataset from the Tensorflow datasets package. |
| Dataset Splits | Yes | The training, validation, and test datasets consist of 8000, 1000, and 1000 samples respectively (see the data-loading sketch below the table). |
| Hardware Specification | Yes | The cats vs. dogs experiment was done on an A100 GPU where the whole training and evaluation ran in less than 20 minutes. All training and inference for this experiment was done on an A100. |
| Software Dependencies | No | The paper mentions using TensorFlow, GPT-2 models, and the AdamW optimizer, but does not provide specific version numbers for these software components. For example, it does not specify "TensorFlow 2.x" or "PyTorch 1.x". |
| Experiment Setup | Yes | The EVAL-X model is trained for 100 epochs with a batch size of 100 with the Adam optimizer, with the learning rate and weight decay parameters set to 10^-3 and 0 respectively. The pθ(F | xv, ℓ, v) model is trained for 50 epochs with a batch size of 200 with the Adam optimizer, with the learning rate and weight decay parameters set to 5 × 10^-5 and 1 respectively (see the training-setup sketch below the table). |
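The dataset rows above report the cats_vs_dogs dataset from TensorFlow Datasets and an 8000/1000/1000 train/validation/test split. The snippet below is a minimal sketch of how such splits could be produced; the image resolution, normalization, and shuffle seed are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch: load cats_vs_dogs from TensorFlow Datasets and carve out the
# reported 8000/1000/1000 train/validation/test splits.
# NOTE: the 128x128 resolution, normalization, and shuffle seed are assumptions;
# this report does not describe the authors' exact preprocessing.
import tensorflow as tf
import tensorflow_datasets as tfds

def preprocess(image, label):
    # Resize variable-sized images to a fixed shape and scale to [0, 1].
    image = tf.image.resize(image, (128, 128)) / 255.0
    return image, label

# cats_vs_dogs ships with a single "train" split in TensorFlow Datasets.
ds = tfds.load("cats_vs_dogs", split="train", as_supervised=True)

# Fix the shuffle once so take/skip yield disjoint, stable splits across epochs.
ds = ds.shuffle(1_000, seed=0, reshuffle_each_iteration=False).map(preprocess)

train_ds = ds.take(8000).batch(100)             # 8000 training samples
val_ds   = ds.skip(8000).take(1000).batch(100)  # 1000 validation samples
test_ds  = ds.skip(9000).take(1000).batch(100)  # 1000 test samples
```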
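The hyperparameters quoted in the Experiment Setup row can be wired up as below. This is a sketch under assumptions: it reuses `train_ds` and `val_ds` from the preceding data-loading sketch, the EVAL-X model architecture is a placeholder (the report does not describe it), and `tf.keras.optimizers.AdamW` (TensorFlow 2.11+) is used so the weight-decay value can be set explicitly, although the quoted text only says "Adam".

```python
# Sketch of the reported optimizer settings. The small dense network below is a
# placeholder for the EVAL-X model, whose architecture is not given in this report.
import tensorflow as tf

def build_placeholder_model():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 3)),  # matches the assumed preprocessing above
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(2),  # cat vs. dog logits
    ])

# EVAL-X model: 100 epochs, batch size 100, learning rate 1e-3, weight decay 0.
eval_x_model = build_placeholder_model()
eval_x_model.compile(
    optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=0.0),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# train_ds / val_ds come from the data-loading sketch above (batch size 100).
eval_x_model.fit(train_ds, validation_data=val_ds, epochs=100)

# The pθ(F | xv, ℓ, v) model is reported with 50 epochs, batch size 200,
# learning rate 5e-5, and weight decay 1; only its optimizer is shown here,
# since its inputs (masked image, label, mask) are not specified in this report.
masking_model_optimizer = tf.keras.optimizers.AdamW(learning_rate=5e-5, weight_decay=1.0)
```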