Fundamentals of Task-Agnostic Data Valuation
Authors: Mohammad Mohammadi Amiri, Frederic Berdoz, Ramesh Raskar
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We will show through extensive experiments on real tabular and image datasets that the proposed estimates capture the diversity and relevance of the seller's data for the buyer. Experiments: We evaluate our estimates for diversity and relevance using real datasets, namely Adult (Kohavi), MNIST (LeCun et al.), fashion-MNIST (Xiao et al.), Cifar-10 (Krizhevsky) and FairFace (Karkkainen et al.). |
| Researcher Affiliation | Academia | Mohammad Mohammadi Amiri¹, Frédéric Berdoz², Ramesh Raskar¹. ¹MIT, Media Lab, 75 Amherst St, Cambridge, MA 02139, USA. ²EPFL, Lausanne, Switzerland. |
| Pseudocode | No | The paper describes the proposed method in prose and mathematical formulas, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements about releasing code or links to a code repository. |
| Open Datasets | Yes | We evaluate our estimates for diversity and relevance using real datasets, namely Adult (Kohavi), MNIST (LeCun et al.), fashion-MNIST (Xiao et al.), Cifar-10 (Krizhevsky) and FairFace (Karkkainen et al.). |
| Dataset Splits | No | The paper does not specify explicit train/validation/test splits, percentages, or sample counts for the datasets used in its experiments. It mentions |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., CPU, GPU models, memory specifications). |
| Software Dependencies | No | The paper mentions using a "VGG16 model pre-trained on the ImageNet dataset" but does not name any software with version numbers (e.g., Python, TensorFlow, or PyTorch versions) that would be needed for reproducibility. |
| Experiment Setup | No | The paper describes the conceptual setup for calculating diversity and relevance metrics (e.g., using principal components with eigenvalues > 10^-2), but it does not provide specific hyperparameter values (like learning rate, batch size, epochs) or detailed system-level training settings as would be typical for a machine learning model training experiment. |
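
The Experiment Setup row mentions keeping principal components with eigenvalues above 10^-2. The sketch below illustrates only that thresholding step and is not the paper's diversity or relevance estimate. The function name `significant_components`, the 1/n covariance scaling, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def significant_components(X, threshold=1e-2):
    """Return the eigenvalues and eigenvectors of the sample covariance of X
    whose eigenvalues exceed `threshold`, sorted in descending order.

    Hypothetical helper: the 1/n scaling and the centering are assumptions,
    not details taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = Xc.T @ Xc / X.shape[0]            # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > threshold              # threshold quoted in the table
    return eigvals[keep], eigvecs[:, keep]


# Synthetic data: three features with standard deviations 2, 1 and 0.01,
# so the third direction has variance near 1e-4 and falls below the threshold.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([2.0, 1.0, 0.01])
vals, vecs = significant_components(X)
print(vals.shape, vecs.shape)
```

In this toy example the near-constant third feature is dropped, leaving two components. For image data, the same step would presumably be applied to extracted features (the table mentions a pre-trained VGG16) rather than raw pixels, but the table does not spell out that pipeline.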