Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Hardware-Aware Tractable Learning of Probabilistic Models
Authors: Laura I. Galindez Olascoaga, Wannes Meert, Nimish Shah, Marian Verhelst, Guy Van den Broeck
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6, Experimental evaluation: We empirically evaluate the proposed techniques on a relevant embedded sensing use case: the Human Activity Recognition (HAR) benchmark [1]. Additionally, we show our method's general applicability on a number of other publicly available datasets [8, 15, 21, 26, 31], two of them commonly used for density estimation benchmarks and the rest commonly used for classification (see Table 1). |
| Researcher Affiliation | Academia | Electrical Engineering Department, KU Leuven; Computer Science Department, KU Leuven; Computer Science Department, University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1: ScaleSI(α_k, F_prun, S_prun); Algorithm 2: PruneAC(α, F); Algorithm 3: GetPareto(σ, acc, cost) |
| Open Source Code | Yes | Code available at https://github.com/laurago894/HwAwareProb. |
| Open Datasets | Yes | Human Activity Recognition (HAR) benchmark [1]. Additionally, we show our method's general applicability on a number of other publicly available datasets [8, 15, 21, 26, 31]... For the classification benchmarks... subjected them to a 75%-train, 10%-validation and 15%-test random split. |
| Dataset Splits | Yes | We then binarized them using a one-hot encoding and subjected them to a 75%-train, 10%-validation and 15%-test random split. |
| Hardware Specification | No | The paper states: 'All computation costs for this dataset are normalized according to the energy consumption trends of an embedded ARM M9 CPU, assuming 0.1 nJ per operation [39].' This refers to the hardware model used for cost estimation, not the specific hardware (e.g., GPU/CPU models) used to run the experiments and generate the reported accuracy figures. |
| Software Dependencies | No | The paper mentions specific algorithms and tools such as 'LearnPSDD algorithm [24]' and 'ACE compiler', but does not provide version numbers for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We learned the models on the train and validation sets with the LearnPSDD algorithm [24], using the same settings reported therein, and following the bottom-up vtree induction strategy. To populate the model set α, we retained a model after every N/10 iterations, where N is the number of iterations needed for convergence (this is until the log-likelihood on validation data stagnates). |
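
The Dataset Splits row quotes a one-hot binarization followed by a 75%-train, 10%-validation and 15%-test random split. The sketch below illustrates one way such a preprocessing step could look. It is not the authors' code: the function names `one_hot` and `split_indices`, the fixed seed, and the rounding of split sizes are all assumptions made for illustration.

```python
import random


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector (hypothetical helper)."""
    return [1 if value == c else 0 for c in categories]


def split_indices(n_rows, train=0.75, valid=0.10, seed=0):
    """Shuffle row indices and cut them into train/validation/test parts.

    The seed and the rounding rule are assumptions; the summary above
    only states the 75/10/15 proportions.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train)
    n_valid = round(n_rows * valid)
    train_idx = indices[:n_train]
    valid_idx = indices[n_train:n_train + n_valid]
    test_idx = indices[n_train + n_valid:]  # whatever remains, about 15%
    return train_idx, valid_idx, test_idx


train_idx, valid_idx, test_idx = split_indices(1000)
encoded = one_hot("b", ["a", "b", "c"])
```

The test part takes the remainder instead of its own rounded share, so the three parts always cover every row exactly once.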