Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ktrain: A Low-Code Library for Augmented Machine Learning
Authors: Arun S. Maiya
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present ktrain, a low-code Python library that makes machine learning more accessible and easier to apply. To illustrate ease of use, we provide a fully-complete example for text classification. More specifically, we train a Chinese-language sentiment-analyzer on a dataset of hotel reviews. Fine-Tuning a BERT Text Classifier for Chinese: `import ktrain` `from ktrain import text as txt` `# STEP 1: load and preprocess data` `trn, val, preproc = txt.texts_from_folder('ChnSentiCorp', maxlen=75, preprocess_mode='bert')` `# STEP 2: load model and wrap in Learner` `model = txt.text_classifier('bert', trn, preproc=preproc)` `learner = ktrain.get_learner(model, train_data=trn, val_data=val)` `# STEP 3: estimate learning rate` `learner.lr_find(show_plot=True)` `# STEP 4: train model` `learner.fit_onecycle(2e-5, 4)` Table 1 compares ktrain to popular low-code and AutoML libraries in their out-of-the-box support for a variety of machine learning tasks. |
| Researcher Affiliation | Industry | Arun S. Maiya EMAIL Institute for Defense Analyses Alexandria, VA, USA |
| Pseudocode | No | The paper includes Python code examples for demonstrating the library's use, such as 'Fine-Tuning a BERT Text Classifier for Chinese:' and 'Building an End-to-End Open-Domain QA System in ktrain'. These are actual code blocks, not pseudocode or algorithm blocks. The description of steps (e.g., STEP 1: Load and Preprocess Data) is in natural language prose. |
| Open Source Code | Yes | ktrain is open-source, free to use under a permissive Apache license, and available on GitHub at: https://github.com/amaiya/ktrain. |
| Open Datasets | Yes | More specifically, we train a Chinese-language sentiment-analyzer on a dataset of hotel reviews (Footnote 2: https://github.com/Tony607/Chinese_sentiment_analysis). ... using the well-studied 20 Newsgroups dataset (Footnote 3: http://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups). |
| Dataset Splits | No | The paper mentions loading training and validation data (e.g., 'trn, val, preproc = txt.texts_from_folder(...)') for the Chinese sentiment analysis example and loading documents into a list ('docs') for the 20 Newsgroups QA system. However, it does not specify the exact percentages, sample counts, or a detailed methodology for how these datasets were split into training, validation, or test sets. |
| Hardware Specification | No | The paper generally mentions that 'fast models such as fastText... and NBSVM... are amenable to being trained on a standard laptop CPU.' This is a general statement about capability, not a specific hardware specification used for running the experiments described in the paper. No specific CPU models, GPU models, or other detailed hardware configurations are provided. |
| Software Dependencies | No | The paper mentions several software components like 'Python library', 'TensorFlow', 'transformers', 'scikit-learn', and 'stellargraph', and provides Python code examples that import 'ktrain'. However, it does not provide specific version numbers for any of these software dependencies, which are necessary for reproducible descriptions. |
| Experiment Setup | Yes | The example for 'Fine-Tuning a BERT Text Classifier for Chinese:' includes the line: 'learner.fit_onecycle(2e-5, 4)', which explicitly provides a learning rate (2e-5) and the number of epochs (4) for the training process. |
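
The Experiment Setup row reports two values passed to `fit_onecycle`: a learning rate of 2e-5 and 4 epochs. As a rough illustration of what a one-cycle policy does with such values, the sketch below implements a generic triangular schedule in plain Python. It is not ktrain's implementation. The function name `one_cycle_lr`, the starting divisor of 10, and the 25 steps per epoch are assumptions chosen only for illustration; the paper does not report them.

```python
def one_cycle_lr(step, total_steps, max_lr, start_div=10.0):
    """Generic triangular one-cycle schedule (illustrative, not ktrain's code).

    The rate ramps linearly from max_lr / start_div up to max_lr over the
    first half of training, then linearly back down over the second half.
    start_div=10.0 is an assumed default, not a value from the paper.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    if not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")
    base_lr = max_lr / start_div
    half = (total_steps - 1) / 2.0
    # 0.0 at the first and last step, 1.0 at the midpoint of training
    frac = 1.0 - abs(step - half) / half
    return base_lr + (max_lr - base_lr) * frac


# Values from the table: learning rate 2e-5, 4 epochs.
# steps_per_epoch is hypothetical; it depends on dataset and batch size.
epochs = 4
steps_per_epoch = 25
total = epochs * steps_per_epoch
schedule = [one_cycle_lr(s, total, 2e-5) for s in range(total)]
print(f"first={schedule[0]:.2e} peak={max(schedule):.2e} last={schedule[-1]:.2e}")
```

Because the steps per epoch depend on dataset size and batch size, which the table marks as unreported, the exact per-step rates of the original run cannot be recovered from the paper alone.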