Reusing Past Machine Learning Models Based on Data Similarity Metrics
Abstract
Many of today’s application domains of Machine Learning (ML) are dynamic in the sense that data and their patterns change over time. This has a significant impact on the ML lifecycle and operations, requiring frequent model (re-)training or other strategies to deal with outdated models and data. This need for dynamic and responsive solutions also affects the use of computational resources and, consequently, sustainability indicators. This paper proposes an approach in line with the concept of Frugal AI, whose main aim is to minimize the resources and time spent on training models by reusing models from a pool of past models, when appropriate. Specifically, we present and validate a methodology for similarity-based model selection in data-streaming environments with concept drift. Rather than training a new model for each new block of data, this methodology maintains a pool containing only a subset of past models and, for each new block of data, selects the best model from the pool. The best model is determined by the distance between its training data and the current block of data; this distance is computed over a set of meta-features that characterize the data, using the Bray-Curtis distance. We show that previous models can be reused with this methodology, leading to potentially significant savings in resources and time while maintaining predictive quality.
Conference: ISAmI – 15th International Symposium on Ambient Intelligence
Keywords: Machine Learning, Meta-Feature Extraction, Concept Drift, Model Selection, Frugal AI
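The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the meta-feature set (here, per-column means and standard deviations), the pool entry structure, and the function names are all assumptions made for the example; only the use of the Bray-Curtis distance to compare meta-feature vectors follows the text.

```python
import numpy as np

def meta_features(block):
    """Characterize a data block with simple summary statistics
    (illustrative meta-features; the paper uses its own set)."""
    block = np.asarray(block, dtype=float)
    return np.concatenate([block.mean(axis=0), block.std(axis=0)])

def bray_curtis(u, v):
    """Bray-Curtis distance: sum(|u - v|) / sum(|u + v|),
    as computed by scipy.spatial.distance.braycurtis."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / np.abs(u + v).sum()

def select_model(pool, new_block):
    """Pick the pool entry whose training-data meta-features are
    closest (by Bray-Curtis distance) to those of the new block.
    Each pool entry is assumed to be a dict with a 'meta' vector
    precomputed from that model's training data."""
    target = meta_features(new_block)
    return min(pool, key=lambda entry: bray_curtis(entry["meta"], target))
```

In a streaming setting, `meta_features` would be computed once per pool model at training time, so selecting a model for a new block costs only one meta-feature extraction plus one distance per pool entry, instead of a full retraining.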