Reusing Past Machine Learning Models Based on Data Similarity Metrics
Abstract
Many of today’s application domains of Machine Learning (ML) are dynamic in the sense that data and their patterns change over time. This has a significant impact on the ML lifecycle and operations, requiring frequent model (re-)training or other strategies to deal with outdated models and data. This need for dynamic and responsive solutions also affects the use of computational resources and, consequently, sustainability indicators. This paper proposes an approach in line with the concept of Frugal AI, whose main aim is to minimize the resources and time spent on training models by reusing models from a pool of past models, when appropriate. Specifically, we present and validate a methodology for similarity-based model selection in data-streaming environments with concept drift. Rather than training a new model for each new block of data, this methodology maintains a pool containing only a subset of past models and, for each new block of data, selects the best model from the pool. The best model is determined by the distance between its training data and the current block of data; this distance is computed over a set of meta-features that characterize the data, using the Bray-Curtis distance. We show that previous models can be reused with this methodology, leading to potentially significant savings in resources and time while maintaining predictive quality.
Conference: ISAmI – 15th International Symposium on Ambient Intelligence
Keywords: Machine Learning, Meta-Feature Extraction, Concept Drift, Model Selection, Frugal AI
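The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the meta-feature set (here, per-column means and standard deviations), the pool entry structure, and the function names are all assumptions made for the example; only the use of the Bray-Curtis distance to compare meta-feature vectors follows the text.

```python
import numpy as np

def meta_features(block):
    """Characterize a data block with simple summary statistics
    (illustrative meta-features; the paper uses its own set)."""
    block = np.asarray(block, dtype=float)
    return np.concatenate([block.mean(axis=0), block.std(axis=0)])

def bray_curtis(u, v):
    """Bray-Curtis distance: sum(|u - v|) / sum(|u + v|),
    as computed by scipy.spatial.distance.braycurtis."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / np.abs(u + v).sum()

def select_model(pool, new_block):
    """Pick the pool entry whose training-data meta-features are
    closest (by Bray-Curtis distance) to those of the new block.
    Each pool entry is assumed to be a dict with a 'meta' vector
    precomputed from that model's training data."""
    target = meta_features(new_block)
    return min(pool, key=lambda entry: bray_curtis(entry["meta"], target))
```

In a streaming setting, `meta_features` would be computed once per pool model at training time, so selecting a model for a new block costs only one meta-feature extraction plus one distance per pool entry, instead of a full retraining.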