Intelligent Optimization of Distributed Pipeline Execution in Serverless Platforms: A Predictive Model Approach
Executing distributed pipelines efficiently in serverless environments is key to reducing both execution time and operational cost in the cloud. This paper presents an approach to predicting and optimizing the duration of a serverless pipeline, using as a case study a geospatial water consumption analysis pipeline that is executed and parallelized with Lithops. The approach relies on an XGBoost model whose hyperparameters were optimized using Optuna, resulting in a 75.34% reduction in Mean Absolute Error (MAE) compared to a baseline model and a 79.9% reduction in execution time compared to suboptimal configurations. Additionally, the model reduced the number of pipeline executions required by 30% compared to a full Design Space Analysis (DSA), which translates into a 30% cost saving. These results highlight the model’s ability to significantly improve both execution efficiency and cost-effectiveness.
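
To make the headline metric concrete, the following minimal sketch shows how a percentage reduction in MAE relative to a baseline can be computed. Everything in it is illustrative: the durations are invented, the function names are hypothetical, and the baseline (always predicting the mean observed duration) is an assumption, since the baseline model is not specified here. It does not reproduce the reported figures.

```python
from statistics import fmean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean([abs(a - p) for a, p in zip(actual, predicted)])


def relative_reduction_pct(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical pipeline durations in seconds (not data from the study).
actual_durations = [120.0, 95.0, 210.0, 150.0]

# Assumed baseline: predict the mean observed duration for every run.
mean_duration = fmean(actual_durations)
baseline_predictions = [mean_duration] * len(actual_durations)

# Hypothetical predictions from a trained regressor.
model_predictions = [115.0, 100.0, 200.0, 155.0]

baseline_mae = mean_absolute_error(actual_durations, baseline_predictions)
model_mae = mean_absolute_error(actual_durations, model_predictions)
reduction = relative_reduction_pct(baseline_mae, model_mae)

print(f"baseline MAE: {baseline_mae:.2f} s")
print(f"model MAE:    {model_mae:.2f} s")
print(f"MAE reduction: {reduction:.2f}%")
```

The same relative-reduction formula applies to the execution-time and cost comparisons, with the suboptimal configuration or the full DSA taking the place of the baseline.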