Advancing Serverless Computing for Scalable AI Model Inference: Challenges and Opportunities
Artificial Intelligence (AI) model inference has emerged as a crucial component across numerous applications. Serverless computing, known for its scalability, flexibility, and cost-efficiency, is an ideal paradigm for executing AI model inference tasks. This survey provides a comprehensive review of recent research on AI model inference systems in serverless environments, focusing on studies published since 2019. We investigate system-level advancements aimed at optimizing performance and cost-efficiency through a range of innovative techniques. By analyzing high-impact papers from leading venues in AI model inference and serverless computing, we highlight key breakthroughs and solutions. This survey serves as a valuable resource for both practitioners and academic researchers, offering critical insights into the current state and future trends in integrating AI model inference with serverless architectures. To the best of our knowledge, this is the first survey that covers Large Language Model (LLM) inference in the context of serverless computing.