ADAPTIVE EDGE-CLOUD TASK OFFLOADING FOR LOW-LATENCY REAL-TIME AI INFERENCE

Abdullah Farhan; Esha Tariq

Authors

Abdullah Farhan, Department of Computer Science, Institute of Edge Computing and Artificial Intelligence, Lahore, Pakistan
Esha Tariq, Department of Software Engineering, Center for Cloud Computing and Intelligent Systems, Islamabad, Pakistan

Keywords:

Adaptive Task Offloading, Edge Computing, Cloud Computing, Real-Time AI, Latency Reduction

Abstract

Real-time artificial intelligence applications require fast, reliable, and resource-efficient processing to support latency-sensitive tasks such as autonomous monitoring, smart surveillance, industrial automation, healthcare decision support, and intelligent transportation. However, executing all AI workloads on edge devices is constrained by limited computational capacity, memory, and energy, while executing everything in the cloud can introduce network delay and service-level agreement (SLA) violations. This paper investigates an adaptive task offloading approach that reduces latency in real-time AI applications by dynamically distributing inference tasks between edge and cloud environments. The proposed approach evaluates workload complexity, network conditions, device utilization, and the latency threshold before deciding whether a task should be processed locally at the edge or transferred to the cloud. The results show that adaptive offloading achieves lower average latency than edge-only, cloud-only, and static offloading strategies. It also improves SLA compliance under high workload conditions while maintaining acceptable energy consumption and inference accuracy. Experimental analysis across multiple workload types shows that adaptive task placement is particularly effective when network conditions fluctuate and task complexity varies. Overall, the findings suggest that intelligent edge-cloud coordination can significantly improve the performance of real-time AI systems by balancing speed, resource usage, and reliability.
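
The placement decision described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract names the four inputs (workload complexity, network condition, device utilization, latency threshold) but not the decision rule, so everything below is an assumption made for illustration. The field names, the linear latency estimates, the 5% minimum free edge capacity, and the policy of staying on the edge whenever the edge estimate meets the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    complexity_gflop: float      # hypothetical proxy for workload complexity
    payload_mb: float            # input data that must be uploaded if offloaded
    latency_threshold_ms: float  # deadline the task should meet


@dataclass
class SystemState:
    edge_gflops: float           # peak edge compute rate (assumed known)
    edge_utilization: float      # fraction of edge capacity in use, 0.0 to 1.0
    cloud_gflops: float          # compute rate available in the cloud
    bandwidth_mbps: float        # current uplink bandwidth
    rtt_ms: float                # current round-trip time to the cloud


def estimate_edge_ms(task: Task, state: SystemState) -> float:
    """Assumed model: compute time on whatever edge capacity is free."""
    free_fraction = max(1.0 - state.edge_utilization, 0.05)  # avoid divide-by-zero
    return task.complexity_gflop / (state.edge_gflops * free_fraction) * 1000.0


def estimate_cloud_ms(task: Task, state: SystemState) -> float:
    """Assumed model: round trip + upload time + cloud compute time."""
    transfer_ms = task.payload_mb * 8.0 / state.bandwidth_mbps * 1000.0
    compute_ms = task.complexity_gflop / state.cloud_gflops * 1000.0
    return state.rtt_ms + transfer_ms + compute_ms


def place_task(task: Task, state: SystemState) -> str:
    """Return "edge" or "cloud" for one inference task.

    Illustrative policy (an assumption, not taken from the paper):
    stay local if the edge estimate meets the latency threshold,
    otherwise pick whichever side has the lower estimated latency.
    """
    edge_ms = estimate_edge_ms(task, state)
    if edge_ms <= task.latency_threshold_ms:
        return "edge"
    cloud_ms = estimate_cloud_ms(task, state)
    return "cloud" if cloud_ms < edge_ms else "edge"


if __name__ == "__main__":
    good_net = SystemState(edge_gflops=50, edge_utilization=0.2,
                           cloud_gflops=1000, bandwidth_mbps=100, rtt_ms=20)
    heavy = Task(complexity_gflop=20, payload_mb=0.5, latency_threshold_ms=100)
    print(place_task(heavy, good_net))
```

Because the estimates are recomputed from the current bandwidth, round-trip time, and utilization on every call, the same task can be placed differently as conditions change, which is the behavior that separates adaptive offloading from a static rule.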