Dhruv Toshniwal

Operator Placement on Edge Systems in Stream Processing

July 2023

Introduction

Stream processing systems must meet strict latency requirements with limited resources. This is especially hard when they ingest data from multiple sources in different locations over bandwidth-constrained links. This project proposes a heterogeneity-aware operator placement algorithm for stream processing systems that offloads tasks to edge systems, specifically Raspberry Pi devices, to improve resource utilization and reduce latency overhead.

Problem Statement

Stream processing systems such as Apache Flink are designed for clusters of homogeneous data center servers, so they assume largely interchangeable resources and cannot automatically offload tasks to edge systems. As a result, streaming applications that process data from different locations over limited networks suffer from high latency and inefficient resource utilization.

Proposed Solution

The solution involves:

  • Preprocessing data streams at the edge to reduce data traffic and latency
  • Identifying tasks that can be offloaded using performance metrics
  • Implementing a dynamic mechanism to predict data stream flow
  • Extracting performance metrics like:
    • backPressuredTimeMsPerSecond
    • idleTimeMsPerSecond
    • busyTimeMsPerSecond
    • numRecordsOutPerSecond
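
As a minimal sketch of how these metrics might be collected, the snippet below assumes they are read through Flink's monitoring REST API, whose aggregated subtask-metrics endpoint returns a JSON list with min, max, avg and sum per metric. The helper names, the base URL, the job and vertex ids, and the sample payload are all hypothetical and serve only as illustration. Note that Flink exposes the back-pressure metric under the name backPressuredTimeMsPerSecond.

```python
import json
from urllib.parse import quote

# Task metric names as exposed by Flink.
METRICS = [
    "backPressuredTimeMsPerSecond",
    "idleTimeMsPerSecond",
    "busyTimeMsPerSecond",
    "numRecordsOutPerSecond",
]


def metrics_url(base_url, job_id, vertex_id, metrics=METRICS):
    """Build the aggregated subtask-metrics URL for one job vertex."""
    names = quote(",".join(metrics), safe=",")
    return (f"{base_url.rstrip('/')}/jobs/{job_id}/vertices/{vertex_id}"
            f"/subtasks/metrics?get={names}")


def parse_metrics(payload, agg="avg"):
    """Map metric id -> aggregated value from the JSON response body."""
    result = {}
    for entry in json.loads(payload):
        value = entry.get(agg)
        if value is not None:
            result[entry["id"]] = float(value)
    return result


# Hypothetical response body, for illustration only (not measured data).
sample = json.dumps([
    {"id": "busyTimeMsPerSecond", "min": 100, "max": 300, "avg": 200, "sum": 400},
    {"id": "idleTimeMsPerSecond", "min": 700, "max": 900, "avg": 800, "sum": 1600},
])
parsed = parse_metrics(sample)
```

In a real deployment the payload would come from an HTTP GET on the URL built by metrics_url; the parsing step is kept separate here so it can be checked without a running cluster.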

The goal is to modify the Flink scheduler to intelligently offload tasks to edge systems.
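
The proposal does not fix a particular method for predicting data stream flow. One simple candidate, shown here purely as an assumption, is an exponentially weighted moving average over successive numRecordsOutPerSecond samples. The class name and the smoothing factor alpha are hypothetical.

```python
class RatePredictor:
    """Exponentially weighted moving average over observed record rates."""

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.estimate = None  # no prediction until the first sample arrives

    def update(self, observed_rate):
        """Fold one numRecordsOutPerSecond sample into the estimate."""
        if self.estimate is None:
            self.estimate = float(observed_rate)
        else:
            self.estimate = (self.alpha * observed_rate
                             + (1.0 - self.alpha) * self.estimate)
        return self.estimate
```

A larger alpha makes the estimate react faster to bursts, while a smaller alpha smooths out short spikes; which behaviour is preferable would have to be determined experimentally.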

Expectations

The project aims to:

  • Create a prototype that offloads tasks to edge systems
  • Reduce data traffic and latency overhead
  • Efficiently utilize edge resources
  • Minimize server-side resource usage
  • Improve system efficiency without sacrificing performance

Experimental Plan

The experimental setup will include:

  • Cloud/local server
  • Raspberry Pi 4B
  • Apache Flink
  • Python and Java

The approach involves:

  1. Initially running tasks on the server side
  2. Collecting performance metrics from Flink
  3. Placing lightweight operators on available edge slots
  4. Testing on open-source datasets

Success Indicators

The project will be considered successful if it can:

  • Make Flink compatible with heterogeneous resources
  • Implement a cost model for operator offloading
  • Dynamically change placement configuration based on cost model output
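
The cost model itself is not yet specified. A minimal sketch, under stated assumptions, could compare the estimated cost of running an operator on the server (its input crosses the edge-to-server link) with running it on the edge (only its output crosses the link, but computation is slower). The field names, the slowdown factor of 4, and the link bandwidth are hypothetical parameters, not measured values.

```python
from dataclasses import dataclass


@dataclass
class OperatorStats:
    busy_ms_per_s: float      # busyTimeMsPerSecond measured on the server
    records_in_per_s: float   # input rate of the operator
    records_out_per_s: float  # numRecordsOutPerSecond
    record_bytes: float       # assumed average record size in bytes


def offload_gain_ms(stats, edge_slowdown, link_bytes_per_s):
    """Estimated milliseconds saved per second by running on the edge."""
    ms_per_byte = 1000.0 / link_bytes_per_s
    # Server placement: server compute time plus shipping the input upstream.
    server_cost = (stats.busy_ms_per_s
                   + stats.records_in_per_s * stats.record_bytes * ms_per_byte)
    # Edge placement: slower compute, but only the output is shipped.
    edge_cost = (stats.busy_ms_per_s * edge_slowdown
                 + stats.records_out_per_s * stats.record_bytes * ms_per_byte)
    return server_cost - edge_cost


def should_offload(stats, edge_slowdown=4.0, link_bytes_per_s=1_000_000.0):
    """Offload only when the estimated gain is positive."""
    return offload_gain_ms(stats, edge_slowdown, link_bytes_per_s) > 0.0
```

Under this model, a selective filter that drops most of its input tends to be worth offloading, whereas a compute-heavy operator that emits as much as it receives does not. Re-evaluating should_offload as new metrics arrive is one way the placement configuration could change dynamically.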

Conclusion

The proposed heterogeneity-aware operator placement algorithm aims to make stream processing more efficient in edge computing environments. By offloading lightweight operators to edge devices, it is expected to reduce data traffic and latency overhead while making better use of both edge and server resources.