Data Flow - Backpressure

About

When a dataflow runs through asynchronous steps, each step may do different work at a different speed. In this setting, backpressure arises when a fast producer feeds a slow consumer.

If the producer and the consumer write to and read from disk, there is no backpressure problem, because the disk acts as the buffer between them.

The issue of backpressure arises when the consumer cannot process incoming data at the rate at which the producer emits it.

In this case, streaming software is put in place in order to:

  • play the role of a buffer
  • provide a checkpoint or offset mechanism so that the process can be restarted if needed.
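
The two roles above can be illustrated with a minimal in-memory sketch. The class `BufferedLog` and its methods (`append`, `poll`, `commit`, `lag`) are hypothetical names chosen for this illustration and do not belong to any particular streaming product. The records are buffered in a list, and the consumer's position is kept as an offset so that processing can resume from the last committed record.

```python
class BufferedLog:
    """Hypothetical append-only buffer with a committed consumer offset."""

    def __init__(self):
        self.records = []    # the buffer between producer and consumer
        self.committed = 0   # offset of the next record to process

    def append(self, record):
        # Producer side: writing never waits for the consumer.
        self.records.append(record)

    def poll(self, max_records):
        # Consumer side: read from the committed offset without moving it.
        return self.records[self.committed:self.committed + max_records]

    def commit(self, count):
        # Checkpoint: record that `count` more records were processed.
        self.committed += count

    def lag(self):
        # Records produced but not yet processed by the consumer.
        return len(self.records) - self.committed


log = BufferedLog()
for i in range(10):
    log.append(i)

first_batch = log.poll(3)
log.commit(len(first_batch))

# A restarted consumer polls again and resumes after the checkpoint.
resumed_batch = log.poll(3)
remaining = log.lag()
```

In this sketch the lag grows with every record appended while the consumer is not committing, which is one way to measure how far behind a consumer has fallen.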

The longer a consumer stays offline, the higher the backpressure.

To avoid overwhelming such steps, which would usually show up as increased memory usage due to temporary buffering or as the need to skip or drop data, so-called backpressure is applied.

Backpressure:

  • is a form of flow control where the steps can express how many items they are ready to process.
  • is when the slower consumer requests from the faster producer only the amount of data that it can process.
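
The request-based flow control described above can be sketched as follows. The `Producer` and `Consumer` classes, the `request` method and the `capacity` parameter are assumptions made for this illustration, not the API of a specific library. The consumer states how many items it is ready to process, and the producer never hands over more than that.

```python
from collections import deque


class Producer:
    """Hypothetical fast producer that only emits what was requested."""

    def __init__(self, items):
        self._pending = deque(items)

    def request(self, n):
        # Hand over at most n items, as asked by the consumer.
        batch = []
        while self._pending and len(batch) < n:
            batch.append(self._pending.popleft())
        return batch


class Consumer:
    """Hypothetical slow consumer that signals its demand upstream."""

    def __init__(self, capacity):
        self.capacity = capacity   # items it can hold and process at once
        self.processed = []
        self.largest_batch = 0

    def run(self, producer):
        while True:
            batch = producer.request(self.capacity)
            if not batch:
                break              # the producer has nothing left
            self.largest_batch = max(self.largest_batch, len(batch))
            self.processed.extend(batch)


producer = Producer(range(10))
consumer = Consumer(capacity=4)
consumer.run(producer)
```

Whatever the size of the producer's backlog, the consumer never holds more than `capacity` items at a time.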

This allows constraining the memory usage of the dataflows in situations where there is generally no way for a step to know how many items the upstream will send to it.
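
The memory bound can also be illustrated with a bounded queue between two asynchronous steps. This is a minimal sketch using Python's standard `asyncio.Queue`. The function names, the `None` end-of-stream marker, the queue size and the simulated processing delay are assumptions made for the example. When the queue is full, `put` suspends the producer until the consumer frees a slot, so the buffer never grows beyond `maxsize`.

```python
import asyncio


async def producer(queue, count):
    for i in range(count):
        # Suspends here whenever the queue is full: this is the backpressure.
        await queue.put(i)
    await queue.put(None)  # end-of-stream marker (assumption of this sketch)


async def consumer(queue, results, observed_sizes):
    while True:
        observed_sizes.append(queue.qsize())
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0.001)  # simulate slow processing
        results.append(item)


async def main(count=20, maxsize=5):
    queue = asyncio.Queue(maxsize=maxsize)
    results, observed_sizes = [], []
    await asyncio.gather(
        producer(queue, count),
        consumer(queue, results, observed_sizes),
    )
    return results, max(observed_sizes)


results, peak_buffered = asyncio.run(main())
```

The producer could emit all items at once, but the number of buffered items stays at or below the queue size no matter how many items are sent.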

