Data Processing - Reactive Stream Processing

1 - About

Reactive Streams is a standard/specification for:

It's also known as Observable's reactive dataflow because it is all about observing a stream (flow-controlled components) in which Publishers produce items consumed by one or more Subscribers, each managed by a Subscription.

This is a push based systems (observer)

3 - Definition

A reactive stream process is

  • a potentially unbounded number of elements (data whose volume is not predetermined )
  • in sequence,
  • asynchronously passing elements in a producer consumer model between:
    • the source of the data (producer, up)
    • to the destination for the data (consumer, down)
    • through one or more processing stages
  • with mandatory non-blocking backpressure (backpressure is an integral part of the model)

The fact that the downstream can be slower than the upstream is taken in the domain model. You can be disconnected if consuming too slow. Examples Twitter streaming API

It's also called reactive functional programming.

4 - Model

4.1 - Interface

The model has 4 interfaces.

4.1.1 - Publisher

A Publisher is a provider of a potentially unbounded number of sequenced elements, publishing them according to the demand received from its Subscriber(s).

A Publisher can serve multiple Subscriber's subscribed dynamically at various points in time.


public interface Publisher<T> {
    public void subscribe(Subscriber<? super T> s);
}

4.1.2 - Subscriber

A Subscriber - A receiver of messages - arranges that items be requested and processed.

It

  • connects to a Publisher,
  • receives a confirmation via onSubscribe,
  • then receive data via the onNext callbacks and additional signals via onError and onComplete,

public interface Subscriber<T> {
    public void onSubscribe(Subscription s);
    public void onNext(T t);
    public void onError(Throwable t);
    public void onComplete();
}

4.1.3 - Subscription

A Subscription represents a link between a Publisher and a Subscriber. It is used to:

  • signal desire for data
  • cancel demand (and allow resource cleanup).

public interface Subscription {
    public void request(long n);
    public void cancel();
}

4.1.4 - Processor

A Processor combines the capabilities of a Publisher and a Subscriber in a single interface.


public interface Processor<T, R> extends Subscriber<T>, Publisher<R> {
}

4.2 - Lifeycle on the Subscriber methods

In response to a call to Publisher.subscribe(Subscriber) the possible invocation sequences for methods on the Subscriber are given by the following protocol:


onSubscribe onNext* (onError | onComplete)?

ie:

  • onSubscribe is always executed (signalled)
  • followed by a possibly unbounded number of onNext signals (as requested by Subscriber)
  • followed by an onError signal if there is a failure, or an onComplete signal when no more elements are available
  • all as long as the Subscription is not cancelled.

4.3 - Asynchronous

This section will show different asynchronous scenarios that are all valid asynchronous streams. They all have their place and each has different tradeoffs including performance and implementation complexity.

Example is made with the following flow


nioSelectorThreadOrigin map(f) filter(p) consumeTo(toNioSelectorOutput)

where:

On this flow, the Subscription.request(n) create a unit of process that must be chained from the destination to the origin. Each implementation can choose to implement it with different scenario

The following scenarios uses:

  • the pipe | character to signal async boundaries (queue and schedule)
  • and R# to represent resources (possibly threads).

4.3.1 - Scenario 1


nioSelectorThreadOrigin | map(f) | filter(p) | consumeTo(toNioSelectorOutput)
-------------- R1 ----  | - R2 - | -- R3 --- | ---------- R4 ----------------

In this example each of the 3 consumers, map, filter and consumeTo asynchronously schedule the work. It could be on the same event loop (trampoline), separate threads, whatever.

4.3.2 - Scenario 2


nioSelectorThreadOrigin map(f) filter(p) | consumeTo(toNioSelectorOutput)
------------------- R1 ----------------- | ---------- R2 ----------------

  • Here it is only the final step that asynchronously schedules, by adding work to the NioSelectorOutput event loop.
  • The map and filter steps are synchronously performed on the origin thread.

4.3.3 - Scenario 3

Or another implementation could fuse the operations to the final consumer:


nioSelectorThreadOrigin | map(f) filter(p) consumeTo(toNioSelectorOutput)
--------- R1 ---------- | ------------------ R2 -------------------------

5 - Language

5.1 - Java

Reactive Streams Specification for the JVM define the basic semantics:

  • how the transmission of stream elements is regulated through back-pressure.
  • How elements are transmitted, their representation during transfer,
  • How back-pressure is signaled is not part of this specification.

Reactive Streams includes:

Starting from Java 9, they have become a part of the JDK in the form of the java.util.concurrent.Flow interfaces or you can get them as java dependencies with the following maven coordinates

  • org.reactivestreams:reactive-streams:1.0.3
  • org.reactivestreams:reactive-streams-tck:1.0.3
  • org.reactivestreams:reactive-streams-tck-flow:1.0.3
  • org.reactivestreams:reactive-streams-examples:1.0.3

5.2 - Javascript

6 - Implementation

7 - Documentation / Reference


Data Science
Data Analysis
Statistics
Data Science
Linear Algebra Mathematics
Trigonometry

Powered by ComboStrap