What is Stream Processing?

About

Stream processing is the reactive design of data processing: computation is triggered by the arrival of data rather than by a scheduled job.

Stream processing is also known as:

It's called stream processing because the events come from a continuous arrival of data, logically named data streams.

Model

A stream processing flow is a combination of:

A pipeline is a query on a stream.

Pipeline

Streams are a flexible system to implement a pipeline of data transformations.
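
As an illustration of a pipeline being a query on a stream, here is a minimal sketch that chains three transformations with Python generators. The event format (`user,page` lines) and the function names `parse`, `only_page`, `users` and `pipeline` are assumptions made for this example, not part of any particular streaming library.

```python
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Turn each raw 'user,page' line into an event dictionary."""
    for line in lines:
        user, page = line.strip().split(",")
        yield {"user": user, "page": page}


def only_page(events: Iterable[dict], page: str) -> Iterator[dict]:
    """Keep only the events for one page (the 'where' part of the query)."""
    for event in events:
        if event["page"] == page:
            yield event


def users(events: Iterable[dict]) -> Iterator[str]:
    """Project each event to its user (the 'select' part of the query)."""
    for event in events:
        yield event["user"]


def pipeline(lines: Iterable[str]) -> Iterator[str]:
    """Compose the stages: each event flows through all of them one at a time."""
    return users(only_page(parse(lines), "/home"))


# Hypothetical input; in a real system this would be an unbounded source.
source = ["ann,/home", "bob,/docs", "cid,/home"]
result = list(pipeline(source))
print(result)  # ['ann', 'cid']
```

Because each stage is a generator, an event is pulled through the whole chain before the next one is read, so the source never has to fit in memory.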

System Characteristics

It has a low memory and processing overhead.

This performance comes at a cost

Problems

A messaging system is a fairly low-level piece of infrastructure: it stores messages and waits for consumers to consume them. When you start writing code that produces or consumes messages, you quickly find that there are a lot of tricky problems that have to be solved in the processing layer.

Consider a counting example: count page views and update a dashboard.
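
The counting example can be sketched as follows. This is a naive version under stated assumptions: the events are plain page names, the "dashboard update" is a snapshot yielded every `report_every` events, and the counts live in an in-process `Counter`. The function name `count_page_views` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def count_page_views(events: Iterable[str], report_every: int) -> Iterator[dict]:
    """Count views per page and emit a snapshot every `report_every` events."""
    counts = Counter()
    for seen, page in enumerate(events, start=1):
        counts[page] += 1
        if seen % report_every == 0:
            # A copy, so that later updates do not change an emitted snapshot.
            yield dict(counts)


# Hypothetical stream of page-view events.
views = ["/home", "/docs", "/home", "/home"]
snapshots = list(count_page_views(views, report_every=2))
for snapshot in snapshots:
    print(snapshot)
```

The sketch keeps its state only in memory and reads from a single source, so it sidesteps exactly the kind of processing-layer problems this section is about.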

Stream processing is a higher level of abstraction on top of messaging systems, and it’s meant to address precisely this category of problems.

Samza is a stream processing framework that aims to help with these problems.

Implementation

Like most asynchronous systems, stream processing is implemented with an event loop.

Queue:

Routing:

Processor selection through:

Structure Flow:

Loop through an explicit processor (i.e. if and then go to step N)?
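
The event loop, queue and routing mentioned above can be sketched as follows. This is a minimal single-threaded illustration, not how any specific framework does it: selecting the processor by a `type` field on the event, the `None` sentinel that stops the loop, and the name `run_event_loop` are all assumptions made for this example.

```python
import queue
from typing import Callable, Dict


def run_event_loop(events: queue.Queue, processors: Dict[str, Callable[[dict], None]]) -> None:
    """Take events off the queue one by one and route each to its processor."""
    while True:
        event = events.get()  # blocks until an event is available
        if event is None:  # assumed sentinel: stop the loop
            break
        # Routing: pick the processor from the event type (an assumption).
        processor = processors.get(event["type"])
        if processor is not None:
            processor(event)
        # Events without a registered processor are silently dropped here.


log = []
processors = {
    "page_view": lambda e: log.append(("view", e["page"])),
    "click": lambda e: log.append(("click", e["target"])),
}

events = queue.Queue()
events.put({"type": "page_view", "page": "/home"})
events.put({"type": "click", "target": "buy"})
events.put({"type": "unknown"})
events.put(None)

run_event_loop(events, processors)
print(log)  # [('view', '/home'), ('click', 'buy')]
```

A dictionary lookup is only one way to select a processor; the same loop would work with any other routing rule in its place.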

It is our firm belief that in the future, the majority of data applications, including real-time analytics, continuous analytics, and historical data processing, will treat data as what it really is: unbounded streams of events.

Stream vs Batch

See Stream vs Batch

Software

See Stream - (Software|Library)