What is a Stream? Also known as Pipe, Message Queue or Event Processing

Event Centric Thinking

About

A stream is an infinite (unbounded) sequence of elements, such as data records, messages or events.

A finite sequence is called a list.

Example

Streams of data

  • user activity on a website
  • sensor readings from devices (IOT)
  • order delivery
  • a table: a stream of data manipulation operations (insert, update, delete) over an infinite window, which you will find persisted in a write-ahead log
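A minimal Python sketch of the table/stream relationship, assuming a change log made of hypothetical (operation, key, row) tuples: replaying the log in order rebuilds the current table. The record format of a real write-ahead log is database specific and is not modelled here.

```python
# A minimal sketch, assuming each change is a tuple (operation, key, row).
# The operation names "insert", "update" and "delete" are illustrative only.
from typing import Dict, Iterable, Optional, Tuple

Change = Tuple[str, str, Optional[dict]]


def replay(changes: Iterable[Change]) -> Dict[str, dict]:
    """Rebuild the current state of a table by applying its change stream in order."""
    table: Dict[str, dict] = {}
    for operation, key, row in changes:
        if operation in ("insert", "update"):
            table[key] = row  # the latest version of the row wins
        elif operation == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return table


log = [
    ("insert", "1", {"name": "Alice", "city": "Paris"}),
    ("insert", "2", {"name": "Bob", "city": "Berlin"}),
    ("update", "1", {"name": "Alice", "city": "Lyon"}),
    ("delete", "2", None),
]
print(replay(log))  # {'1': {'name': 'Alice', 'city': 'Lyon'}}
```

Replaying only a prefix of the log gives the table as it was at that point in time.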

Quotes

The world is concurrent. Things in the world don’t share data. Things communicate with messages. Things fail.

A stream is the derivative of state over time. The product rule, (uv)' = u'v + uv', is analogous to the rule for joining streams.
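One way to read this analogy, offered as an interpretation rather than a quoted result: if u and v are the accumulated states of two joined streams, a new element Δu only needs to be joined with the current v, and a new element Δv with the current u, just as (uv)' = u'v + uv'. The discrete version also has a Δu ⋈ Δv term, which vanishes when elements are processed one at a time. The sketch below is a symmetric hash join; the class StreamJoin and its method names are hypothetical.

```python
# A minimal sketch of an incremental (symmetric hash) join of two streams.
# Each arriving element is joined against the state accumulated so far on the
# other side, so the output is the change of the join: Δu ⋈ v or u ⋈ Δv.
from collections import defaultdict
from typing import Dict, List, Tuple


class StreamJoin:
    def __init__(self) -> None:
        # Join key -> values seen so far on each side (the states u and v)
        self.left: Dict[str, List[str]] = defaultdict(list)
        self.right: Dict[str, List[str]] = defaultdict(list)

    def on_left(self, key: str, value: str) -> List[Tuple[str, str, str]]:
        """A new left element (Δu) is joined with the whole right state (v)."""
        self.left[key].append(value)
        return [(key, value, other) for other in self.right[key]]

    def on_right(self, key: str, value: str) -> List[Tuple[str, str, str]]:
        """A new right element (Δv) is joined with the whole left state (u)."""
        self.right[key].append(value)
        return [(key, other, value) for other in self.left[key]]


join = StreamJoin()
print(join.on_left("order-1", "placed"))   # []
print(join.on_right("order-1", "paid"))    # [('order-1', 'placed', 'paid')]
print(join.on_left("order-1", "shipped"))  # [('order-1', 'shipped', 'paid')]
```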

Operations / Pipeline

Streams support functional-style operations on their elements, such as map-reduce transformations on collections.

Collections are primarily concerned with the efficient management of, and access to, their elements. By contrast, streams do not provide a means to directly access or manipulate their elements, and are instead concerned with declaratively describing their source and the computational operations that will be performed in aggregate on that source.

To perform a computation, stream operations are composed into a stream pipeline. A stream pipeline can be viewed as a query on the stream source.
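Python generators give a rough analogue of such a pipeline: each step only describes a computation on its source, and nothing runs until a final step pulls elements through. In this sketch (the function name pipeline is hypothetical) the same pipeline works on an infinite source or on a finite list.

```python
# A minimal sketch of a stream pipeline built from Python generators.
# Nothing is computed until a terminal step (here: sum) pulls elements.
from itertools import count, islice
from typing import Iterable, Iterator


def pipeline(source: Iterable[int]) -> Iterator[int]:
    """Describe the computation: keep even numbers, then square them."""
    evens = (n for n in source if n % 2 == 0)  # intermediate step: filter
    return (n * n for n in evens)              # intermediate step: map


# The source is infinite: islice takes the first 5 results, then sum reduces them.
total = sum(islice(pipeline(count(1)), 5))
print(total)  # 220
```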

Realtime

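Because a stream is infinite, it is never complete, so its elements are processed as they arrive rather than after the whole input has been collected. This is why streams are associated with real-time processing.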

Algorithm

A stream processing algorithm cannot rely on the size of the data to make assumptions, because the total size is never known.
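For example, an average over a stream cannot be computed as a sum divided by a total count, since the count is never final. A minimal sketch with a hypothetical running_mean generator updates the mean incrementally and emits a result after every element, using constant memory.

```python
# A minimal sketch: an aggregate computed one element at a time, without ever
# knowing (or needing) the total number of elements in the stream.
from typing import Iterable, Iterator


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all the values seen so far after each new value."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        mean += (value - mean) / count  # incremental update, O(1) memory
        yield mean


print(list(running_mean([2, 4, 6, 8])))  # [2.0, 3.0, 4.0, 5.0]
```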

System

The system that manages a stream is called a messaging system.

Why? Because it is an application that handles and passes messages.
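A minimal in-process sketch, using Python's standard queue.Queue as a stand-in for a messaging system: a producer thread passes messages to a consumer thread through the queue, and neither knows about the other. The STOP sentinel is an assumption added only so the example terminates; a real stream has no end.

```python
# A minimal in-process sketch of a messaging system: a producer puts messages
# on a queue and a consumer takes them off, without either knowing the other.
import queue
import threading

STOP = object()  # sentinel so this example ends; a real stream never does


def producer(q: queue.Queue, messages) -> None:
    for message in messages:
        q.put(message)
    q.put(STOP)


def consumer(q: queue.Queue, received: list) -> None:
    while True:
        message = q.get()  # blocks until a message is available
        if message is STOP:
            break
        received.append(message)


q: queue.Queue = queue.Queue()
received: list = []
threads = [
    threading.Thread(target=producer, args=(q, ["order placed", "order shipped"])),
    threading.Thread(target=consumer, args=(q, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)  # ['order placed', 'order shipped']
```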

Immutable State

Stream processing lets us model systems that have state without ever using assignment or mutable data.
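A minimal sketch, assuming a bank account as the stateful system and Python 3.8+ (for the initial argument of itertools.accumulate): the account's state is represented as the stream of its successive balances, computed from a stream of withdrawal amounts, so no variable is reassigned in the user code. stream_withdraw is a hypothetical name.

```python
# The account is never mutated: each balance is a new value in the output stream.
import operator
from itertools import accumulate
from typing import Iterable, Iterator


def stream_withdraw(balance: int, amounts: Iterable[int]) -> Iterator[int]:
    """Return the stream of successive balances for a stream of withdrawals."""
    return accumulate(amounts, operator.sub, initial=balance)


print(list(stream_withdraw(100, [10, 20, 5])))  # [100, 90, 70, 65]
```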

Data Structure

The data structures involved in stream applications are:

Process

Event sourcing describes a process as a sequence of events.
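A minimal event-sourcing sketch, assuming an order delivery process with hypothetical event names (OrderPlaced, OrderShipped, OrderDelivered): the current state is not stored but derived by replaying the sequence of events, and an event that does not fit the process is rejected.

```python
# A minimal event-sourcing sketch: the state of an order is never stored
# directly; it is derived by replaying the sequence of events of its process.
from typing import Iterable

# Hypothetical process: allowed (state, event) -> next state transitions
TRANSITIONS = {
    ("new", "OrderPlaced"): "placed",
    ("placed", "OrderShipped"): "shipped",
    ("shipped", "OrderDelivered"): "delivered",
}


def current_state(events: Iterable[str]) -> str:
    state = "new"
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state


history = ["OrderPlaced", "OrderShipped"]
print(current_state(history))                       # shipped
print(current_state(history + ["OrderDelivered"]))  # delivered
```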

Streaming concepts

  • characteristics of unbounded streams,
  • time,
  • and state
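A minimal sketch of how time and state show up in practice, assuming each event carries its own timestamp in seconds: events are counted in fixed, non-overlapping (tumbling) windows by event time, and the per-window counts are the state. Windowing is brought in here for illustration; WINDOW_SIZE and count_per_window are hypothetical. For brevity the sketch reads a finite sample and returns at the end, whereas a real engine would emit each window's count once it judges that window complete (for example with watermarks).

```python
# A minimal sketch of two streaming concepts:
#  - time: events are grouped by the timestamp they carry (event time),
#    not by the moment they are processed;
#  - state: a count per window is kept between events.
from collections import Counter
from typing import Iterable, Tuple

WINDOW_SIZE = 60  # hypothetical window length in seconds


def count_per_window(events: Iterable[Tuple[int, str]]) -> Counter:
    counts: Counter = Counter()  # the operator state: window start -> count
    for timestamp, _payload in events:
        window_start = timestamp - timestamp % WINDOW_SIZE
        counts[window_start] += 1
    return counts


# The third event arrives out of order but is still counted in its own window.
events = [(5, "click"), (70, "click"), (30, "click"), (125, "click")]
print(sorted(count_per_window(events).items()))  # [(0, 2), (60, 1), (120, 1)]
```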

Architecture

In a stream architecture, stream processing uses the observer pattern:

  • Something happened (a new element in the stream, such as an event),
  • Consumers subscribe to it (the stream).
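A minimal Python sketch of the observer pattern, with a hypothetical Stream class: subscribers register a callback, and each new element is pushed to all of them. Note that a subscriber only sees the elements emitted after it subscribed.

```python
# A minimal sketch of the observer pattern: subscribers register a callback
# and are notified each time something happens (a new element in the stream).
from typing import Callable, List


class Stream:
    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def emit(self, event: str) -> None:
        # Push the new element to every subscriber, in subscription order
        for callback in self._subscribers:
            callback(event)


seen: List[str] = []
stream = Stream()
stream.subscribe(seen.append)                        # first subscriber stores events
stream.subscribe(lambda e: print(f"received: {e}"))  # second one prints them
stream.emit("user signed in")  # prints: received: user signed in
print(seen)                    # ['user signed in']
```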

A messaging technology needs to have the following characteristics:

  • Replayable
  • Persistent
  • Capable of high performance at large scale
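A toy sketch of the first two characteristics only (it makes no attempt at performance or scale), assuming a single local file as storage: messages are appended one per line, and any reader can replay them from an offset of its choice, even a new object opened on the same file. FileLog and its methods are hypothetical; production systems such as Kafka add partitioning, replication and retention on top of this idea.

```python
# A minimal sketch of a replayable, persistent message log: messages are
# appended to a file, and each reader replays from an offset it chooses.
import os
import tempfile
from typing import List


class FileLog:
    """Append-only log in a text file, one message per line (no newlines in messages)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, message: str) -> None:
        # Messages are only added at the end; existing ones are never changed.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(message + "\n")

    def read_from(self, offset: int) -> List[str]:
        """Replay the messages starting at the given offset (0 replays everything)."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f][offset:]


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "orders.log")
    log = FileLog(path)
    for message in ["placed", "paid", "shipped"]:
        log.append(message)
    reopened = FileLog(path)               # a new reader of the same persisted file
    everything = reopened.read_from(0)     # ['placed', 'paid', 'shipped']
    from_offset_2 = reopened.read_from(2)  # ['shipped']

print(everything, from_offset_2)
```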

Vision

Real-time MapReduce                       | Event-driven microservices
------------------------------------------|---------------------------------------------------
Storm, Spark Streaming, Flink             | Kafka Streams API
Central cluster                           | Embedded library in any Java app
Custom packaging, deployment & monitoring | Just Kafka and your app
Suitable for analytics-type use cases     | Makes stream processing accessible to any use case

Event Centric

Event Centric Thinking
