Stream - Samza

About

Samza is a stream processing framework created at LinkedIn that provides powerful, reliable tools for working with data in Kafka.

(LinkedIn created Apache Kafka to be the data exchange backbone of its organisation.) See StreamTask

Samza runs on a stack comprising three different systems:

Stream Definition

A stream in Samza is a sequence of messages.

Streams are not just inputs and outputs to the system, but also buffers between processing stages.

This is the same model as in Hadoop, where the processing stages are MapReduce jobs and the output of a processing stage is a directory of files on HDFS. The input to the next processing stage is simply the files produced by the earlier stage.
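The buffering role of a stream can be sketched with two processing stages connected by an intermediate stream. This is a minimal illustration, not Samza code: the class StreamBufferSketch, the methods cleanStage and lengthStage, and the in-memory queue are all hypothetical stand-ins (in a real deployment the intermediate stream would typically be a Kafka topic).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Queue;

// Hypothetical illustration only: an in-memory queue stands in for a stream.
class StreamBufferSketch {

    // Stage 1: reads raw messages and appends cleaned messages to the intermediate stream.
    static void cleanStage(List<String> input, Queue<String> intermediate) {
        for (String message : input) {
            String cleaned = message.trim().toLowerCase(Locale.ROOT);
            if (!cleaned.isEmpty()) {
                intermediate.add(cleaned);
            }
        }
    }

    // Stage 2: drains the intermediate stream and emits one output message per buffered message.
    static List<String> lengthStage(Queue<String> intermediate) {
        List<String> output = new ArrayList<>();
        String message;
        while ((message = intermediate.poll()) != null) {
            output.add(message + ":" + message.length());
        }
        return output;
    }

    public static void main(String[] args) {
        Queue<String> intermediate = new ArrayDeque<>();
        cleanStage(List.of(" Hello ", "", "SAMZA"), intermediate);
        System.out.println(lengthStage(intermediate)); // [hello:5, samza:5]
    }
}
```

The two stages never call each other. They only share the intermediate stream, which is what lets each stage run, fail or restart independently of the other.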

Benefits

The benefits of this model are:

API

High Level API

Application

An application written using Samza’s High Level API implements the StreamApplication interface.

The interface provides a single method named describe(), which allows us to define our inputs, the processing logic and outputs for our application.

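public interface StreamApplication {
  // Defines the inputs, the processing logic and the outputs of the application.
  void describe(StreamApplicationDescriptor appDescriptor);
}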
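The single-method shape described above can be illustrated without the Samza library. In the sketch below, SketchDescriptor and SketchApplication are hypothetical stand-ins for StreamApplicationDescriptor and StreamApplication (the real Samza classes have a different, richer API), and the stream names are assumptions chosen for the example. It only shows the idea that describe() declares inputs, logic and outputs once, and a runtime then executes what was declared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for StreamApplicationDescriptor: it only records
// an input name, a processing function and an output name.
class SketchDescriptor {
    String input;
    String output;
    Function<String, String> logic = Function.identity();
}

// Hypothetical stand-in mirroring the single-method shape of StreamApplication.
interface SketchApplication {
    void describe(SketchDescriptor descriptor);
}

// An application declares what to read, how to process it and where to write it.
class UpperCaseApp implements SketchApplication {
    @Override
    public void describe(SketchDescriptor descriptor) {
        descriptor.input = "raw-messages";      // assumed stream name
        descriptor.logic = String::toUpperCase; // processing logic
        descriptor.output = "upper-messages";   // assumed stream name
    }
}

class DescribeSketch {
    // Toy runner: calls describe() once, then applies the declared logic to each message.
    static List<String> run(SketchApplication app, List<String> messages) {
        SketchDescriptor descriptor = new SketchDescriptor();
        app.describe(descriptor);
        List<String> out = new ArrayList<>();
        for (String message : messages) {
            out.add(descriptor.logic.apply(message));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(new UpperCaseApp(), List.of("a", "b"))); // [A, B]
    }
}
```

Keeping the description separate from the execution means the application code contains no loop over messages: it only states what should happen, and the runner decides how and where to run it.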