1 - About

NiFi was built to automate the flow of data between systems.

2 - Concept

2.1 - Data Flow

Nifi - DataFlow

2.2 - FlowFile

As data moves through NiFi, what gets passed around is not the data itself but a pointer to it, referred to as a FlowFile.

FlowFiles are made up of two parts:

  • FlowFile Content
  • and FlowFile Attributes (the attributes, not the content, move from processor to processor in your dataflow)


  • FlowFile Content is written to NiFi's Content Repository, while
  • FlowFile Attributes live in JVM heap memory and the NiFi FlowFile Repository.
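
The split between content and attributes can be illustrated with a small model. This is only a sketch, not NiFi's implementation: the class names `ContentClaim`, `FlowFile` and `ContentRepository`, and the in-memory dictionary standing in for the repository, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ContentClaim:
    """Hypothetical pointer to bytes held by the content repository."""
    claim_id: str
    length: int


@dataclass
class FlowFile:
    """What moves between processors: attributes plus a pointer to the content."""
    attributes: Dict[str, str]
    claim: ContentClaim


class ContentRepository:
    """Toy stand-in for the content repository (a dict instead of disk)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def write(self, claim_id: str, data: bytes) -> ContentClaim:
        # Store the bytes once and hand back a small pointer to them.
        self._store[claim_id] = data
        return ContentClaim(claim_id, len(data))

    def read(self, claim: ContentClaim) -> bytes:
        return self._store[claim.claim_id]


repo = ContentRepository()
claim = repo.write("claim-1", b"hello nifi")
flowfile = FlowFile({"filename": "greeting.txt"}, claim)

# Passing `flowfile` around copies no content, only attributes and the pointer.
print(flowfile.attributes["filename"], len(repo.read(flowfile.claim)))
```

Handing the small `FlowFile` object from step to step, rather than the bytes, is what keeps the content in one place while the attributes travel.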

2.3 - Repository

There are three key repositories:

  • the FlowFile Repository (contains metadata for all the current FlowFiles in the flow and keeps track of where in the flow each FlowFile is)
  • the Content Repository (holds the content for current and past FlowFiles)
  • and the Provenance Repository (holds the history of FlowFiles)

2.4 - Commit

As a Processor writes data to a FlowFile, that data is streamed directly to the Content Repository. When the processor finishes, it commits the session (essentially marking a transaction as complete). The commit first updates the Provenance Repository with the events that occurred for that processor, and then updates the FlowFile Repository to keep track of where in the flow the FlowFile is.

Finally, the FlowFile can be moved to the next queue in the flow. This way, if power is lost at any point, NiFi is able to resume where it left off.
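
The order of operations described above can be sketched as a toy session. This is an illustration under assumptions, not NiFi's API: `ToySession`, its methods, the event name and the in-memory containers are all hypothetical stand-ins for the real repositories.

```python
from collections import deque


class ToySession:
    """Hypothetical session that mimics the commit order described above."""

    def __init__(self, log):
        self.content_repo = {}      # stand-in for the Content Repository
        self.provenance_repo = []   # stand-in for the Provenance Repository
        self.flowfile_repo = {}     # stand-in for the FlowFile Repository
        self.queues = {}            # queue name -> FlowFile ids
        self.log = log
        self._pending_events = []

    def write(self, flowfile_id, data):
        # Content goes straight to the content repository, before any commit.
        self.content_repo[flowfile_id] = data
        self._pending_events.append(("CONTENT_MODIFIED", flowfile_id))
        self.log.append("content written")

    def commit(self, flowfile_id, next_queue):
        # 1. Record the events that occurred for this processor.
        self.provenance_repo.extend(self._pending_events)
        self._pending_events.clear()
        self.log.append("provenance updated")
        # 2. Record where in the flow the FlowFile now is.
        self.flowfile_repo[flowfile_id] = next_queue
        self.log.append("flowfile repository updated")
        # 3. Only now hand the FlowFile to the next queue.
        self.queues.setdefault(next_queue, deque()).append(flowfile_id)
        self.log.append("moved to next queue")


steps = []
session = ToySession(steps)
session.write("ff-1", b"payload")
session.commit("ff-1", "to-next-processor")
print(steps)
```

Because the FlowFile only reaches the next queue after both repositories are updated, a restart can rebuild its position from what was recorded.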

This, however, glosses over one detail: by default, when the repositories are updated, the information is written to disk, but the write is often cached by the operating system. If you truly have a complete loss of power, it is possible to lose those updates to the repositories. This can be avoided by configuring the repositories to always sync to disk, at the cost of a potentially significant hindrance to performance.

Simply killing NiFi, though, will not be problematic, as the operating system will still be responsible for flushing that data to the disk.
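
The difference between a cached write and a synced write can be shown at the operating-system level. This is a generic sketch of the trade-off, not NiFi code: the function `append_record`, its `always_sync` flag and the file name are assumptions chosen for illustration.

```python
import os
import tempfile


def append_record(path: str, record: bytes, always_sync: bool = False) -> None:
    """Append a record; optionally force it through the OS cache to the device."""
    with open(path, "ab") as f:
        f.write(record)
        # flush() hands the bytes to the operating system. They survive a
        # killed process, but may still sit in the OS cache on power loss.
        f.flush()
        if always_sync:
            # fsync() asks the OS to push the bytes to the storage device.
            # Safer on power loss, but noticeably slower per write.
            os.fsync(f.fileno())


path = os.path.join(tempfile.mkdtemp(), "repository.log")
append_record(path, b"update-1\n")                    # default: OS may cache
append_record(path, b"update-2\n", always_sync=True)  # synced to disk

with open(path, "rb") as f:
    contents = f.read()
print(contents)
```

Both calls leave the same bytes in the file; the only difference is when those bytes are guaranteed to be on the device.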

2.5 - Monitoring

2.6 - Deployment

The CLI tool enables administrators to interact with NiFi and NiFi Registry instances to automate tasks such as deploying versioned flows and managing process groups and cluster nodes.

3 - Start

  • NiFi 1.9.2, published on the local port 8081

docker run --name nifi ^
  -p 8081:8080 ^
  -d ^
  apache/nifi:1.9.2

4 - API

4.1 - Rest

5 - Documentation / Reference
