NiFi - Apache NiFi was built to automate the flow of data between systems.
As data moves through NiFi, it is not the data itself that is passed from processor to processor but a lightweight reference to it, called a FlowFile.
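FlowFiles are made up of two parts:
- Attributes: key/value metadata about the data (for example filename, path, uuid).
- Content: a reference to the actual bytes, which live in the Content Repository.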
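The split between attributes and content can be modeled in a few lines of Python. This is a toy sketch, not NiFi's Java API: the names ContentClaim, container_file and with_attribute are hypothetical. It shows that changing an attribute produces a new FlowFile record while the content is carried over by reference, never copied.

```python
from dataclasses import dataclass
import uuid


@dataclass(frozen=True)
class ContentClaim:
    """Hypothetical pointer into a content repository: which file, where, how long."""
    container_file: str
    offset: int
    length: int


@dataclass(frozen=True)
class FlowFile:
    """Toy FlowFile: metadata attributes plus a reference to the content bytes."""
    attributes: dict
    content: ContentClaim


def with_attribute(ff: FlowFile, key: str, value: str) -> FlowFile:
    # Updating an attribute yields a new FlowFile record; only the
    # content reference is reused, the bytes themselves are untouched.
    return FlowFile({**ff.attributes, key: value}, ff.content)


ff = FlowFile(
    attributes={"uuid": str(uuid.uuid4()), "filename": "orders.csv"},
    content=ContentClaim("content_repository/1/1234", offset=0, length=2048),
)
ff2 = with_attribute(ff, "mime.type", "text/csv")
```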
Storage:
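There are three key repositories:
- FlowFile Repository: tracks the state and attributes of every FlowFile currently active in the flow, including where in the flow it is.
- Content Repository: local storage holding the actual content bytes of the FlowFiles.
- Provenance Repository: stores the provenance events, i.e. the history of what happened to each FlowFile.
See https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#repositories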
As a Processor writes data to a FlowFile, that data is streamed directly to the Content Repository. When the Processor finishes, it commits the session (essentially marking a transaction as complete). This triggers an update of the Provenance Repository with the events that occurred in that Processor, followed by an update of the FlowFile Repository to record where in the flow the FlowFile now is.
Finally, the FlowFile can be moved to the next queue in the flow. This way, if power is lost at any point, NiFi can resume where it left off.
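The commit ordering described above can be sketched with in-memory stand-ins for the three repositories. Everything here (ToyRepositories, ToySession, restore_queues) is hypothetical and only mirrors the sequence in the text: content is written during processing, and on commit the provenance events are recorded first, then the FlowFile Repository, and only then is the FlowFile placed on the next queue.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepositories:
    """In-memory stand-ins for NiFi's three repositories (illustrative only)."""
    content: dict = field(default_factory=dict)     # flowfile id -> bytes
    provenance: list = field(default_factory=list)  # append-only event log
    flowfiles: dict = field(default_factory=dict)   # flowfile id -> queue name


class ToySession:
    """Hypothetical session: buffers events and transfers until commit()."""

    def __init__(self, repos: ToyRepositories, queues: dict):
        self.repos = repos
        self.queues = queues
        self.events = []
        self.transfers = []

    def write(self, ff_id: str, data: bytes) -> None:
        # Content goes to the content repository immediately, during processing.
        self.repos.content[ff_id] = data
        self.events.append(("CONTENT_MODIFIED", ff_id))

    def transfer(self, ff_id: str, queue: str) -> None:
        self.transfers.append((ff_id, queue))

    def commit(self) -> None:
        # 1. Record what happened in the provenance repository.
        self.repos.provenance.extend(self.events)
        # 2. Record where each FlowFile now sits in the flow.
        for ff_id, queue in self.transfers:
            self.repos.flowfiles[ff_id] = queue
        # 3. Only then make the FlowFiles visible in the next queue.
        for ff_id, queue in self.transfers:
            self.queues.setdefault(queue, []).append(ff_id)
        self.events.clear()
        self.transfers.clear()


def restore_queues(repos: ToyRepositories) -> dict:
    # After a restart, the queues are rebuilt from the FlowFile repository alone.
    queues = {}
    for ff_id, queue in repos.flowfiles.items():
        queues.setdefault(queue, []).append(ff_id)
    return queues


repos = ToyRepositories()
queues = {}
session = ToySession(repos, queues)
session.write("ff-1", b"hello")
session.transfer("ff-1", "success")
session.commit()
```

Because the queues can be rebuilt from the FlowFile Repository, restore_queues(repos) reproduces the live queues after a commit; a crash before commit() simply leaves the FlowFile Repository as it was, so the toy would redo that work.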
This, however, glosses over one detail: by default, when NiFi updates the repositories, it writes the information to disk, but the operating system often caches those writes. After a complete loss of power, some of those repository updates could therefore be lost. This can be avoided by configuring the repositories in the nifi.properties file to always sync to disk, but doing so can significantly hinder performance.
Simply killing the NiFi process, though, is not a problem, because the operating system remains responsible for flushing the cached data to disk.
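The difference between "written" and "synced" can be illustrated with plain file I/O. This is a conceptual sketch, not NiFi code: append_record and its always_sync flag are hypothetical, and explicit fsync is assumed to be the kind of OS-level flush such "always sync" settings rely on. In nifi.properties the relevant keys are, as far as I know, nifi.flowfile.repository.always.sync, nifi.content.repository.always.sync and nifi.provenance.repository.always.sync (false by default); check the Administration Guide for your version.

```python
import os
import tempfile


def append_record(path: str, record: bytes, always_sync: bool) -> None:
    """Append a record; optionally force it from the OS cache onto the disk."""
    with open(path, "ab") as f:
        f.write(record)
        # flush() hands Python's buffer to the OS page cache. The data now
        # survives the process being killed, but the OS decides when it
        # actually reaches the disk; a power cut before then loses it.
        f.flush()
        if always_sync:
            # fsync() blocks until the OS reports the data written to the
            # device, which is why this path is noticeably slower.
            os.fsync(f.fileno())


with tempfile.TemporaryDirectory() as tmp:
    journal = os.path.join(tmp, "journal.bin")
    append_record(journal, b"update-1\n", always_sync=False)  # survives a process kill
    append_record(journal, b"update-2\n", always_sync=True)   # should also survive power loss
    with open(journal, "rb") as f:
        contents = f.read()
```

The trade-off in the text follows directly: calling fsync on every repository update removes the power-loss window but makes each write wait for the device.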
The CLI tool (part of the NiFi Toolkit) lets administrators interact with NiFi and NiFi Registry instances to automate tasks such as deploying versioned flows and managing process groups and cluster nodes. See https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html
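Running NiFi 1.9.2 locally in Docker (Windows cmd syntax, where ^ continues the command on the next line; host port 8081 is mapped to the container's port 8080, so the UI is at http://localhost:8081/nifi):
docker run --name nifi ^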
-p 8081:8080 ^
-d ^
apache/nifi:1.9.2
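NiFi can take a while to start after the container is up, so scripts that drive it usually wait for the UI first. A minimal standard-library sketch, assuming the port mapping from the command above; wait_for_http is a hypothetical helper, not part of NiFi or Docker.

```python
import http.client
import time
import urllib.request


def wait_for_http(url: str, timeout_s: float = 180.0, interval_s: float = 2.0) -> bool:
    """Poll url until it answers with HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Still starting: connection refused/reset, timeout, or an
            # HTTP error status (HTTPError is a subclass of OSError).
            pass
        time.sleep(interval_s)
    return False
```

Usage: wait_for_http("http://localhost:8081/nifi/") returns True once the UI responds, or False after three minutes.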