To design the data flow, you assemble components (such as mergers, joiners, filters, readers, and writers) into graphs. A graph is essentially a pipeline of components that processes the data.
The simplest graph has:
- one Reader component to read in the source data
- one of the Endeca components to write (send) the data to an Endeca data store.
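Integrator graphs are built graphically, so the following is only a conceptual Python sketch of the simplest Reader-to-writer pipeline, not Integrator's API. The `reader` and `writer` functions and the `InMemoryStore` class are hypothetical stand-ins: the store takes the place of an Endeca data store, and the reader yields one row at a time from CSV text.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, TextIO

Row = Dict[str, str]

def reader(source: TextIO) -> Iterator[Row]:
    """Hypothetical Reader component: yields one row at a time from CSV text."""
    yield from csv.DictReader(source)

class InMemoryStore:
    """Hypothetical stand-in for an Endeca data store."""
    def __init__(self) -> None:
        self.records: List[Row] = []

def writer(rows: Iterable[Row], store: InMemoryStore) -> int:
    """Hypothetical writer component: sends each incoming row to the store."""
    count = 0
    for row in rows:
        store.records.append(row)
        count += 1
    return count  # number of rows written

# The simplest graph: Reader -> writer.
source = io.StringIO("id,name\n1,Alice\n2,Bob\n")
store = InMemoryStore()
written = writer(reader(source), store)
print(written)  # 2
```

Because `reader` is a generator, the writer receives each row as soon as it is read, which mirrors the pipeline idea rather than loading the whole file first.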
Transformation components are graphical objects that represent data processing steps. A graph is the formal term for the graphical layout that contains a set of transformation components.
An edge is the line that connects two components, joining an output port of one component to an input port of the other. Every component has one or more input ports and one or more output ports. (The only exception is the Trash component, which has no output port.) Different components use ports differently.
Metadata describes the format of the data and must be assigned to each edge. Because an edge connects two components, its metadata defines both the output data format of the upstream component and the input data format of the downstream component.
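To make the relationship between ports, edges, and metadata concrete, here is a minimal Python model. It is a hypothetical sketch, not how Integrator stores graphs: `Metadata`, `Component`, and `Edge` are invented names, and the port counts for the Filter component are illustrative. The key point is that a single `Metadata` object belongs to the edge, so the same format applies to the upstream output port and the downstream input port.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class Metadata:
    """Hypothetical record format: ordered (field name, Python type) pairs."""
    fields: Tuple[Tuple[str, type], ...]

    def validate(self, row: Dict[str, object]) -> None:
        """Raise if the row does not match this format exactly."""
        names = [name for name, _ in self.fields]
        if list(row) != names:
            raise ValueError(f"expected fields {names}, got {list(row)}")
        for name, ftype in self.fields:
            if not isinstance(row[name], ftype):
                raise TypeError(f"field {name!r} must be {ftype.__name__}")

@dataclass
class Component:
    name: str
    input_ports: int
    output_ports: int

@dataclass
class Edge:
    """Connects output port `out_port` of `source` to input port `in_port` of `target`."""
    source: Component
    out_port: int
    target: Component
    in_port: int
    metadata: Metadata  # one format shared by both ends of the edge

    def __post_init__(self) -> None:
        if not 0 <= self.out_port < self.source.output_ports:
            raise ValueError(f"{self.source.name} has no output port {self.out_port}")
        if not 0 <= self.in_port < self.target.input_ports:
            raise ValueError(f"{self.target.name} has no input port {self.in_port}")

    def send(self, row: Dict[str, object]) -> Dict[str, object]:
        """Check the row against the edge metadata, then hand it downstream."""
        self.metadata.validate(row)
        return row

person = Metadata(fields=(("id", int), ("name", str)))
filter_component = Component("Filter", input_ports=1, output_ports=2)  # illustrative counts
trash = Component("Trash", input_ports=1, output_ports=0)

edge = Edge(filter_component, 0, trash, 0, person)
print(edge.send({"id": 1, "name": "Alice"}))

# A row whose "id" has the wrong type does not match the edge metadata.
try:
    edge.send({"id": "1", "name": "Alice"})
    bad_row_rejected = False
except TypeError:
    bad_row_rejected = True

# Trash has no output port, so it cannot be the source of an edge.
try:
    Edge(trash, 0, filter_component, 0, person)
    trash_as_source_allowed = True
except ValueError:
    trash_as_source_allowed = False
```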
Integrator components process data row by row. By default, Integrator passes each row to the next component as soon as that row has been processed. After processing data, a component almost always sends it to one or more output ports.
Integrator is multi-threaded; in practical terms, it tries to run components in parallel. As each row is processed, it is passed immediately to the next component, so Integrator does not wait until all rows have been read in from the file or the previous component. This lets it scale well across processor cores.
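The row-by-row, multi-threaded behavior can be illustrated with a small Python sketch. This is an assumption-laden model, not Integrator's engine: each hypothetical component runs in its own thread, and the edges between components are modeled as `queue.Queue` objects. A component forwards each row the moment it finishes with it, and an `END` marker (invented here) signals that no more rows will arrive. Note that CPython's global interpreter lock prevents these threads from executing Python code on several cores at once, so the sketch shows the pipelining structure rather than real multi-core scaling.

```python
import queue
import threading
from typing import Callable, List, Optional

END = object()  # hypothetical end-of-data marker passed along each edge

def run_component(fn: Callable[[int], Optional[int]],
                  inbox: "queue.Queue", outbox: "queue.Queue") -> None:
    """Process rows one at a time and forward each result as soon as it is ready."""
    while True:
        row = inbox.get()
        if row is END:
            outbox.put(END)  # propagate end-of-data downstream
            return
        result = fn(row)
        if result is not None:  # None means the row was filtered out
            outbox.put(result)

def run_pipeline(rows: List[int]) -> List[int]:
    q_read, q_filtered, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    stages = [
        # Filter: keep even rows only.
        threading.Thread(target=run_component,
                         args=(lambda r: r if r % 2 == 0 else None, q_read, q_filtered)),
        # Transform: multiply each surviving row by 10.
        threading.Thread(target=run_component,
                         args=(lambda r: r * 10, q_filtered, q_out)),
    ]
    for t in stages:
        t.start()
    # The "reader" feeds rows one by one; downstream threads can start
    # working on the first row before the last row has been read.
    for row in rows:
        q_read.put(row)
    q_read.put(END)
    for t in stages:
        t.join()
    results = []
    while True:
        item = q_out.get()
        if item is END:
            break
        results.append(item)
    return results

print(run_pipeline([1, 2, 3, 4, 5, 6]))  # [20, 40, 60]
```

Each queue has exactly one producer and one consumer, so rows keep their original order even though the stages run concurrently.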
Integrator can also run phases, which let you sequence activities: you can specify that one phase must be completed before another can begin. This ensures that certain tasks are fully complete before dependent tasks start.
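A phase boundary behaves like a barrier. The Python sketch below is a hypothetical model of that idea, not Integrator's implementation: `run_phases` groups components by a phase number, runs the components within a phase concurrently, and waits for every one of them to finish before starting the next phase. The example components (`load_lookup`, `enrich`) are invented to show why this matters: the phase 1 task can safely read a lookup table that phase 0 builds.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

def run_phases(components: List[Tuple[int, Callable[[], None]]]) -> None:
    """Run components grouped by phase number, lowest phase first.

    Components within a phase run concurrently (one thread each); the next
    phase starts only after every thread in the current phase has finished.
    """
    by_phase: Dict[int, List[Callable[[], None]]] = defaultdict(list)
    for phase, fn in components:
        by_phase[phase].append(fn)
    for phase in sorted(by_phase):
        threads = [threading.Thread(target=fn) for fn in by_phase[phase]]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # barrier: wait for the whole phase to complete

log: List[str] = []
lookup: Dict[str, str] = {}
lock = threading.Lock()

def load_lookup() -> None:
    """Hypothetical phase 0 task: build a lookup table."""
    with lock:
        lookup.update({"US": "United States", "FR": "France"})
        log.append("phase 0: lookup loaded")

def enrich() -> None:
    """Hypothetical phase 1 task: safe to read `lookup` because phase 0 has finished."""
    names = [lookup[code] for code in ("US", "FR")]
    with lock:
        log.append(f"phase 1: enriched {names}")

# Listed out of order on purpose; the phase numbers decide the execution order.
run_phases([(1, enrich), (0, load_lookup)])
print(log)
```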