A data flow transfers data from a source to a destination.
Each step in the Data Flow task operates in sequence on a rowset of data as it passes through the pipeline. The data flow engine uses buffers to optimize the rate of flow of the data passing through the pipeline.
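The buffer-based pipeline can be pictured with a short conceptual sketch in Python. This is not the SSIS engine or its API; the `buffers` and `uppercase_name` names are purely illustrative. The point is that each step handles a whole buffer of rows at a time rather than single rows:

```python
from itertools import islice

def buffers(rows, max_rows):
    """Yield the incoming row stream as fixed-size buffers (lists of rows)."""
    it = iter(rows)
    while True:
        buf = list(islice(it, max_rows))
        if not buf:
            return
        yield buf

def uppercase_name(buf):
    """A transformation step: operates on one buffer at a time."""
    return [{**row, "name": row["name"].upper()} for row in buf]

# Source -> transformation -> destination, one buffer at a time.
source = ({"id": i, "name": f"item{i}"} for i in range(10))
destination = []
for buf in buffers(source, max_rows=4):
    destination.extend(uppercase_name(buf))
```

Batching rows into buffers is what lets the engine amortize per-row overhead and keep memory use predictable.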
A data flow definition includes the sources that extract the data, the transformations that modify it, and the destinations that load it.
For more information, see Integration Services Transformations.
If subsequent transformations in your data flow rely on sorted data, set the IsSorted property of the upstream output to True, and set the SortKeyPosition property of the output columns, to indicate that the data is already sorted.
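The value of the sort hint is that a downstream step can skip an expensive re-sort. A minimal sketch of the idea, assuming a hypothetical `ensure_sorted` helper (the `is_sorted` flag plays the role of the IsSorted property):

```python
def ensure_sorted(rows, key, is_sorted=False):
    """Sort only when the upstream step has not flagged its output
    as already sorted (analogous to the IsSorted output property)."""
    return rows if is_sorted else sorted(rows, key=lambda r: r[key])

presorted = [{"k": 1}, {"k": 2}, {"k": 3}]
# Flag set: the rows pass straight through with no re-sort and no copy.
passthrough = ensure_sorted(presorted, "k", is_sorted=True)
```

Note that the flag is a promise, not a check: if the data is not actually sorted, downstream components that depend on sort order will produce incorrect results.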
Configuring the size of the buffers that the data flow uses, through the DefaultBufferSize and DefaultBufferMaxRows properties of the Data Flow task, can significantly improve performance. When sufficient memory is available, aim for a small number of large buffers without incurring disk paging. The default values of these properties are 10 MB and 10,000 rows, respectively.
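The interaction between the two limits is simple arithmetic: each buffer holds the smaller of the row-count cap and however many rows fit in the size cap. A simplified illustration (the function and the 1,200-byte row width are hypothetical; the real engine's sizing logic has more inputs):

```python
def rows_per_buffer(buffer_bytes, row_bytes, max_rows):
    """Rows per buffer under a byte-size cap and a row-count cap,
    a simplification of DefaultBufferSize / DefaultBufferMaxRows."""
    return min(buffer_bytes // row_bytes, max_rows)

# With the 10 MB / 10,000-row defaults and a 1,200-byte row,
# the size limit binds: 10,485,760 // 1,200 = 8,738 rows per buffer.
wide = rows_per_buffer(10 * 1024 * 1024, 1_200, 10_000)

# With a narrow 100-byte row, the row-count limit binds instead.
narrow = rows_per_buffer(10 * 1024 * 1024, 100, 10_000)
```

This is why widening rows (for example, carrying large unused columns) silently shrinks the buffers and increases the number of buffer hand-offs.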
Using the BufferTempStoragePath and BLOBTempStoragePath properties to place the temporary objects that the data flow creates on a fast disk, or to spread them across multiple storage devices, can improve performance.
Setting the EngineThreads property to control the number of threads available to the Data Flow task can improve execution performance, particularly in packages where the MaxConcurrentExecutables property has been set to enable parallel execution of the package's tasks across multiple processors.
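The effect of a thread cap can be sketched with Python's standard thread pool. This is only an analogy for the engine's threading, not its implementation; `max_workers` plays the role of the thread limit, and `transform` is an illustrative stand-in for a transformation:

```python
from concurrent.futures import ThreadPoolExecutor

def transform(buf):
    """An illustrative transformation applied to one buffer of rows."""
    return [value * 2 for value in buf]

row_buffers = [[1, 2], [3, 4], [5, 6]]

# Cap the worker threads, analogous to limiting engine threads;
# buffers are processed concurrently but results keep their order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(transform, row_buffers))
```

Raising the cap helps only while there are independent buffers (and processor cores) to keep the extra threads busy; beyond that point, more threads just add scheduling overhead.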
Setting the RunInOptimizedMode property of a Data Flow task to True increases performance by removing unused columns, outputs, and components from the data flow.
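The column-pruning part of this optimization can be shown with a small sketch. The `prune_columns` helper is hypothetical; it models dropping any column that no downstream component consumes, which shrinks each row and therefore lets more rows fit in every buffer:

```python
def prune_columns(rows, needed):
    """Keep only the columns some downstream component consumes,
    a sketch of what optimized mode does automatically."""
    return [{k: v for k, v in row.items() if k in needed} for row in rows]

rows = [{"id": 1, "name": "a", "audit_blob": "x" * 100}]
# Downstream only reads id and name, so audit_blob is dead weight.
slim = prune_columns(rows, needed={"id", "name"})
```

In the real engine the same pruning happens without code changes, but it is still good practice to select only the needed columns at the source.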