SSIS sort component is a first-class bottleneck as soon as the number of rows to sort needs a disk swap (which is very early). It is then very advisable to use the database engine to sort your data :
• Make a dataflow to insert data in a temporary table in the database
• Create an index on the rows to sort of the temporary table
• Read the data in an other dataflow with an « order by » clause, and force the properties « IsSorted » and « SortKeyPosition »
SSIS has a decent join component. Prior to using this component, the rows of both input flows must be sorted. You should sort these rows in previous dataflows (see the point above about sorts)
It is sometimes advisable to split a dataflow, in order to reduce the execution tree size (execution tree :Microsoft Technet : Performance Tuning Techniques). The more performant way to to this is to use row files (data source, and data destination as row file). These files have an SSIS binary format, optimized for this kind of usage, and keep all datatype and sorting meta-data.
See SSIS et les mises à jour de masse and SSIS et les UPSERTs
Lookups are a fundamental part of any ETL. SSIS allows a « memory restriction » parameter (in the advanced tab). It is very advisable to disallow « memory restriction », it will the load all data in memory, and perform the match with no roundtrip to the database. You should be carefull when writing the query, and avoid any « SELECT * ». If the dataset is to large to be kept in memory, you could check the « memory restriction », and create temporarily a covering index on the query before the dataflow (you drop the covering index after).
If you have several disks on your machine, check this dataflow property is located on a rapid disk.
This dataflow property is a hint for SSIS. Generaly speaking, this should have a value of twice the number of cores of your CPUs. If you have very few cores, consider splitting your dataflows to have small execution trees.
I usually use 15 000 000 on a 32 bit machine with 4GB.
The value of 10 000 is usualy a good measure.
If your SSIS package does a lot of update/insert : put your database in simple recovery mode before your package execution, and then back to complete recovery mode after.
Laisser un commentaire