Top 10 SSIS performance advices

  • Sorts
  • SSIS sort component is a first-class bottleneck as soon as the number of rows to sort needs a disk swap (which is very early). It is then very advisable to use the database engine to sort your data :
    • Make a dataflow to insert data in a temporary table in the database
    • Create an index on the rows to sort of the temporary table
    • Read the data in an other dataflow with an « order by » clause, and force the properties « IsSorted » and « SortKeyPosition »

  • Joins
  • SSIS has a decent join component. Prior to using this component, the rows of both input flows must be sorted. You should sort these rows in previous dataflows (see the point above about sorts)

  • Transfering data between dataflows
  • It is sometimes advisable to split a dataflow, in order to reduce the execution tree size (execution tree :Microsoft Technet : Performance Tuning Techniques). The more performant way to to this is to use row files (data source, and data destination as row file). These files have an SSIS binary format, optimized for this kind of usage, and keep all datatype and sorting meta-data.
     

  • upsert and mass-update
  • See SSIS et les mises à jour de masse and SSIS et les UPSERTs

  • Lookups
  • Lookups are a fundamental part of any ETL. SSIS allows a « memory restriction » parameter (in the advanced tab). It is very advisable to disallow « memory restriction », it will the load all data in memory, and perform the match with no roundtrip to the database. You should be carefull when writing the query, and avoid any « SELECT * ». If the dataset is to large to be kept in memory, you could check the « memory restriction », and create temporarily a covering index on the query before the dataflow (you drop the covering index after).

  • BufferTempStoragePath
  • If you have several disks on your machine, check this dataflow property is located on a rapid disk.

  • Enginethreads
  • This dataflow property is a hint for SSIS. Generaly speaking, this should have a value of twice the number of cores of your CPUs. If you have very few cores, consider splitting your dataflows to have small execution trees.

  • DefaultBufferSize
  • I usually use 15 000 000 on a 32 bit machine with 4GB.

  • DefaulBufferMaxRows
  • The value of 10 000 is usualy a good measure.

  • simple recovery mode
  • If your SSIS package does a lot of update/insert : put your database in simple recovery mode before your package execution, and then back to complete recovery mode after.

    Hijacking GWT-RPC serialization

    GWT, the Google Web Toolkit for client-side web programming, contains several client/server communication frameworks. The « traditional » one is called GWT-RPC. GWT-RPC encapsulates the serialization/deserialization of objects, and all the protocol plumbing.

    This is usually fine for most use cases, but I needed to explicitely serialize objects on the client side, and deserialize them on the server side.

    Use case : I need to call a servlet from my GWT application, to open a document (generated server-side) in a new windows – using window.open()

    Solution : create a standard GWT-RPC, declaring a « fake » method containing all the parameter types that we need to explicitly serialize. This will generate a correct GWT « serialization policy ». Then, server side, implement the GWT Service servlet, and override the doGet with your code.

    Create the standard GWT-RPC interfaces :

    The interface declares a fake method, with the only goal to allow the type we want to add in the serialization process (here, we have OutputTypeEnum, RepositoryServiceEnum and FBBaseFilterPagingLoadConfig).

    @RemoteServiceRelativePath("../fastbiz/service/recordUpload")
    public interface RecordUploadService extends RemoteService {
     // Fake
     void fakeModel(OutputTypeEnum outputType, RepositoryServiceEnum repository, FBBaseFilterPagingLoadConfig config) throws FBException;
    }

     
    public interface RecordUploadServiceAsync {
     // Fake
     void fakeModel(OutputTypeEnum outputType,RepositoryServiceEnum repository,FBBaseFilterPagingLoadConfig config, AsyncCallback<Void> callback);
    }

    • Client side :

    // Creating factory. Interface of any remote service can be used
    SerializationStreamFactory factory = (SerializationStreamFactory) GWT.create(RecordUploadService.class);

    // Creating stream writer
    SerializationStreamWriter writer = factory.createStreamWriter();

    // Serializing object
    String serializedOutputType = null;
    String serializedRepositoryIdentifier = null;
    try {
     writer.writeObject(outputType);
     // Getting serialized object content
     serializedOutputType = URL.encodeQueryString(writer.toString());

     writer = factory.createStreamWriter();
     writer.writeObject(listWindow.getRepositoryIdentifier());

     // Getting serialized object content
     serializedRepositoryIdentifier = URL.encodeQueryString(writer.toString());
     
     writer = factory.createStreamWriter();
     PagingLoadConfig config = listWindow.getConfig();
     writer.writeObject(config);
    } catch (SerializationException e) {
     e.printStackTrace();
    }

    // Getting serialized object content
    String serializedConfig = URL.encodeQueryString(writer.toString());
     
    Window.open(Upload.EXPORT_BASE_URL + "?" + Upload.PARAM_OUTPUT_TYPE + "="
     + serializedOutputType + "&"
     + Upload.PARAM_REPOSITORY_IDENTIFIER + "="
     + serializedRepositoryIdentifier + "&"
     + Upload.PARAM_PAGING_LOAD_CONFIG + "="
     + serializedConfig, "_blank", null);

     

    • Server side :

    You implement the service interface, and override the GET method that way:

     

    public class RecordUploadServiceImpl extends RemoteServiceServlet implements com.fastbiz.client.service.RecordUploadService {
     
     private static final long serialVersionUID = 9107309941962048452L;
     
     @Override
     protected void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
     
     // Ensure the thread-local data fields have been initialized
     
     try {
      // Store the request & response objects in thread-local storage.
      synchronized (this) {
       if (perThreadRequest == null) {
        perThreadRequest = new ThreadLocal<HttpServletRequest>();
       }
       if (perThreadResponse == null) {
        perThreadResponse = new ThreadLocal<HttpServletResponse>();
       }
       perThreadRequest.set(request);
       perThreadResponse.set(response);
      }
      processGet(request, response);
     
     } catch (Throwable e) {
      // Give a subclass a chance to either handle the exception or
      // rethrow it
      doUnexpectedFailure(e);
     } finally {
      // null the thread-locals to avoid holding request/response
      perThreadRequest.set(null);
      perThreadResponse.set(null);
     }
     
     }
     
     protected void processGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
      if (Upload.TYPE_LIST.equals(origin)) {
     
      // Initializing stream reader
      ServerSerializationStreamReader streamReader = new ServerSerializationStreamReader(
      Thread.currentThread().getContextClassLoader(), this);
      RepositoryServiceEnum recordType;
      FilterPagingLoadConfig config;
      OutputTypeEnum outputType;
      try {
       // Filling stream reader with data
       streamReader.prepareToRead(request.getParameter(Upload.PARAM_OUTPUT_TYPE));
       // Reading deserialized object from the stream
       outputType = (OutputTypeEnum) streamReader.readObject();
      } catch (SerializationException e1) {
       throw new ServletException("Unable to deserialize parameters",
       new FBException(e1));
      }
      try {
       // Filling stream reader with data
       streamReader.prepareToRead(request.getParameter(Upload.PARAM_REPOSITORY_IDENTIFIER));
       // Reading deserialized object from the stream
       recordType = (RepositoryServiceEnum) streamReader.readObject();
      } catch (SerializationException e1) {
       throw new ServletException("Unable to deserialize parameters",
       new FBException(e1));
      }
      try {
       // Filling stream reader with data
       streamReader.prepareToRead(request.getParameter(Upload.PARAM_PAGING_LOAD_CONFIG));
       // Reading deserialized object from the stream
       config = (FilterPagingLoadConfig) streamReader.readObject();
      } catch (SerializationException e1) {
       throw new ServletException("Unable to deserialize parameters",
       new FBException(e1));
      }
     
     }
     
    }
     
     @Override
     public void fakeBaseModel(OutputTypeEnum outputType, RepositoryServiceEnum repository, FBBaseFilterPagingLoadConfig config) throws FBException {
     }
     
    }