Hi!
I'm working on a project restructuring some data using SQLIS. The problem is one package that is handling about 1.2 million rows takes about 7-8 hours to complete.
To my question, is there any of the dataflow components that are know to be slow that you should avoid regarding performance issues
The following list is a brief specification over the component that are contained in the package:
-------------------------------------------------------------------------------------
3 OLEDB Data sources.
2 OLEDB Data sources.
3 Derivied column objects.
1 Union which unions all rows from the sources above.
2 look up components that look up rows for the all rows after the union component.
1 script component calulating a new surrugate key for all rows (sk=sk+1)
1 Conditional split splitting up all rows into 2 separate flows
For each of these separate flows there is a 2 look up components.
-------------------------------------------------------------------------------------
:)
I don't know if the above was much of help, but I hope it will give you a brief overview, there are no complicated script components involved.
The server running the package does nothing else but executing the package on a dual core Pentium 4 at 3.2 GHz and has 2GB of RAM.
-------------------------------------------------------------------------------------
When the package has started it seems to be IDLE for 7 hours, taking almot 0% of the processortime... :S
Anyone who has had a similar problem

Performance, handling 1.2 million rows takes 7-8 hours?
Ifgash
I used a script component to transform the error codes into something more meaningful and then dumped the last ditch errors (after 2nd ole db adabter to a local variable object dataset).
(thanks Jamie for your blog post a few years ago.)
ode Snippet
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
'Use the incoming error number as a parameter to GetErrorDescription
Row.MeaningfulDescription = ComponentMetaData.GetErrorDescription(Row.DestTableErrorCode)
End Sub
Somehting I learned on the Lookup tables: Make sure the lookup keys are indexed so the SQL query is faster. Using SQL trace you will see a call for each lookup until they are all catched in memory (use full catch to handle null values).
I have to figure out the whole SQL transaction logs bit. I set it to simple so far, with 10% growth, and limited to 4096 (4 gigs).
Go the package Default Buffer size at 104857600, which I think is 100mb (100 x 1024 x 1024 bytes/megabyte). Still in dev, but My machine bogs down after all the memory consumption.
Vb.net user
I have no general solution but have now come optimized the model which now takes about 3 minutes to execute, with a larger dataset containing about 6M rows.
-----------------------------------------------------------------------------------------------------
1)
Took all lookups from large tables and used "SQL JOINS" instead.
2)
Replaced "OLEDB-destination" in favor for "SQL-destination", this made a significant speed up. This is one point where I could see that a big difference, when I was using the "generic" OLEDB driver the model stored 10000 in chunks which took alot of time. After changing to SQL-destinations the rows seemed to be stored much faster in a more seamless fashion. Maybe seem lika a obvious thing, but it didn't know there were such a difference, I liked the OLEDB-Destinations because of the error handling capabilities.
3)
Changed my restriction to the db & transaction log - files, to: no restriction in size and file growth by 10% (previously 1MB). This may have speeded up the work a bit for SQL Server, cause it would not have to allocate new space all the time.
4)
The above changes made the model execute really fast when it was split up. After merging the separate models, running the "hole" model still "hang" and used almost no CPU. The solution in this case was to use more engine threads from default 5, I increased the number of threads to 10.
-----------------------------------------------------------------------------------------------------
Sorry for the specific solution, but I hope it may help someone that have been stuck in some similar scenario, to try out some of the above "tricks" to solve hers/his problem.
Best luck!
iemad
I'll echo what Jamie said - the Lookups are the only thing that could take time. Check:
* If the Lookup CacheType = Full or Partial, are you out of memory and swapping to disk (doesn't seem likely)
* If CacheType = Partial or None, you're doing a round trip to the db for every single record (None) or for every single record missiing the Lookup cace (Partial), this seems a likely reason for your issue.
Good luck/Kristian
Jason Smith
I would suggest that the idle time is spent populating the LOOKUP caches. Are you using full caching
If those 7 hours are spent filling caches then you will see information in the Output window informing you of this.
It seems you have ALOT of LOOKUP components here. My first advice would be to split the LOOKUPs into seperate data-flows - using raw files to pass data between them.
I'm interested to know hwo you get on so reply here and let us know!
-Jamie
JT673
amphi
Hello guys,
Thanks for the replies I'm using full cache on all lookup components.
Do I have to set the following options in the Advanced-tab:
"Enable memory restrictions" and
"Enable caching"
to true, in order to get the cache activated
Global Lens
yes but I've split the task now and only have 3... to debug alittle to see if it went faster, but it seems that I get the same result, the CPU goes idle...
why I want to have all components in the same task is that I'm constructing a parent\child structure and then it was the most straight forward solution to have all components in the same task, coz I need the surrugate key that is generated for parent to populate all children rows...
I'm merging alot of different tables into a more general model, and have to do alot of lookups to find all information... Maybe it's more efficient to do JOINS in the datasource instead od tables/lookups...
C# Lover
No. They are additional tuning params. I think you can leave them for the time being.
As I understand it you have something like 5 LOOKUPs in the same data-flow. With 1.2 million rows of data this is very likely to cause problems
-Jamie
Ramesh Narayanan
If you CAN do joins in the source components then that is where you should do them. Use the power of the DBMS before you have to employ SSIS.
You still haven't found out which component is causing the problems. Break your data-flow into as many speerate parts as possible and link them with data-flows. Assuming this works OK start to put everything back into the data-flow - and stop when the problems re-occur. That will show you where the problems occur.
-Jamie