Incremental loading

Hi Friends please let me know how can we incrementally load a destination table with source table. bearing in mind that we need to track that there are no duplicates in the destination table. I need to load only changed or new data in the final load. Please give me some examples also. I am tryin this from last 2 days as I am totally new to SSIS.


Answer this question

Incremental loading

  • Eppz

    Sachin,

    I dount there is a link covering what Richard has described. Just follow his instructions and reply here if you get lost.

    You should use dtexec.exe to execute your packages outside of the design-time environment.

    -Jamie



  • Rabinarayan

    Sam_dia wrote:
    Do I need to execute dtexec.exe from command prompt

    Yes!



  • Mark Farmiloe

    Hi,

    First off: do you have a column on your source data that indicates that a row is new or changed

    If you do, then you can select from your source based on that column, using parameterised queries. This might be the case with, for example, an orders_table where the data on the row changes as the order is dealt with.

    If you don't have such a row, then the job is different, as you will have to pull the entire table, and match it against a stored copy in your destination to look for new or changed rows. This might be the case with a customers_table, where you want the destination table to reflect the latest status.

    Let me know either way, and I can post or send examples.

    Richard

    (note it's Friday evening UK time here, so I may not check this post for a while)


  • Scott Cameron

    Hi Richard,

    I need both the examples . I can have both the situations to deal with. so please assume that I have some column in source and give an example and later assuming I have no column to identify new/changed column,

    And thanks for the link varadaraj!

    Thanks



  • 2fastrunner

    I think SCD of type 1 will give the best result.
  • Alfred R. Baudisch

    Do I need to execute dtexec.exe from command prompt

  • TheShark

    Richard hit it right.

    If you need to compare source and target - you will need to query the target and see if such record already exists. I use Lookup transformation for this.

    Alternatives:

    - You can use Type 1 Slowly Chaging Dimentions transformation

    - Load source table in temp table and write SQL script to update existing and insert new

    Also, you may have deleted records in source that may not be needed in target anymore.


  • Guemundur

    Okey Richard I got the first one, but for the second condition ,if possible please pl give me some link to refer to.

    Also there is another doubt in my mind.. I will need to schedule my packages in the later course / call the package independently without opening the BIDS.So how can I do it.. if you have any example then it would be useful sir.

    Thx and Regards

    Sachin



  • Bondeson

    refer this document

    http://solidqualitylearning.com/blogs/erik/archive/2005/12/09/1499.aspx



  • ErniePalarca

    Hi,

    Sorry will be out of this loop for a while. My son is none too well, so I am having to look after him at home. He's not well enough to go to school, but is not poorly enough to just sleep the whole time - so I don't have much time to reply.

    If you schedule packages from SQL Server Agent, then there is a specific task type of SSIS packages, which also allows you to log the packages etc.

    As regards the second example I gave, it is just a slowly changing dimension, type 1 where you overwrite the data and type 2 where you append new or changed rows. If I get a chance I will copy and paste some code, but you should be able to find examples on the web somewhere.

    Regards,

    Richard


  • Dejvino

    Hi Sachin,

    OK, here goes:

    First problem - assuming there is dtLastChanged available

    1)Create a table in SQL which holds the name of the source table, and has a date column to store the date last changed for that table. Initially populate this with some early date. (1-jan-1970 for example)
    2) Create a package variable to hold the date. Create an Execute SQL task to read the date from the table, and put it into your parameter.
    3) On the OLEDB source create the query with a parameter placeholder for the datetime (eg WHERE dtLastChanged > )
    4) Bring in the new data, get the maximum value of dtLastChanged from the table and update your lookuptable.

    Details for the execute SQL task can be found here: http://www.sqlis.com/default.aspx 58

    If for some reason your OLEDB driver refuses to accept parameters (FoxPro for example!), then you'll have to create the entire SQL string as a variable, and pass that over.

    Second Problem - assuming you have NO suitable column in the source table and want to find new or changed rows.

    You can use an SSIS lookup transform, but if you have few changes and a huge source table this takes time, and uses up resources on the source system. In my experience that's generally undesirable as that system is running the company, taking lots of orders or whatever and it is better to hit that as lightly as possible. An alternative approach is therefore

    1) Pull the entire source table into SQL a table using SSIS. Actually pull only those columns you need, but pull every row.
    2) Compare this tbl_RAW data to a comparison table you have stored in SQL. I'll call this tbl_NKY (where NKY is short for Natural Key). The Natural Key columns are those column that you want to check to see if they have changed.
    3) Compare the tables in SQL, and update as appropriate.

    You then have a _NKY table in SQL which contains the latest version of the changes. If you need to track the changes over time (type 2 slow changes), then instead of updating the rows on this table, you can append new or changed rows, and mark them with the date range over which they are applicable.

    Hope this helps,

    If you want to e-mail me directly then remove the nospam from my e-mail address.

    Regards,

    Richard


  • Incremental loading