Copy data from one table to another

On the OLE DB Source, I have as following:

select field1, field2, field3 from table_source

On the OLE DB Destination, I used fast load option with table lock checked, and check constraints to copy data to table_destination. Both table_source and table_destination have the same table definitions. After the dtsx package ran, the number of rows copy over to the destination is not same. I got different results from different runs. This only happens when the source table is over 1 million rows and when I'm copying from the transaction database where user's activities are heavy.

Any ideas how I can modify the package so it can copy correct data Thanks!




Answer this question

Copy data from one table to another

  • sven22

    On the Control Flow, I've only one Data Flow Task. On the Data Flow, I've an OLE DB Source and an OLE DB Destination. There's no other task in the package.

  • dummy#1

    Tianyu, I'm sorry but I do not understand your comments. Can you give more details of how to accomplish that Thank you.

  • alek31415926

    Yes, the source is reading the correct number of rows but the destination is not inserting the right number. The source table is, let says, 1 million rows but the destination table is 1 million and 5 rows; I've checked the rows count before and after insert. Of these 5 extra rows, they're duplicated. Each execution returns different duplicated rows; sometimes 5, sometimes 10 but I've not find more than 20 duplicated rows yet.

    Yes, I always emty the destination table before insert; I even checked before insert and return a value to a temp table to make sure the table is empty.

    There's a primary key constraint on both source and destination tables. I've removed the constraint on the destination table to copy data all over. This is how I found the duplicated rows after the insert. If I keep the constraint, the package will fail because with fast load I cannot redirect the failed rows.

    It's very hard to debug this issue because the result is not consistent. Sometimes it copied correctly, sometimes not. Through SQL profiler, it took a while to see it finishes copying the data and I've not find anything that I think may cause the issue yet.

    Could this be SSIS bug



  • Harry He

    Tianyu, thank you for the instructions. As for option 1, I've already tried but still got the duplicated rows...same result as using my dtsx, sometimes it copied correctly and sometimes not. As for option 2, I already had a work-around; 1. Remove the constrainsts on destination table, allow it to copy all data over to destination, then have a script to look for the duplicated rows and remove one of the duplicate. or 2. Use the data load (no fast load) in the package, have the constraint on the destination table, and redirect the failed rows to the text file. I'm working on the data warehouse project, and using SSIS to ETL data, as I had found the issue but not sure of the root cause...just need to know if this is a bug in SQL or SSIS is the correct tool for data warehouse or not.



  • BAR

    Is the source reading the correct number of rows but the destination is not inserting the right number

    Is the destination table empty Is it possible that having constraint checking on is failing some rows

    I suggest you use SQL Profiler on the source and destination servers to see what SQL Server is actually doing.

    Donald



  • Quek

    BTW, I just upgrade server to SP1, and ran my package again. Out of 3 runs during business hours (update/insert/delete action excutes to the source table during the copying), 1 run has copied 2 duplicated rows and the table is about 240,000 rows. The other 2 runs are ok.

    I had this package on the SQL job agent and scheduled to run at night for a while, most of the time (about 95%) it copied data correctly;the other 5% it does not is usually due to we have other problems to the source server so the job delays until morning to run.



  • Nosy

    Is there anything else in the package, or is this just a straight source-destination data load

    Donald



  • Lantzmannen

    Donald,

    Here's the sample of my source table:

    Table_source

    -------------------

    Column_1 varchar(15) not null (unique key)

    Column_2 binary null

    Column_3 datetime not null (audit trigger in the source for row inserted date, default as getdate() for insert)

    Column_4 datetime not null (audit trigger in the source for row updated date, default as getdate() for insert/update)

    Table_source sample data

    Column_1 Column_2 Column_3 Column_4

    --------- ------------------ ---------------------- ------------------------

    123ABC 0x000000002A5270C3 2006-06-08 11:03:30.883 2006-06-08 11:03:30.883

    Table_destination has the same table definition as Table_source. As I used rows per batch and max insert commit size not equal to 0, and redirect rows to a text file. It ended up that the row, ex. with 123ABC, redirected to the text file and this row did not copied over to the table_destination. And ErrorCode=-1071607685, ErrorColumn=0. Btw, is there a way to translate what these ErrorCode and ErrorColumn mean All of the redirected rows in the text file have these values.

    If I remove the unique key constraint on the Table_destination, this is how it looks like:

    Column_1 Column_2 Column_3 Column_4

    --------- ------------------ ---------------------- ------------------------

    123ABC 0x000000002B36DA37 2006-06-07 11:03:30.883 2006-06-07 11:03:30.883

    123ABC 0x000000002A5270C3 2006-06-07 11:03:30.883 2006-06-08 11:11:57.153

    Table_destination will end up with two 123ABC rows; Column_2 will have different binary, and Column_4 will have different date time stamp (sometimes two rows will have exactly same date time stamp).

    After the data were copied over, I check the source table and it had a row like this:

    Column_1 Column_2 Column_3 Column_4

    --------- ------------------ ---------------------- ------------------------

    123ABC 0x000000002A5270C3 2006-06-07 11:03:30.883 2006-06-08 11:11:57.153

    There's no 123ABC row with time stamp of 2006-06-07 11:03:30.883. Look like this is the row from the source when package started, and other later time stamp is after the package ran; some update happened during the data extract.

    Any ideas what went wrong

    Thanks for previous suggestions.



  • Waters

    Keep me up to date, if this is a bug then it is a nasty one. I am working on data warehouse project too. Don't want uncleaned data into the DW.

    One more thing I would try to test is first load the data into Raw File Destination, then in another data flow to load to DW from raw file destination. This way we hold the dataset from transaction database for the shortest time. Seperate the dataset to smaller sets is also something else could try to work around the problem.

    This is an interesting topic. :)


  • BHGreene

    Have you tried to use SQL exporting option Try to use the SQL build option to transfer the data. Also during the export wizard, there is an option to save the package. You can compare your version and theirs to see if there are any difference.

    Another option, you might customize your package, that first query the last processed id on the destination and only grab the new data from the source. Since your data are coming in sequence order, you may not need to turn on transaction. If the task fails, you can restart from the last failed id.

  • Robert Caruso

    By the way, I got this error codes on the redirected rows before sp1 applied:

    ErrorCode=-1071607683, ErrorColumn=0

    And below is what after sp1 applied:

    ErrorCode=-1071607685, ErrorColumn=0

    My guess is that this is the violate primary key constraint, but don't know a way to translate this number.



  • DanDMan

    As option 1
    1. Open Microsoft SQL Server Manager Studio
    2. Drill down to the database you need to export
    3. Right mouse on the database and choose Export Data under tasks
    4. Select the data source and data destination
    5. Select Copy data from one or more tables or views, click next
    6. Map the tables and click Edit Mappings, customzie the options as you need
    7. Click Next, You will see "Save SSIS Package" option.
    We found out the build in SQL Exporting wizard is acturally creating SSIS packages and using them to do the data transfer.

    For option 2
    I guess the reason to enable transaction is to avoid failure half way through and lose track on the data.
    One way to "work around" the problem is use last procssed id.
    1. You can do a SELECT ISNULL(Max(IdentityID), 0) From DestinationTable and store as a user variable.
    2. Ole Db Source, you select statement would be something like SELECT identityID, Field1, field2, field3 From sourceTable WHERE IdentityID > Order by IdentityID ASC
    3. At Destination, you aslo store the IdentityID.
    At any point failed, you can always find the failing point and can restart the package once the source data is corrected, this way just avoid using transaction.

  • redcrusher

    >If I keep the constraint, the package will fail because with fast load I cannot redirect the failed rows.

    Here's something to try. You can redirect the failed rows if the MaxInsertCommitSize is not equal to 0. I suggest adding a batch size and setting MaxInsertCommitSize too. Then redirect the rows and let's see what happens.

    What is the primary key constraint you are using Is it possible at any point that you have nulls in constrained columns

    Donald



  • rocr

    ...correction to the first message:

    <<This only happens when the source table is over 1 million rows..>>

    The issue occurs regardless of the source table is over 1 million rows or not. I've a table of about 200 thousands rows and it copied incorrectly.



  • Copy data from one table to another