Hi,
I want to insert data in a database. How is it possible to avoid inserting duplicate data If I want to insert a row which also is in my database, I get an Exception and my program stops running. How can I tell my programm to go on without stoppng even if there was an Exception
Or has anybody a better and faster solution !
Thank you!

Avoid duplicate data
antonio97b
How is your data stored before going into the database Are you loading from a flat file or is the data in memory in some sort of array Either way, here are two options:
1. Pre-sort (appropriate for all in memory collections and smaller files, e.g. < 50MB)
- Load your data into an in-memory data structure (e.g. an Array)
- Sort on the primary key columns using the .Sort method
- Loop through each record
- If current row's primary key = previous row's primary key skip
- Otherwise, INSERT
2. SELECT before INSERT
- SELECT primary key columns FROM table to INSERT (if you only select the primary key columns, this should be VERY fast since it's not even hitting the table, just the index)
- IF SELECT returns 0 rows, then INSERT
Hope this helps, as before, let me know if you need sample code (some of yours would help me provide direction as well),
Josh Lindenmuth
JayKay
If you have an example - please let me know :)
sandPR
given Table1(key int, data1 varchar(50), data2 datetime) then:
Create proc Process1(@key int, @data1 varchar(50), @data2 DateTime, @DoUpdate bit)
as
if @DoUpdate = 1
Update Table1 set data1 = @data1, data2 = @data2
where key=@key
else
Insert into Table1(key, data1, data2)
values(@key, @data1, @data2)
2. Before you process a table, drop all non-integrity based indexes.
Gain comes from not having to recalculate the index on data change.
3. For each table create a SQLCommand, set the CommandText to the stored proc and set the commandtype to storedproc. For each line of input data, set the parameters first setting @DoUpdate = true, executenonquery, if the return is 0, set parameter @DoUpdate = false and execute again.
Gain comes from not having to recreate a SQLCommand for every line
4. recreate you indexes that were dropped in step 2.
Depending on your input data, I bet you see a 75% increase in performance, I.E. 5000 rows in 15 seconds
let me know how it goes!
MARL
Hi Josh,
this (<- see quoted area) is what I did before ignoring the exceptions. I do a Select and if my result is 0, I insert. This takes a long time and so I searched for other methods.
To give you a short overview, here are some informations:
- First I read some (= 10 to 20) .txt files where my data is store.
- Then I want to insert the data in a database.
- Because the data in the textfiles CAN have duplicates, I need to know if I have inserted them before because of my primary key logic in the database. I have no autoincrement key (no int or something else), my key belongs to three columns (type date,time and varchar).
- Every .txt file contains nearly 500 rows of data...so I have (10 to 20 * 500 rows) in my database.
... ok, I think this is all of information... I will try your idea with an ArrayList and sort it and will see how long it takes.
If you have other good ideas, I would be happy about to know them.
Thanks for your help, Josh
...and sorry for my English...
jwdenny
I didn't realize your "slow" process was with SELECTs instead of catching exceptions. Here's how you'd write a function that would ignore exceptions when trying to INSERT. You'd call this function for each row in your database.
public void InsertNoFail(sql as String)
{
try
{
//INSERT logic
}
catch(Exception e)
{
//do nothing (essentially ignores the exception)
}
}
The above could be made faster by using parameters instead of sending 1000's of INSERT statements to the database. Also, you'd probably want to replace catch(Exception) with the specific error your database is throwing when there's a duplicate key.
Hope this helps,
Josh
Vaibhav_Patel
Your best bet for ignoring those exceptions and continueing is to create a function that wraps your insert in a try...catch block. You'll need to catch the specific duplicate key exception that is thrown, and just ignore the exception. The function would be called for each row you'd be inserting.
Let me know if you need an example,
Josh Lindenmuth
Alex Jimenez
I was now able to insert data and ignore the exceptions, but it it so slow... when I insert 5000 rows it takes nearly one minute... how can I do this faster ! I think it takes many ressources by ignoring the mass of exceptions I get... is there a faster way to make many inserts !
Thanks for replies!