Hi, I have been asked to write some code that can check a large table for duplicate values in a non pk column. The table may have up to 1000000 rows. The PK column is an auto increment field. For performance reasons the column in question could not be set to unique values only for inserts, an algorithm is used to create unique no's before the insert but what I am doing is double checking that their have been no duplicates created accidently. If their are duplicates I need to know what rows they occurred on.
Thanks

unique values query
office of technology
One way
select a.id,a.field1,a.field2
from (
select field1,field2
from table1
group by field1,field2
having count(*) > 1) b join table1 a on b.field1 =a.field1
and b.field2 =a.field2
Denis the SQL Menace
http://sqlservercode.blogspot.com/Ramakrishna Neela
Table is called PackNos
PK = PackKey
column 1 =PackNo
column 2 = LabelNo
cheers
a-jifish
To test this I executed two more queries.
1 "Select distinct PackNo from PackNos" 421492 rows
2 "Select PackNo From PackNos" 422831 rows
Distinct weeds out duplicates doesn't it
What am I doing wrong
Juan Manuel110953
Try this
select a.PackKey,a.PackNo,a.LabelNo
from (
select PackNo,LabelNo
from PackNos
group by PackNo,LabelNo
having count(*) > 1) b join PackNos a on b.PackNo =a.PackNo
and b.LabelNo =a.LabelNo
Denis the SQL Menace
http://sqlservercode.blogspot.com/JasonToTheH
But the question you asked was about duplicates in the packNo, labelNo combination.
Change to:
select a.PackKey,a.PackNo,a.LabelNo
from (
select PackNo, max(labelNo) as labelNo --guessing
from PackNos
group by PackNo
having count(*) > 1) b join PackNos a on b.PackNo =a.PackNo
and b.LabelNo =a.LabelNo
If you want to look for duplicates in the packNo column, and want to get one random row based on the lableNo column