Is it a good idea to use Guids as primary key. What is the impact on performance. Guids are random by nature that may have an impact on writing clustered index. Any comments or suggestions
Yes, a GUID is bigger than an int, and it's not really readable, but it is still the best choice as a primary key, if you take following guidelines into consideration: . Do NOT create a clustered index on the GUID primary key . generate the GUID on the middle tier (not via Newid(), this takes CPU on the server) . set the rowguidcol attribute on the column
Why is it better than an Identity Int . It allows to combine records from different databases (e.g. Production and development) . It allows to distribute the database over multiple servers, knowing that you can still combine the data afterwards . You can generate master and detail records on the middle tier, instead of having to roundtrip between the 2 . You will need a GUID col when you use merge replication
in short, an Identity Int may be a microsecond faster, but a Guid is so much more flexible
I prefer to use int or bigint set as identity. In my single db server environments, it works very well.
It is easier to read/remember, plus using it as an index is quicker since even with bigint, the column is half as wide as a guid.
Just be careful of setting it as a clustered index. This could cause hotspots in your data pages.
However, one note on identity. When you use enterprise manager (2000) or Management Studio (2005) the identity seed is set at 1 as a default. By accepting this, you lose half of the available IDs. Always set this to the minimum for the data type you are using. (-2147483648 for int, -9223372036854775808 for bigint)
I'm not a complete expert, but one problem I know of with GUID's with clustering keys is that they are not naturally increasing so this could introduce lots of fragmentation in environments with lots of inserts. They are also wide (16 bytes) so having them in all indexes and foreign keys can be a hit to the size of the database, scan performance, etc.
I think you should use both... On a small table that will only hold lookup values you should consider using an Int as a key, but when the table will hold a lot of data, then you should use a Guid. A GUID is only 16 Bytes, but you ensure that you DB can be spread across multiple servers. Its only a few extra bytes you are going to spend where so that should make no real difference.
I belive a key is there only for the DB. You shouldnt care what it is.
Remember a few years ago, there where a bunch of guys that thought they could "save a few byte" if they would store the YEAR with only a 2 digit value
If you stick to integer key, then you will run into problems once you want to scale out... I mean diskspace isnt reall expensive these days...
GUIDs qualifies all three major requirements of Clustered Key. They are:
1. Narrow 2. Non Updatable 3. Unique
The only major drawback in Primary Key GUIDs with clustered index is that they were not sequential in SQL Server 2000 but in SQL Server 2005 Microsoft has removed that limitation with new function newsequentialid().
For SQL Server 2000 one can use the solution described in following URL:
Actually, (and you're not the only one ni the forum to make this mistake) a GUID in SQL Server is 32 bytes. MS Access uses a 16 byte GUID, but for replication only.
As far as using a GUID for a primary key, I'm working with a database that has 125M+ rows and a GUID defnied as the PK. The performance is horrible. I count this as one of my worse decisions.
If you truly need a globally unique identity, then by all means use a GUID. However, I would not make it the primary key except in the most extreme circumstances. I will be changing our databases to use an auto-increment bigint for the primary key and use the GUID as the record identifier.
NEWSEQUENTIALID() has lot of problems. How about programatically generating Sequential Guids. At least don't have to be used with default constrained and can guaranteed to be unique at system/application scope
They are certainly updatable, moreso than, say, an
IDENTITY.
And they are not any "more" unique than any other
datatype. There is a larger domain of possible values, but there is no
guarantee that you won't generate a duplicate at some point.
GUIDs qualifies all three major requirements of Clustered Key. They
are:
1. Narrow 2. Non Updatable 3. Unique
The only major
drawback in Primary Key GUIDs with clustered index is that they were not
sequential in SQL Server 2000 but in SQL Server 2005 Microsoft has removed
that limitation with new function newsequentialid().
For SQL Server
2000 one can use the solution described in following URL:
Is
it a good idea to use Guids as primary key. What is the impact
on performance. Guids are random by nature that may have an impact on
writing clustered index. Any comments or suggestions
"Consider using the IDENTITY property when global uniqueness is not necessary, or when having a serially incrementing key is desirable."
So it comes down to your application. Is the key needed on more than one server, or is it going to be replicated using transactional replication Use a GUID. If not, use IDENTITY.
Guids as primary keys
jigme
. Do NOT create a clustered index on the GUID primary key
. generate the GUID on the middle tier (not via Newid(), this takes CPU on the server)
. set the rowguidcol attribute on the column
Why is it better than an Identity Int
. It allows to combine records from different databases (e.g. Production and development)
. It allows to distribute the database over multiple servers, knowing that you can still combine the data afterwards
. You can generate master and detail records on the middle tier, instead of having to roundtrip between the 2
. You will need a GUID col when you use merge replication
in short, an Identity Int may be a microsecond faster, but a Guid is so much more flexible
Laura Bagnall
grcor
It is easier to read/remember, plus using it as an index is quicker since even with bigint, the column is half as wide as a guid.
Just be careful of setting it as a clustered index. This could cause hotspots in your data pages.
However, one note on identity. When you use enterprise manager (2000) or Management Studio (2005) the identity seed is set at 1 as a default. By accepting this, you lose half of the available IDs. Always set this to the minimum for the data type you are using. (-2147483648 for int, -9223372036854775808 for bigint)
BobP
drtom
Ken Hwang
I think you should use both... On a small table that will only hold lookup values you should consider using an Int as a key, but when the table will hold a lot of data, then you should use a Guid. A GUID is only 16 Bytes, but you ensure that you DB can be spread across multiple servers. Its only a few extra bytes you are going to spend where so that should make no real difference.
I belive a key is there only for the DB. You shouldnt care what it is.
Remember a few years ago, there where a bunch of guys that thought they could "save a few byte" if they would store the YEAR with only a 2 digit value
If you stick to integer key, then you will run into problems once you want to scale out... I mean diskspace isnt reall expensive these days...
nuclearfusion
GUIDs qualifies all three major requirements of Clustered Key. They are:
1. Narrow
2. Non Updatable
3. Unique
The only major drawback in Primary Key GUIDs with clustered index is that they were not sequential in SQL Server 2000 but in SQL Server 2005 Microsoft has removed that limitation with new function newsequentialid().
For SQL Server 2000 one can use the solution described in following URL:
http://sqldev.net/xp/xpguid.htm
Jitendra Pardasani
Jody Byrd
Actually, (and you're not the only one ni the forum to make this mistake) a GUID in SQL Server is 32 bytes. MS Access uses a 16 byte GUID, but for replication only.
As far as using a GUID for a primary key, I'm working with a database that has 125M+ rows and a GUID defnied as the PK. The performance is horrible. I count this as one of my worse decisions.
If you truly need a globally unique identity, then by all means use a GUID. However, I would not make it the primary key except in the most extreme circumstances. I will be changing our databases to use an auto-increment bigint for the primary key and use the GUID as the record identifier.
Mridu
nuisance
--
Adam Machanic
Pro SQL Server 2005, available now
http://www..apress.com/book/bookDisplay.html bID=457
--
MauiSon
--
Adam Machanic
Pro SQL Server 2005, available now
http://www..apress.com/book/bookDisplay.html bID=457
--
Osi
As an aside, we (meaning my company) never really use IDENTITY. We generate our own surrogate keys.
-Jamie
lars corneliussen
http://msdn2.microsoft.com/en-us/library/ms190215.aspx
The last part of it answers your question.
"Consider using the IDENTITY property when global uniqueness is not necessary, or when having a serially incrementing key is desirable."
So it comes down to your application. Is the key needed on more than one server, or is it going to be replicated using transactional replication Use a GUID.
If not, use IDENTITY.
BobP
Lakshmi.A.V
Inder Virmani