Many thanks for your response. I've tried to put together my thoughts on some of your questions.
I was under the impression that SQL express will not be able to perform merge replication with non MS Databases fx: DB2 on AS 400 [Ref: http://forums.microsoft.com/MSDN/ShowPost.aspx PostID=314274&SiteID=1]
>>it is not clear what the disconnected mode entails for your >>application. Here are some of the questions:
>>1. Who owns the data Is there a central server that will >>maintain the data
Avinash: Yes. A central db server (ms sql 2000/20005/DB2 Win/DB2 AS400) exists and holds info from all online users and also all ocassionally connected users. (consolidated data)
>>2. What happens to logical conflicts For example, if 2 users >>enter the same product and you generate two different >>GUIDs for example how will you resolve the conflict since they >>are one and the same. You do not want to end up in a >>situation where your data has lots of duplicates and you have >> clean it periodically in the central copy
Avinash: we've adopted the RDA philosophy. Its is assumed that there will be no conflicts. This is acheived by partitioning the data (by geography, product categorization etc). So, by design there should not be conflicts b/w records that originate from two users. If however there are updates then the rule that we have agreed upon is "the last update wins". What if we have duplicate products uniqueness of a product from a database perspective comes from the unique primary key. However, we were thinking that during the synch operation - some additional business checks may be performed and the users may be notified via a "System Messages" feature that we have, that there there are two products with the same name and that the user needs to resolve the conflict.
>>3. Does the user own the data Maybe you can partition by >>user.
Avinash: Yes. We are partitioning the data. If the system is setup correctly no two users must be working on the same set of data .
>>4. Since the server generates the unique id, you can >>generate local ids it seems like and then resolve the conflicts >>when you insert/update the data on the server
Avinash: I was thinking that the tables/data base objects on the server and the client would be identical. i.e. we only maintain one version of the database {scripts etc} and use the same on both the server and the client (offline database) . We are currently using identity columns on the table and that works just fine in the central server and always connected users. In case of the offline users - if we leave the identity to be present surely there will be conflicts in the ids being generated online and offline. And like I mentioned I would like to avoid the situation of having one setof ids on the client and then replacing them during or post a synch operation. Since there are several tables with FKs this would be quite messy.
This is exactly where I was hoping to be able to generate unique ids and therefore get away from any possible issues. i.e. both the server and client would use some kind of a integer id generation algo in a sp/dynamic sql etc. Essentialy exactly what a GUID would do for us - but an integer instead. And I was hoping that the user's id which is originally generated on the server and is always unique - could some how be used to generate the unique integer ids. A user is either an ocassionally connected user or an always connected user and never both together. The way I'm thinking about it - an ocassionally connected user alwyas works with his local data store and a synch task is expected to synch the data b/w his local store and the central store when a netwrk is available. Whereas an always connected user is always connected to the central store.
Thoughts
>>In any case, it seems like there are technologies that has >>built-in support for dealing with this sort of topology (merge >>replication and peer-to-peer replication in SQL Server 2005 >>for example). You can generate ids in your local >>disconnected store and since there is no multi-user conflict I >>don't see the issue here in using some integer sequence.
>>I suggest that you take a look at some of the technologies I >>mentioned, look at some distributed solutions using >>partitioning and then decide.
Avinash: I would prefer to fall back on a well tested, out of the box solution. But the limited research I have done points to not being able to use any just now.
GUID
MobileDev
Hello Uma Jayachandran,
I think I understand what you are saying in terms of the ramifications GUIDs have on indexes. Assuming, we are not going the GUID route can you please point me to techniques that will help generate unique integer ids in a disconnected mode
Here is some home work that I've done on this front - Our app has a unique user id, an integer that is only generated on the server and so at all times will be unique . Can we some how use this to generate other integer (disconnected mode) ids
I dont want to set up a composite key involving the user id, this will mean quite a bit of reengineering of the existing bits of the app and would like to avoid this as far as possible.
I also understad the id ranges suggestion that you have made - I cant use an explict set up kind of mechanism as the number of users are fairly large. In a worst case scenario (which I will need to factor in) we may have as many as 2000+ users. Setting up ranges for them may be too much trouble. Also managing size of each range.
is there some other math algo that I can use
Here is something I ran into and am wondering if I can do something like this - http://www.c-sharpcorner.com/UploadFile/sgersh01/GenerateintegerIDs12032005000238AM/GenerateintegerIDs.aspx ArticleID=40b9fc09-d5a0-4084-ba31-86a0de3997b8
Another interesting post on the web suggests generating an MD5 checksum.
What I'm hoping to uncover is - Can we not use GUIDs to some how generate unique integers. I sure hope that is some how possibe to do. Or atleast use the user id to generate unique integer ids.
Thanks,
Avinash
RON 7659
I was under the impression that I want to cluster data that goes together. In my example, I want all the Smith's together and all the Jones together. Or, for that matter all the S% 's or J% 's. That is, if thats the way I grab them. When their key is used in a child table, that would be clustered along with the constituent child lookups.
For example, in specfied order the following are entered into the database:
Presidents:
[(4, Madison, James), (3, Jefferson, Thomas), (1, Washington, George), (6, Adams, John Quincy), (5, Monroe, James), (2, Adams, John)]
EventType:
[(intraterm), (postterm), (preterm)
And then I want to track the following entries, entered in this order:
Using default key clustering gives this physical layout ( "*" represents generated Identity key index, (field) represents inclusion in non-clustered index):
But, my users will select by President Name, or Event, Or a combination of both and want it presented in order of occurence. Shouldnt the best performance be one that takes this into account and wouldnt that make less I/O demands
Clustering with this in mind would yield the following layout ( [field] represents inclusion in clustered index):
p.partovi
mickers
SAMI_12
bexter
guid is a great choice.
no a hash won't work. isnt a hash a string
your hash wont be much smaller that stringed guid.
I am sure that generating a guid is quicker than running a hashing algorithm.
what arguments are you getting against a guid
JamesWalshe
scsaba
Hi...
I think GUIDS are a good thing depending on the situation where you want to use them. Most DBAs dont really like them, but i think the reason for this is its hard to remember a Guid or a pain to type one. A GUID itself is only a few byte larger then an integere and if its used in a table it should usually not cause any problems. The one thing that was already mentioned is the picking it as a clustered Index is usually a bad choiche, since there is really nothing in a guid that can be clustered...
The main problem here is the requirement for "other DBs" which would mean that your software needs to be able to convert the guid into a datatype that can be understood by the other Systems. The GUID hold 16 Bytes of Information... Now the "easy" way to encode this would be a string of the hex representation, or you could just split the bytes into a few integers and force the DB to create multiple fields which holds this. Or create a small textfield with a fixed length of 24 Characters and store the () encoded String in it This will allow you to save a "few bytes (8)" from the usual hexadecimal text representation and will avoid 0x00 Bytes in the textstring.
But the question is how many rows do you need to store in your Application Will it be more hassle to provide another aproach GUID are ideal if you have Data that needs to be inserted in a DB "later" since you already can get a valid key and dont need to change an index field later. I would go with GUIDS... Since logical partitioning of indexes like someone else suggested can be a greater pain... Especially once you handed out all possible ranges to clients of your system, and you then need to add another client and you have 0 keys for him "left"
anjumahesh
Unless performance is SERIOUSLY strained on any of the servers/platforms in question, I don't think there should be much debate about it. Yes, some people think that GUIDs are the fruit of the devil, but performance tuning before the fact is the root of all evil - as the saying goes.
I'd be a bit leery about hashing the GUIDs, unless you were pretty confident of the hashing algorithm. Likewise, storing GUIDS as text takes up a pretty decent amound of space (see:
http://blog.angrypets.com/2003/12/warning_guids_a.html for more info)
... but that too isn't soooo much space that I'd be worried about it's performance impact. The only real place you'll see significant GUID 'rot' is when you're using them in joins - and that really shouldn't decrease performance too much - it just serves as a wider index than an int (i.e. 16bytes instead of 4)...
In other words, I'd go with the GUID if it were me - and then deal with any performance issues IF and when they arose (though I doubt you'll encounter anything that couldn't be offset by some decent tuning elsewhere - unless we're talking about tables measurable in hundreds of GBs...)
--Mike
Ben Fidge