I am using the code below but I am getting a "zero" result for dbo.AggredIssue('Test') user defined aggregate everytime that the query executes parallel processing and uses the "Merge" method. It seems that my private variable "private List<string> myList" gets nullified everytime it goes through the "Merge".
I saw other people reporting the same issue in other forums, but nobody was able to provide a solution or explanation.
See below a simplified version of my code (posted just after the queries) that replicates the issue.
The query below works because it does't process the query in parallel.
SELECT
GroupID, dbo.AggregIssue('Test')FROM
MyTablewhere
fund = 2group
by GroupIDThe query below doesn't work because it process the query in parallel.
SELECT
GroupID, dbo.AggregIssue('Test')FROM
MyTablewhere
fund <= 20group
by GroupID
[Serializable]
[SqlUserDefinedAggregate(< xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />
Format.UserDefined,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = true,
MaxByteSize = 1000)]
public class AggregIssue : IBinarySerialize {
private List<string> myList;
private int myResult;
public void Init() {
myList = new List<string>();
}
public void Accumulate(SqlString Value) {
if (Value.IsNull) { return; }
myList.Add(Value.ToString());
}
public void Merge(AggregIssue Other) {
if (Other.myList != null) {
if (myList == null) {
myList = Other.myList;
}
else {
myList.AddRange(Other.myList);
}
}
}
public SqlInt32 Terminate() {
return new SqlInt32(myResult);
}
public void Read(BinaryReader r) {
myResult = r.ReadInt32();
}
//The code below is simplified for posting in the forum.
//I do additional manipulation of the list and require
//the aggregation to be IBinarySerialize.
//But this code replicates the issue also
w.Write(myList.Count);
}
}

Issue with SqlUserDefinedAggregate
mr.flx
Fernando-
Merge is also called in some more complex queries (such as GROUP BY WITH CUBE).
This is an interesting workaround to the 8k limitation for your scenario. I do worry that your workaround leaves your UDAgg prone to internal changes within SQL Server causes serialization to be more frequently performed or eliminated. For this reason, I'd recommend always using an approach where instances created from Read/Write are no different than the original.
I cannot recommend using this approach for these reasons, but limiting DOP to 1 should eliminate Merge() calls for the most part. If you absolutely must use this workaround, I'd recommend safeguarding against your assumptions (Merge never called, serialization takes place once after all Accumulate calls but before Terminate):
Once again, please bear in mind that future SQL Server changes may very well break these assumptions and that I cannot recommend using this approach.
Thanks!
Jason
Hannah82
Hi Adam,
Thank you very much for your response.
The reason that I don't serialize the list itself is because of the MaxByteSize limitation of 8096. My list will easily go beyond that limitation. I test your solution and it works for a small list, but not for long ones.
I noticed that the list is loosing the information in the merge. If the query doesn't go through parallel threads, it works fine.
My assumption is that the serialization will happen before terminate and final output of aggregate value for each row of the result set.
Any other ideas
Thanks!!
Fernando
jumanjiCA
There seems to be a few issues within the thread:
1) Why is the private field myList nullified
Whenever serialization takes place (Read/Write), a new instance is instantiated and it is up to your serialization code to fill the instance. Since your Write() only sets myResult, myList will be remain on the default value of null.
2) Why is MaxByteSize not always enforced
MaxByteSize is enforced during serialization. You could cause instances to become much larger than 8k over Accumulate() calls and then not serialize out 8k.
With this said, there should not be a reason to build UDAggs that become larger than 8k but do not serialize out 8k. Read/write should serialize all necessary information for the UDAggs.
The UDAgg code within this thread is a good example of a contrived case for this. Since Terminate() only returns the count, there is not a need to accumulate the actual strings, but Accumulate() could increment a counter instead.
3) When is Read/Write called
Within SQL Server 2005, serialization takes place during Merge() and before Terminate() is called. If the UDAgg provided accessed the myList within Terminate(), you would see a NullReferenceException from within Terminate() (since Write() does not reconstruct myList and null is its default reference).
As pointed out within this thread, Read/Write is not called for every row. However, please write your UDAgg with proper serialization semantics as if it were called every row.
Hope that helps!
-Jason
al kerr
Jason,
Thanks for all your help and explanations.
Fernando
Parky106
--
Adam Machanic
Pro SQL Server 2005, available now
http://www..apress.com/book/bookDisplay.html bID=457
--
[Serializable]
[SqlUserDefinedAggregate(
Format.UserDefined,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = true,
MaxByteSize = 1000)]
public class AggregIssue : IBinarySerialize {
private List<string> myList;
private int myResult;
public void Init() {
myList = new List<string>();
}
public void Accumulate(SqlString Value) {
if (Value.IsNull) { return; }
myList.Add(Value.ToString());
}
public void Merge(AggregIssue Other) {
if (Other.myList != null) {
if (myList == null) {
myList = Other.myList;
}
else {
myList.AddRange(Other.myList);
}
}
}
public SqlInt32 Terminate() {
return new SqlInt32(myResult);
}
public void Read(BinaryReader r) {
myResult = r.ReadInt32();
}
//The code below is simplified for posting in the forum.
//I do additional manipulation of the list and require
//the aggregation to be IBinarySerialize.
//But this code replicates the issue also
public void Write(BinaryWriter w) {w.Write(myList.Count);
}
}
Beniton
It seems that you are right and serialization/deserialization happens when merge is called.
For now I have a workaround that was suggested in another forum to use "MAXDOP=1" and it works as merge is never executed.
It is not ideal, as parallelism cannot be leveraged, but it is a workaround.
Thanks to all for the responses!!
Fernando
Mike Cating
Jason,
Thanks for all your explanation. It makes sense to me now how all this work.
But I still have one issue, which is the limitation of 8K. The list that I am building cannot be contrived until it is complete just before "Terminate". In the example I am passing the count and I could do that in the merge also. But in my real case, I need the complete list to be used in the resolution of a non-linear equation, so I cannot do anything with it until I pass it to a iterative algorithm in order to solve the equation. But this list can grow bigger than 8K.
The only way to avoid so far is to limit the degree of parallelism to 1, so the code doesn't go to merge.
Any thoughts on this
Thanks,
Fernando
Steadyonabix
--
Adam Machanic
Pro SQL Server 2005, available now
http://www..apress.com/book/bookDisplay.html bID=457
--
Dan Haywood
I think Adam is right about you didn't serialize the List. Other people on the forum is talking about "break the 8k boundary" stuff, don't know what conclusion they have right now. But since you are using a List to join strings together, you always facing the problem.
About Merge() stuff, to my limited understanding:
If single thread (no parallel op):
Init() -> Accumulate() -> Terminate()
If multi threads (parallel plan):
T1: Init() -> Accumulate()
T2: Init() -> Accumulate() -> Merge( with T1)
T3: Init() -> Accumulate() -> Merge( with T3) -> Terminate()
This is only to illustrate the way Merge() works. Any T can be reused during this, thus why Init() must clean everything.
In Adam's book Chapter 6, p. 195, I quote:
"It is important to understand when dealing with aggregates that the intermediate result will be serialized and deserialized once per row of aggregated data. Therefore, it is imperative for performance that serialization and deserialization be as efficient as possible"
I'm a bit confused here, could Adam give some explanation for this paragraph What's the relationship between Merge() and once per row of aggregated data
Regards,
Dong Xie