Hi all, when I trained my data mining models I found that the model's cover rate is very low: my training data set has 82 rows, but only 25 cases appear in the models I trained. How can I improve the cover rate, and with it the quality of the models, if that is possible in SQL Server 2005? That is the version I am using.
Cheers.

How can we improve the cover rate of the model?
Bob Lee
Mahesh Hegde
Upul-uj
Hi Jamie, below is the training data I used to build the mining models.
.........................................................................................................................................
mark1,mark2,Agreed,marker1,marker2,difference
55,55,55,TC,GR,0
66,69,68,CM,JQ,3
38,37,38,JT,JQ,1
52,57,57,ST,GR,5
64,60,63,JWh,GR,4
54,54,54,BW,JW,0
65,69,67,CM,JQ,4
68,68,68,JWh,MW,0
76,68,72,CM,JWh,8
48,47,48,CL,GR,1
68,62,64,AL,JWh,6
43,40,43,BW,Rdu,3
64,,64,MW,ST,
65,65,65,DG,MB,0
,,,GR,ST,
64,,61,CM,DG,
55,56,56,CM,TC,1
67,61,65,DG,CM,6
54,54,54,JM,CL,0
46,46,46,GR,BS,0
68,72,68,JWh,CM,4
60,60,60,BW,BS,0
65,65,65,DG,GR,0
57,,57,CH,DG,
58,55,58,CS,JT,3
51,51,51,BS,EC,0
,64,65,MW,TC,
66,70,67,DG,RC,4
,,,DG,CM,
66,57,64,JWh,AL,9
77,,74,CM,DG,
54,56,55,MW,TC,2
61,,57,JQ,JT,
61,61,61,DG,CH,0
68,68,68,CH,JM,0
61,66,61,CH,RC,5
68,68,68,BS,CS,0
62,63,63,DG,CH,1
62,63,63,TC,DG,1
63,58,60,JQ,CM,5
76,70,74,JWh,TC,6
68,68,68,CS,BS,0
60,56,58,CM,JWh,4
66,64,65,JW,BW,2
63,,64,TC,DG,
58,,58,RD,DG,
64,72,68,TC,MH,8
72,,72,CH,DG,
66,60,66,GR,RC,6
58,58,58,BSm,GM,0
51,49,50,GR,CM,2
62,,62,BW,CH,
53,50,50,TC,GR,3
,74,74,JWh,MW,
61,60.5,61,JT,CS,0.5
62,63,63,DG,CH,1
58,58,58,BS,JM,0
64,66,66,JT,BS,2
63,60,62,JWh,TC,3
55,,55,MB,DG,
72,,72,BSm,SS,
62,67,64,RD,GM,5
71,67,70,GR,CM,4
62,66,62,CS,CM,4
,68,52,DF,MH,
65,65,65,GM,TC,0
55,56,56,EC,TC,1
66,,66,CH,DG,
70,72,70,JM,CL,2
,,,BW,JT,
57,57,57,DG,CM,0
58,58,58,TC,GM,0
60,60,60,DG,RD,0
54,55,55,TC,CS,1
,64,67,MW,JWh,
60,60,60,DG,GR,0
78,68,72,JWh,MW,10
60,60,60,GR,BS,0
43,48,46,ST,SS,5
57,61,60,SS,ST,4
60,55,60,MW,DF,5
62,62,62,ST,MW,0
...............................................................................................................................
The first row contains the attribute (column) names.
Thanks a lot for your help.
Twil1ght
Hi Jamie, does that mean the model won't cover all the records during training?
Thanks a lot.
ddas
Miechu
Hi, yes. My training data set is quite small; it is university marks data that I am analysing.
My inputs are mark1, mark2, marker1 and marker2, and the output I want to predict is the agreed mark, based on mark1 and mark2 as given by marker1 and marker2 respectively.
To make the classification task easier, I have discretized the continuous marks into categorical values.
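The post does not say which categories the marks were binned into. As a rough illustration of what discretizing a continuous mark can look like, here is a minimal Python sketch that assumes hypothetical UK-style classification bands (70+, 60 to 69, 50 to 59, 40 to 49, below 40); the band edges, the labels and the `discretize` function are all assumptions for illustration, not the poster's actual scheme.

```python
# Hypothetical band edges and labels (UK-style degree classes).
# The thread does not say which bins were actually used.
BANDS = [(70, "First"), (60, "2:1"), (50, "2:2"), (40, "Third")]


def discretize(mark):
    """Map a numeric mark (or None for a missing value) to a category label."""
    if mark is None:
        # Keep missing marks missing rather than inventing a band for them.
        return None
    # Bands are ordered from highest to lowest lower bound,
    # so the first bound the mark reaches gives its category.
    for lower, label in BANDS:
        if mark >= lower:
            return label
    return "Fail"


print(discretize(68))    # 2:1
print(discretize(60.5))  # 2:1
print(discretize(38))    # Fail
```

Fixed, hand-picked bands like these keep the categories meaningful to a reader of the results; the trade-off is that, with only 82 rows, some bands may end up holding very few cases.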
By "cover rate" I mean the number of cases covered by the mining model: the training data set has 82 rows, but the mining model covered only 25 of them.
I hope this explanation is clear enough for you to help.
Thanks a lot.
lympanda
Hi Jamie, the number is how many cases the models covered during training. For example, I have 82 rows in the training set, but the clustering model grouped only 25 cases (rows) into clusters.
Thanks a lot.
gsl3
Does your data have a key column? The algorithms need a unique key column so that each row is treated as a separate case. The "Agreed" column has only 25 distinct values, so it looks like that is what you are using as your key.
I think you need to add an extra "ID" column that uniquely identifies each row.
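To see why the choice of key matters, the Python sketch below counts how many distinct cases you get when a given column is treated as the case key. It uses only the first eight data rows posted above, so the numbers are smaller than 82 and 25. The helper names (`load_rows`, `case_count`, `add_id`) are hypothetical. The sketch only counts distinct key values and does not model how SQL Server 2005 itself processes rows with duplicate keys.

```python
import csv
import io

# Excerpt: header plus the first eight data rows of the training data above.
SAMPLE = """mark1,mark2,Agreed,marker1,marker2,difference
55,55,55,TC,GR,0
66,69,68,CM,JQ,3
38,37,38,JT,JQ,1
52,57,57,ST,GR,5
64,60,63,JWh,GR,4
54,54,54,BW,JW,0
65,69,67,CM,JQ,4
68,68,68,JWh,MW,0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def case_count(rows, key):
    """Number of distinct cases if the column `key` identified each case."""
    return len({row[key] for row in rows})


def add_id(rows):
    """Return copies of the rows with a new, unique integer ID column."""
    return [dict(row, ID=i) for i, row in enumerate(rows, start=1)]


rows = load_rows(SAMPLE)
print(len(rows))                       # 8 rows in the excerpt
print(case_count(rows, "Agreed"))      # 7: the agreed mark 68 appears twice
print(case_count(add_id(rows), "ID"))  # 8: one case per row
```

With the full 82-row set, the same count on "Agreed" gives the 25 cases reported in the thread. A surrogate ID gives one case per row because it carries no information of its own. For the same reason it should serve only as the key, not as an input or predictable column.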
eaho
Hi Jamie, thanks a lot. I got it working by following your suggestion.
Yes, the problem was that the column I used as the key had only 25 distinct values, so only 25 cases were covered by the trained model.
Thanks a lot.
LionelG