
359. Somers' D Concordance Statistic

(2016-03-20 12:58:18)
Tags: somer's, d, ordinal, coarse-classifying

Category: Statistics Sharing

Somers' D measures the association between two ordinal variables. In SAS, PROC FREQ reports this statistic, and you can also compute it directly from a formula or estimate it by simulation. The example below assumes a 2 × 3 contingency table (good/bad outcome by three attribute classes).

 

data have;
   input x :$12. good bad;
   pg =good/(good+bad); *proportion of goods in each class;
   datalines;
owner 6000 300
renter 1950 540
others 1050 160
;

*rank the classes by good rate, so that xnew = 1 is the class with the lowest pg;
proc rank data=have out=have;
   var pg;
   ranks xnew;
   run;

*one row per class and outcome, with the cell count as weight w;
data have2;
   set have;
   y=1; w=good; output;
   y=0; w=bad ; output;
   run;

proc freq data=have2;
   tables xnew*y/chisq measures;
   weight w;
   exact SMDRC; *treat R(xnew:attribute) as dependent and C(y:outcome) as independent;
   run;

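data formula;
   set have2 end=Eof;
   *good and bad counts indexed by rank, retained across rows;
   array gg[3] _temporary_; array bb[3] _temporary_;
   if y=1 then gg[xnew]=w;
   if y=0 then bb[xnew]=w;
   if Eof then do;
      do i =1 to dim(gg);
         *t1 and t2 hold the bads and goods in classes ranked below i;
         do j=1 to i-1;
            t1 ++bb[j]; t2 ++gg[j];
            end;
         *concordant minus discordant pairs involving class i;
         s ++(t1*gg[i]-t2*bb[i]);
         call missing(of t1 t2);
         end;
      D =s/(sum(of gg[*])*sum(of bb[*]));
      end;
   run;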

data simulation;
   call streaminit(12345);
   set have2 end=Eof;
   array gg[3] _temporary_; array bb[3] _temporary_;
   if y=1 then gg[xnew]=w;
   if y=0 then bb[xnew]=w;
   if Eof then do;
      sg=sum(of gg[*]);
      sb=sum(of bb[*]);
      *convert counts to class probabilities for the table distribution;
      do i =1 to dim(gg);
         gg[i]=gg[i]/sg; bb[i]=bb[i]/sb;
         end;
      do i =1 to 1e8;
         xb=rand('table', of bb[*]);*pick attribute of bad from the bads;
         xg=rand('table', of gg[*]);*pick attribute of good from the goods;
         bltg ++(xb <xg);
         bgtg ++(xb >xg);
         bgeg ++(xb =xg);
         end;
      *i is 1e8+1 after the loop, so i-1 is the number of draws;
      D=(1*bltg -1*bgtg +0*bgeg)/(i-1);
      end;
   run;

 

In the simulation, the concordance statistic is built from the chance that, if one picks a good at random from the goods and a bad at random from the bads, the bad's attribute, xb, falls in a lower class than the good's attribute, xg. D is that probability minus the probability of the reverse ordering, with ties contributing zero. The higher D is, the better the ordering of the characteristic's classes reflects the good-bad split in the population.
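The same definition can be written out and evaluated by hand for the table in this post. The notation below (g_i and b_i for the good and bad counts in the class ranked i by good rate) is introduced only for this sketch; the numbers come from the datalines, with renter ranked 1, others 2 and owner 3.

```latex
\begin{align*}
D &= P(x_b < x_g) - P(x_b > x_g)
   = \frac{\sum_{i}\bigl(g_i \sum_{j<i} b_j \;-\; b_i \sum_{j<i} g_j\bigr)}
          {\bigl(\sum_i g_i\bigr)\bigl(\sum_i b_i\bigr)} \\[4pt]
i=2:&\quad 1050 \cdot 540 - 160 \cdot 1950 = 567{,}000 - 312{,}000 = 255{,}000 \\
i=3:&\quad 6000 \cdot (540+160) - 300 \cdot (1950+1050) = 4{,}200{,}000 - 900{,}000 = 3{,}300{,}000 \\[4pt]
D &= \frac{255{,}000 + 3{,}300{,}000}{9000 \cdot 1000}
   = \frac{3{,}555{,}000}{9{,}000{,}000} = 0.395
\end{align*}
```

The i = 1 term is zero because no class is ranked below the first one.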

The simulation gives D = 0.39502673, compared with an exact value of 0.395 (3,555,000 / 9,000,000 from the formula).
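One way to cross-check the exact value without the array bookkeeping is a PROC SQL cross join of the good rows with the bad rows of have2. This is only a sketch, not the method used in the post; the aliases g and b and the column name D_check are names chosen here for illustration.

```sas
proc sql;
   /* every (good class, bad class) pair, weighted by its number of pairs;  */
   /* sign() is +1 when the good sits in the higher class, -1 when lower,   */
   /* and 0 for ties                                                        */
   select sum(g.w * b.w * sign(g.xnew - b.xnew)) / sum(g.w * b.w)
          as D_check format=8.4
   from have2(where=(y=1)) as g,
        have2(where=(y=0)) as b;
quit;
```

For the counts in this post the weighted sum is 3,555,000 over 9,000,000 pairs, so D_check should come out at about 0.395.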

When coarse-classifying a characteristic for scorecard development, a higher value of Somers' D indicates that the classes separate goods from bads more sharply.
